NVIDIA Research at C-MIMI: Understanding Speech to Automate Charting for Telemedicine and Beyond

COVID-19 is fundamentally changing the doctor-patient dynamic worldwide. Telemedicine is now becoming an essential technology that healthcare providers can offer patients as an adjunct or alternative for in-person visits that is both effective and convenient.

We’ve all been there, at one time or another in the last six months: speaking with a nurse or family doctor using video calling on our mobile devices. The conversation flows naturally, and, by the end of the call, the doctor has taken their notes, collected the symptoms, made the diagnosis, and recommended a treatment. 

Imagine now what happens to that information – how was it captured? How is this information then saved back into clinical systems in a structured manner? What if the patient, who may be distressed, in pain, or suffering, forgets key parts of the conversation or needs it explained again? 

Having the ability to automatically capture the essence of these conversations in the medical record is one of the key use cases that inspired advancements made by NVIDIA’s NLP research. As the patient talks about their symptoms, automated speech recognition and natural language processing technology can capture, recognize, understand and summarize key concepts and structures using state of the art conversational AI models along with clinical ontologies like SNOMED-CT or ICD-10. Examples of such concepts include disease conditions, symptoms and treatments. The captured audio can come in the form of a telemedicine visit, or even captured during in-person physician visits.

A new NVIDIA research paper being featured at the Conference for Machine Intelligence in Medical Imaging (C-MIMI) 2020 entitled Fast and Accurate Clinical Named Entity Recognition and Concept Mapping from Conversations offers insight into how a medically savvy language model can potentially save time and accelerate the pace at which your doctor and the insurance carrier can provide consultation notes and process billing for the patient visit.

NVIDIA researchers deploy a state-of-the-art pretrained architecture to help clinicians augment patient experience with key extracts of symptoms, diagnosis, and recommended therapy. This paves the way to enable recapping conversations for both patients and their caregivers, as well as for the patient care team, as an important benefit so that they can be informed and have the reassurance of being able to review what was discussed.

Understanding Speech to Automate Charting for Telemedicine

Recent advances in natural language processing have led to large neural architectures, such as BERT, which rely on unsupervised pre-training to develop sophisticated language models. These language models perform very well across a wide range of tasks, however, they require extremely large volumes of training data and significant computing resources. Publicly available pre-trained checkpoints have simplified the model development process for researchers and software vendors, but checkpoints tuned for medical domains are not common.

In this research, the authors show that pre-training BERT-based architectures on in-domain biomedical data, as well as using a biomedical vocabulary, offers significant gains over existing approaches for tagging clinical data. As noted by the authors, a large BERT-based architecture pre-trained on the 4.5 billion-word PubMed  biomedical abstract corpus, performs “better than a general-domain BERT model pre-trained on Wikipedia and BooksCorpus.” By training a larger BERT model with 345 million parameters, named Bio-Megatron, on 6.1 billion words of PubMed text, the authors show that “using both a larger model and a biomedical vocabulary adds a further boost in performance.”

These models can easily be integrated with optimized speech and language pipelines to perform tasks such as transcribing speech to text and extracting the clinical entities like disease and medication names from the transcribed text. These optimized pipelines can be deployed on the cloud, data center, and edge to run in real-time, thereby improving physician productivity and patient experience in applications like virtual doctor consultation.

Training and inference were performed on NVIDIA V100 and T4 GPUs running NVIDIA Jarvis. Inference on a CPU can take up to three minutes, and on the GPU it is reduced to one second.

BN43\NVIDIA Jarvis is an application framework for building multimodal conversational AI services that deliver real-time performance on GPUs. Jarvis has several components such as SOT A pre-trained models in NGC, NVIDIA NeMo, and Transfer Learning Toolkit to fine-tune the models to understand your domain, optimized speech, language, and vision pipelines. You can use these pipelines independently or tie them together to perform a complicated task.

Read the full paper here. See all papers being presented at C-MIMI 

Also, check out the DLI courses for Healthcare. Take the DLI course for a deep dive on how to build state-of-the-art NLP models with NVIDIA NeMo.

NVIDIA has 4 papers accepted for Oral Presentation at C-MIMI this year, including:

  • Empirical Evaluation of Federated Learning for Classification of Chest X-Rays 
  • Creating a Converged Cohort Management and Deep Learning Ecosystem 
  • Fast and Accurate Clinical Named Entity Recognition and Concept Mapping from Conversations 
  • LAMP: Large Deep Nets with Automated Model Parallelism for Image Segmentation