Leveraging large language models with RWE to generate more comprehensive insights in Medical Affairs

The era of large language models (LLMs) has ushered in a wave of innovation in medicine, with the promise of transforming how we approach drug discovery, development, and clinical care. Since OpenAI’s ChatGPT was publicly released in November 2022, interest in LLMs has intensified, particularly in healthcare, where their potential applications are increasingly recognized. LLMs can already analyze large amounts of medical data, and they are being further optimized to support clinical decision-making through evidence-based recommendations.

The ability of LLMs to learn, adapt, and process complex concepts underlies their relevance to many areas of healthcare. Leveraging advanced natural language processing (NLP) capabilities, some LLMs can analyze various types of real-world data. Specifically, these models can process large volumes of unstructured medical data in electronic health records (EHRs), clinical trial reports, and the medical literature to identify patterns and trends that can enhance insight generation and clinical decision-making.

In this article, we provide an overview of some of the ways LLMs are being used in healthcare to analyze real-world data and unlock insights that may improve patient outcomes. While LLMs hold immense promise, their integration into healthcare also raises important quality and ethical issues that highlight the need for robust guidelines to ensure safe and effective use.

Predicting seizure recurrence – Leveraging clinical notes in EHRs

Electronic health records (EHRs) contain diverse real-world data, including “structured” information such as diagnosis codes and laboratory results, as well as “unstructured” data such as clinical notes and imaging reports. Recently, researchers investigated whether LLMs could analyze EHR data to accurately predict the risk of seizure recurrence in pediatric patients1. Scientists and clinicians compared traditional machine learning models built on structured data with novel LLM approaches for quantifying the risk of seizure recurrence within a two-year timeframe.

Results demonstrated that clinical notes written by providers often capture significant prognostic information, enabling LLMs to quantify seizure recurrence risk with a high degree of accuracy. Because LLMs can process both structured and unstructured data, these models showed improved predictive performance compared with traditional machine learning methods that use structured data alone. To further improve the predictive accuracy of LLMs, the researchers emphasized the importance of pre-training models on domain-specific data, that is, content relevant to the analysis. Importantly, the study highlighted the value of incorporating clinical narratives alongside structured medical data to enhance predictive analytics in healthcare.

By applying natural language processing to the wealth of real-world data contained within EHRs, LLMs have the potential to help healthcare providers make more informed decisions in pediatric epilepsy management. While work remains on model development and optimization before broader implementation, this study suggests that LLMs may also be applied to predict other health outcomes, and that real-world data can be leveraged to extract more comprehensive insights about patient health status and disease prognosis.

Social determinants of health – Beyond clinical data

Social determinants of health (SDoH) encompass the environmental and socioeconomic factors that significantly influence health outcomes, shaped by the distribution of resources at global, national, and local levels2. They can be either detrimental or protective and can affect overall health and well-being, including access to medical care. Despite their critical role, SDoH are often poorly documented in EHRs and other medical records, posing challenges for research and clinical care3. Multiple groups have recently investigated whether NLP algorithms and LLMs can identify and extract SDoH from unstructured patient data to improve the documentation and understanding of social factors that affect health.

In 2022, researchers developed and validated an NLP algorithm to determine the prevalence of social factors that put acute care patients at risk of being incapacitated. The algorithm analyzed EHR clinical notes, including consultation notes, discharge summaries, and nursing, nutrition, social work, and rehabilitation notes, to identify patients who lacked social support and could face challenges related to decision-making capacity. This study was among the first to demonstrate that NLP approaches can capture data on social determinants of health that could inform interventions to support patient well-being4. Similarly, in 2023, another NLP approach identified social risk factors in home health care clinical notes and examined their association with subsequent hospitalizations or emergency department visits5.

Building on prior NLP work demonstrating the feasibility of extracting SDoH from clinical text, another group recently assessed whether LLMs could further improve SDoH identification and extraction. Multiple models were tested on their ability to extract key SDoH factors, including employment status, housing issues, transportation issues, parental status, and social support. The results were promising and highlighted the importance of fine-tuning: fine-tuned LLMs exhibited lower bias and outperformed other models in recognizing SDoH factors6. In a recent pilot project at Putnam, we demonstrated that fine-tuning with injected synthetic data, in this case demographic information, decreases bias in assessing SDoH while maintaining the model’s predictive accuracy.
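One way to make the bias finding concrete is a counterfactual check: insert a demographic descriptor into an otherwise unchanged note and see whether the extracted SDoH labels change. The sketch below assumes that approach for illustration only; it is not the method of the cited study or of the Putnam pilot. The keyword-based `toy_extractor`, its keyword lists, and the `inject_descriptor` and `flip_rate` helpers are all hypothetical stand-ins; in practice the extractor would be a fine-tuned LLM.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence

# Hypothetical keyword lists standing in for a fine-tuned LLM extractor.
# Category names follow the SDoH factors listed in the text above.
SDOH_KEYWORDS: Dict[str, List[str]] = {
    "employment_status": ["unemployed", "lost her job", "lost his job"],
    "housing_issues": ["homeless", "eviction", "shelter"],
    "transportation_issues": ["no car", "no ride", "missed the bus"],
    "parental_status": ["mother of", "father of"],
    "social_support": ["lives alone", "no family"],
}

Extractor = Callable[[str], FrozenSet[str]]


def toy_extractor(note: str) -> FrozenSet[str]:
    """Return the SDoH categories whose keywords appear in the note (case-insensitive)."""
    text = note.lower()
    return frozenset(
        category
        for category, keywords in SDOH_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )


def inject_descriptor(note: str, descriptor: str) -> str:
    """Prepend a demographic descriptor without altering the rest of the note."""
    return f"The patient is {descriptor}. {note}"


def flip_rate(extract: Extractor, notes: Iterable[str], descriptors: Sequence[str]) -> float:
    """Fraction of (note, descriptor) pairs whose labels differ from the unmodified note's labels."""
    changed = total = 0
    for note in notes:
        baseline = extract(note)
        for descriptor in descriptors:
            total += 1
            if extract(inject_descriptor(note, descriptor)) != baseline:
                changed += 1
    return changed / total if total else 0.0


notes = [
    "Patient lost her job last month and is facing eviction.",
    "Lives alone; daughter visits weekly.",
]
descriptors = ["a Black woman", "a White man", "a Hispanic woman", "an Asian man"]

print(sorted(toy_extractor(notes[0])))  # ['employment_status', 'housing_issues']
print(flip_rate(toy_extractor, notes, descriptors))  # 0.0
```

A score of 0.0 here only means the keyword stand-in ignores descriptors that contain none of its keywords; a fine-tuned LLM offers no such guarantee, which is why a check like this is worth running. A fuller evaluation would also track per-category accuracy against annotated notes, since the claim is that bias decreased while accuracy was maintained.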

Ultimately, recognizing and incorporating social determinants of health into clinical practice enables healthcare professionals to provide more holistic and equitable care that meets patients’ specific needs. By improving the identification of these factors, LLMs may support overall patient health by helping determine which individuals could benefit from additional resources and social work support. In the future, LLMs may further deepen our understanding of health disparities by generating comprehensive insights from real-world evidence.

The future LLM landscape in healthcare

The impact of LLMs on healthcare is just beginning to unfold. As investment in this space continues to grow rapidly, significant advances in medical research, patient care, and public health initiatives are anticipated, and the ability of LLMs to process real-world data and unlock novel insights further supports this outlook. However, integration is still in its infancy, with several challenges to overcome. As we continue to navigate this evolving landscape, there are a few considerations to keep in mind:

  1. Establish best practices to ensure safe and effective use of LLMs
    A multidisciplinary approach that addresses ethical, regulatory, and clinical considerations is required to establish best practices that promote the safe and effective use of LLMs in healthcare. At a minimum, robust regulatory frameworks for LLMs should be established to protect sensitive information and prevent misuse. Collaboration with scientists and healthcare providers is necessary to continuously assess the safety and clinical utility of LLMs and to evaluate their performance against established clinical benchmarks. The Coalition for Health AI7, a collaborative initiative that aims to advance the responsible and ethical use of AI in healthcare, is developing quality and safety guidelines on the proper use of real-world data to support the credible, transparent, and equitable deployment of LLMs.
  2. Expansion in areas with high-quality real-world data
    Integrating LLMs with high-quality real-world data sources, such as EHRs, disease registries, and clinical trial databases, gives the models access to comprehensive datasets and promotes the generation of more in-depth insights into health and disease. Data quality assurance measures are also necessary to ensure the accuracy of the real-world data used by LLMs. Furthermore, fine-tuning LLMs on domain-specific datasets may improve their ability to capture the nuances of clinical data, scientific terminology, and other context-specific information.
  3. Adoption of, but not overreliance on, LLMs
    While LLMs offer immense potential for generating insights, maintaining a critical eye and recognizing the limitations and potential biases of models are necessary to ensure responsible use. Overreliance on LLMs, especially in complex or nuanced situations, may lead to serious oversights and downstream consequences. Integrating human expertise with LLMs can mitigate the risks of overreliance and help ensure the accuracy and relevance of evidence-based recommendations that may ultimately improve patient care and clinical outcomes.


  1. Beaulieu-Jones et al., 2023, Lancet Digital Health
  2. WHO: https://www.who.int/teams/social-determinants-of-health
  3. Hood et al., 2015, American Journal of Preventive Medicine
  4. Song et al., 2022, PLOS One
  5. Hobensack et al., 2023, Journal of the American Medical Directors Association
  6. Guevara et al., 2024, npj Digital Medicine
  7. Coalition for Health AI: https://www.coalitionforhealthai.org/