Creating medical registry datasets from unstructured text

The project aims to automate the data population process for clinical registries through the use of large language models (LLMs).

Factsheet

Schools involved: School of Engineering and Computer Science
Institute(s): Institute for Patient-centered Digital Health (PCDH)
Research unit(s): PCDH / AI for Health
Funding organisation: Innosuisse
Duration (planned): 17.06.2024 - 17.06.2025
Head of project: Prof. Dr. Kerstin Denecke
Partner: ID Suisse AG
Keywords: Artificial intelligence, large language model, information extraction

Situation

This project aims to automate data collection for clinical registries using large language models (LLMs). There are currently 116 registries represented in the Swiss Forum of Clinical Registries (Forum medizinische Register Schweiz), which is managed by the Swiss Medical Association (FMH). Registry data is essential for quality assurance (e.g. the implant registry), including the tracking of adverse events and outcomes and the identification of treatment gaps. These and similar use cases depend on registries holding complete, high-quality data. Traditionally, clinical data is transferred from routine data and hospital information systems by manual copying and pasting, a time-consuming and error-prone process that leads to inconsistent and incomplete data. Our approach aims to automate this process by developing advanced natural language processing (NLP) algorithms that accurately analyse unstructured text in medical records and extract the relevant clinical information.

Course of action

We will use and optimise LLM-based methods to extract relevant clinical data from unstructured texts and populate registry forms with this data. Furthermore, we will examine the scalability of the developed system and identify options for its further development, so that it can keep pace with the changing needs of the healthcare system and with technological advances.
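
As an illustration of what populating a registry form from free text could look like, the following minimal Python sketch may help. It is not the project's implementation: the form fields, the prompt wording and the `fake_llm` stand-in are all hypothetical, and a real system would call an actual model where the stub is used.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable, Optional


@dataclass
class ImplantRegistryForm:
    # Hypothetical registry fields, chosen only for illustration.
    implant_type: Optional[str] = None
    surgery_date: Optional[str] = None
    adverse_event: Optional[str] = None


def build_prompt(note: str) -> str:
    """Ask the model for a JSON object with exactly the form's field names."""
    names = ", ".join(f.name for f in fields(ImplantRegistryForm))
    return (
        "Extract the following fields from the clinical note and answer "
        f"with a JSON object using exactly these keys: {names}. "
        "Use null for anything the note does not state.\n\n"
        f"Note:\n{note}"
    )


def populate_form(note: str, llm: Callable[[str], str]) -> ImplantRegistryForm:
    """Fill the form from the model's answer, ignoring anything unexpected."""
    raw = llm(build_prompt(note))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ImplantRegistryForm()  # unusable answer: leave the form empty
    if not isinstance(data, dict):
        return ImplantRegistryForm()
    allowed = {f.name for f in fields(ImplantRegistryForm)}
    clean = {
        key: value
        for key, value in data.items()
        if key in allowed and (value is None or isinstance(value, str))
    }
    return ImplantRegistryForm(**clean)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call (assumption): returns a fixed answer.
    return json.dumps(
        {
            "implant_type": "total hip prosthesis",
            "surgery_date": "2024-03-01",
            "adverse_event": None,
            "comment": "keys outside the form are dropped",
        }
    )


form = populate_form("Patient received a total hip prosthesis on 1 March 2024.", fake_llm)
```

Restricting the answer to the known field names and leaving unstated fields empty keeps malformed or over-eager model output out of the registry record.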

Result

The project will validate the feasibility and assess the quality of LLM-based information extraction methods for populating clinical registries.
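
One possible way to carry out such a quality assessment is to compare extracted forms field by field with manually curated reference forms. The Python sketch below assumes this setup and reports precision, recall and F1. The choice of metrics, the exact-match comparison and the sample records are assumptions made for illustration and are not stated by the project.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def field_scores(predicted: List[Record], gold: List[Record]) -> Dict[str, float]:
    """Field-level precision, recall and F1 using exact string matches."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        for key in pred.keys() | ref.keys():
            p, g = pred.get(key), ref.get(key)
            if p is not None and p == g:
                tp += 1  # extracted value matches the reference
            else:
                if p is not None:
                    fp += 1  # a value was extracted but it is wrong or spurious
                if g is not None:
                    fn += 1  # the reference value was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example records, not project data.
predicted = [
    {"implant_type": "hip", "surgery_date": "2024-03-01", "adverse_event": None}
]
gold = [
    {"implant_type": "hip", "surgery_date": "2024-03-02", "adverse_event": "infection"}
]
scores = field_scores(predicted, gold)
```

In this example one of two extracted values is correct and one of three reference values is recovered, so precision is 0.5 and recall is one third. Exact matching is deliberately strict; dates or free-text fields would likely need normalisation before comparison.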