Creating medical registry datasets from unstructured text
The project aims to automate the data population process for clinical registries through the use of large language models (LLMs).
Factsheet
- Schools involved: School of Engineering and Computer Science
- Institute(s): Institute for Patient-centered Digital Health (PCDH)
- Research unit(s): PCDH / AI for Health
- Funding organisation: Innosuisse
- Duration (planned): 17.06.2024 - 17.06.2025
- Head of project: Prof. Dr. Kerstin Denecke
- Partner: ID Suisse AG
- Keywords: Artificial intelligence, large language model, information extraction
Situation
This project aims to automate data collection for clinical registries through the use of large language models (LLMs). There are currently 116 registries represented in the Swiss Forum of Clinical Registries (Forum medizinische Register Schweiz), which is managed by the Swiss Medical Association (FMH). Registry data is essential for quality assurance (e.g. the implant registry), including the tracking of adverse events and outcomes and the identification of treatment gaps. These and similar use cases require the data held in registries to be complete and of high quality. Traditionally, clinical data is extracted from routine data and hospital information systems by manual copying and pasting, a time-consuming and error-prone process that leads to inconsistent and incomplete data. Our approach aims to automate this process by developing advanced natural language processing (NLP) algorithms that can accurately analyse unstructured text in medical records and extract the relevant clinical information from it.
Course of action
We will use and optimise LLM-based methods to extract relevant clinical data from unstructured texts and populate registry forms with this data. Furthermore, we will examine the scalability of the developed system and identify options for its further development, so that it can keep pace with the changing needs of the healthcare system and with technological advances.
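As an illustration of what populating a registry form from unstructured text could look like, the following is a minimal sketch and not the project's actual implementation. The form `ImplantRegistryEntry`, its field names, the prompt wording and the helper `populate_entry` are all hypothetical, and `fake_llm` is a stand-in that returns a canned answer instead of calling a real model.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable, Optional


@dataclass
class ImplantRegistryEntry:
    """Hypothetical registry form; real registries define their own fields."""
    implant_type: Optional[str] = None
    implant_date: Optional[str] = None   # ISO 8601 date, e.g. "2024-03-05"
    adverse_event: Optional[str] = None


def build_prompt(note: str) -> str:
    """Ask the model for a JSON object with exactly the form's field names."""
    names = ", ".join(f.name for f in fields(ImplantRegistryEntry))
    return (
        "Extract the following fields from the clinical note and answer "
        f"with a JSON object using exactly these keys: {names}. "
        "Use null when the note does not state a value.\n\n"
        f"Note:\n{note}"
    )


def populate_entry(note: str, llm: Callable[[str], str]) -> ImplantRegistryEntry:
    """Call the model, then keep only known fields that hold string values."""
    raw = llm(build_prompt(note))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ImplantRegistryEntry()    # unparseable answer: leave form empty
    if not isinstance(data, dict):
        return ImplantRegistryEntry()
    allowed = {f.name for f in fields(ImplantRegistryEntry)}
    clean = {k: v for k, v in data.items() if k in allowed and isinstance(v, str)}
    return ImplantRegistryEntry(**clean)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption); returns a canned answer."""
    return json.dumps({
        "implant_type": "total hip prosthesis",
        "implant_date": "2024-03-05",
        "adverse_event": None,
        "comment": "not a registry field",   # dropped by populate_entry
    })


entry = populate_entry(
    "Patient received a total hip prosthesis on 5 March 2024.", fake_llm
)
print(entry)
```

Filtering the model's answer against the form definition means that unexpected keys and malformed output never reach the registry; fields the model cannot fill simply stay empty for manual review.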
Result
The project will establish whether LLM-based information extraction is feasible for populating clinical registries and will assess the quality of the data extracted in this way.
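One common way to assess extraction quality is to compare the extracted values field by field with a manually curated reference. The sketch below assumes this approach; the source does not state which metrics the project uses, and the function `field_scores`, the field names and the example records are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def field_scores(predicted: List[Record],
                 gold: List[Record]) -> Dict[str, Dict[str, float]]:
    """Per-field precision and recall using exact string match.

    A wrong value counts as both a false positive and a false negative;
    a value missing on one side counts only against that side.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    counts: Dict[str, Dict[str, int]] = {}
    for pred, ref in zip(predicted, gold):
        for name in set(pred) | set(ref):
            c = counts.setdefault(name, {"tp": 0, "fp": 0, "fn": 0})
            p, g = pred.get(name), ref.get(name)
            if p is not None and p == g:
                c["tp"] += 1
            else:
                if p is not None:
                    c["fp"] += 1
                if g is not None:
                    c["fn"] += 1

    scores: Dict[str, Dict[str, float]] = {}
    for name, c in counts.items():
        p_den = c["tp"] + c["fp"]
        r_den = c["tp"] + c["fn"]
        scores[name] = {
            "precision": c["tp"] / p_den if p_den else 0.0,
            "recall": c["tp"] / r_den if r_den else 0.0,
        }
    return scores


# Hypothetical reference entries and extracted entries for two notes.
gold = [
    {"implant_type": "hip", "implant_date": "2024-03-05"},
    {"implant_type": "knee", "implant_date": None},
]
predicted = [
    {"implant_type": "hip", "implant_date": None},           # date missed
    {"implant_type": "knee", "implant_date": "2024-01-01"},  # date invented
]

scores = field_scores(predicted, gold)
print(scores)
```

Reporting scores per field rather than per record shows which registry fields can be filled reliably and which still need manual checking.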