Predicting patient status parameters from medical reports

This thesis primarily focuses on biomedical data mining and predicting patient status parameters from medical reports.

Running Master Thesis

Thesis topic in the scope of the ServiceMeister project (

During a hospital visit a multitude of information is gathered about a patient, e.g. doctor's letters, lab results, pathology reports, etc. However, important information about the status of the patient often remains hidden in this data, making it very time-consuming to assess the status of the patient (especially in cases where further medical investigations are necessary).

The goal of this master thesis is to build a regression-based model for the prediction of status parameters using data like doctor's letters and pathology reports as input. 
Examples of such status parameters are the ECOG (Eastern Cooperative Oncology Group) performance status, which quantifies the general well-being of a cancer patient on a scale from 0 to 5 [1], and the TNM staging system, which describes the stage of a cancer via various codes [2].

The system will build upon state-of-the-art contextualized word representation techniques like BERT [3], BioBERT [4] and GermanBERT [5], but will require their adaptation to the task at hand.
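As a rough illustration of what a regression-based predictor on top of contextualized token representations could look like, the following minimal sketch pools per-token vectors into one document vector, applies a linear regression head, and maps the result onto the 0 to 5 ECOG scale. Everything in it is an assumption made for illustration: the function names, the mean pooling, the linear head, and the hand-written toy vectors (which stand in for the output of a BERT-like encoder) are not taken from the thesis description.

```python
from typing import List, Sequence

# Scale bounds for the ECOG performance status as stated in the text (0 to 5).
ECOG_MIN, ECOG_MAX = 0, 5


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into a single document vector (hypothetical pooling choice)."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def regression_head(doc_vector: Sequence[float],
                    weights: Sequence[float],
                    bias: float) -> float:
    """Linear regression head: dot product of weights and features, plus a bias."""
    return sum(w * x for w, x in zip(weights, doc_vector)) + bias


def to_ecog_grade(score: float) -> int:
    """Clip a real-valued prediction to the ECOG range and round to the nearest grade."""
    clipped = min(max(score, ECOG_MIN), ECOG_MAX)
    return int(clipped + 0.5)  # round half up; clipped is never negative


if __name__ == "__main__":
    # Toy stand-ins for encoder outputs: two tokens, two dimensions each.
    toy_tokens = [[0.2, 1.0], [0.6, 3.0]]
    doc = mean_pool(toy_tokens)
    score = regression_head(doc, weights=[1.0, 0.5], bias=0.3)
    print(doc, score, to_ecog_grade(score))
```

In a real system the weights and bias would be learned from labeled reports, and the token vectors would come from an encoder adapted to the clinical text; the clipping step only reflects that a raw regression output may fall outside the valid scale.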

This thesis is part of a collaboration with the Klinikum Stuttgart [6] and involves work with private medical data, which must be performed in the Klinikum.

[3] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, NAACL-HLT 2019.
[4] BioBERT: a pre-trained biomedical language representation model for biomedical text mining, Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, Jaewoo Kang, Bioinformatics 2019.
[5] German’s Next Language Model, Branden Chan, Stefan Schweter, Timo Möller, COLING 2020.


Dr. Corina Dima
PD Dr. Roman Klinger
Prof. Dr. Steffen Staab


