The vast majority of biomedical datasets provide annotations for English texts. For other languages, such as German, datasets annotated with biomedical concepts are extremely scarce, because manual annotation is effort- and knowledge-intensive and requires trained professionals.
This paper addresses the issue by introducing WikiMed-DE, a silver-standard dataset for biomedical entity linking in German. The automatic annotation process exploits the links that connect the text of German Wikipedia pages with the structured information available in the Wikidata knowledge base and in three biomedical knowledge sources: the Unified Medical Language System (UMLS), the Medical Subject Headings (MeSH) hierarchy and the Disease Ontology (DO).
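To make the idea of link-based silver annotation concrete, the following is a minimal sketch and not the pipeline used to build WikiMed-DE. It assumes that wikilinks in the page markup are resolved to Wikidata items by page title and that each item carries identifiers for the biomedical sources. The function `annotate`, the two lookup tables and all identifier values are hypothetical placeholders introduced only for illustration.

```python
import re

# Matches [[Target]] and [[Target|surface text]] wikilinks.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

# Hypothetical lookup tables with placeholder identifiers (not real data).
# In practice these would be derived from Wikipedia and Wikidata dumps.
TITLE_TO_QID = {"Diabetes mellitus": "QID-PLACEHOLDER"}
QID_TO_CONCEPTS = {
    "QID-PLACEHOLDER": {
        "UMLS": "CUI-PLACEHOLDER",
        "MeSH": "MESH-PLACEHOLDER",
        "DO": "DOID-PLACEHOLDER",
    }
}


def annotate(wikitext):
    """Strip wikilinks and return (plain_text, annotations).

    An annotation is emitted only for links whose target resolves to a
    Wikidata item that has at least one biomedical identifier.
    """
    parts, annotations = [], []
    cursor = 0   # position in the original markup
    offset = 0   # position in the plain text built so far
    for match in WIKILINK.finditer(wikitext):
        before = wikitext[cursor:match.start()]
        parts.append(before)
        offset += len(before)

        target = match.group(1).strip()
        surface = match.group(2) or match.group(1)
        parts.append(surface)

        qid = TITLE_TO_QID.get(target)
        concepts = QID_TO_CONCEPTS.get(qid) if qid else None
        if concepts:
            annotations.append({
                "start": offset,
                "end": offset + len(surface),
                "mention": surface,
                "wikidata": qid,
                "concepts": concepts,
            })
        offset += len(surface)
        cursor = match.end()
    parts.append(wikitext[cursor:])
    return "".join(parts), annotations


text = "Bei [[Diabetes mellitus|Diabetes]] ist der [[Blutzucker]] erhöht."
plain, spans = annotate(text)
```

In this sketch the link to "Blutzucker" is kept as plain text but produces no annotation, because the toy tables contain no biomedical identifier for it. Annotations obtained this way are silver-standard: they inherit whatever errors the underlying links and mappings contain.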
The research reported in this paper has contributed to the AI4MedCode project, whose primary goal is to build an AI platform that predicts and processes medical coding information to support the optimization and quality monitoring of clinical treatment and invoicing.