Relation Linking for Questions using Wikidata

Completed Bachelor Thesis

Relation Linking constitutes a bottleneck in the KBQA pipeline. Improving Relation Linking is therefore crucial for improving KBQA as a whole. Although the models described in section 2 have steadily advanced the performance of Relation Linking, they are still far from reaching the accuracy of state-of-the-art Entity Linking models. Since many approaches have already been explored extensively, it is hard to find new ways of making Relation Linking more accurate. However, we can borrow from recent advances in another field of question answering, Open Domain Question Answering (ODQA). In ODQA, natural language questions are answered not with information from a knowledge base, but with information from text passages.

Kratzwald and Feuerriegel have developed ordinal regression [KF18], a regression model for ranking text passages for ODQA. In contrast to previous ODQA systems, ordinal regression retrieves a variable number of passages. This feature makes ordinal regression an interesting candidate for KBQA, since the number of relations needed to answer a question varies as well. We hope to achieve high accuracy by encoding information about relations and giving it as input to the ordinal regression model.

Another technique that has improved ODQA is training with negative examples. In Dense Passage Retrieval (DPR) [Kar+20], Karpukhin et al. add so-called negative passages to the training data, i.e., passages which do not contain the answer to the question. They achieve accurate results by minimizing the negative log-likelihood of the positive passage, normalized over the positive and the negative passages. If we can find a way to add negative relations to our training data, we can use the same approach. Since we are not aware of any KBQA model that uses negative relations for training, this method could be a novel way to improve the performance of our model.

Existing Relation Linking models often perform relatively well on simple questions, which require only one relation to answer (e.g., the relation “highest point” is sufficient to answer the question “What is the highest mountain in Germany?”). However, they struggle with more complex questions, which require more than one relation to be identified correctly (e.g., the relations “capital” and “population” both need to be identified to answer the question “How many people live in the capital of Germany?”). Conveniently, Xiong et al. developed Multi-hop Dense Retrieval (MDR) [Xio+20], which builds on DPR and is adapted to such complex questions. Since we want to use DPR as well, we can adopt this approach to achieve higher accuracy on complex questions.
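
The following is a minimal sketch of how ordinal regression could decide how many candidate relations to keep for a question. It assumes a generic cumulative-logit (proportional-odds) model over the sorted scores of the top three candidate relations; this is not necessarily the exact model of [KF18], and the feature choice, the names `predict_num_relations`, `WEIGHTS` and `THRESHOLDS`, and the parameter values are hypothetical (in practice they would be learned from labelled questions).

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordinal_class_probs(features: Sequence[float], weights: Sequence[float],
                        thresholds: Sequence[float]) -> List[float]:
    """Cumulative-logit ordinal regression.

    P(y <= k | x) = sigmoid(theta_k - w . x) for k = 1..K-1, and P(y <= K) = 1.
    Class probabilities are differences of consecutive cumulative probabilities.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    score = sum(w * x for w, x in zip(weights, features))
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


def predict_num_relations(features: Sequence[float], weights: Sequence[float],
                          thresholds: Sequence[float]) -> int:
    """Return the most probable class k, i.e. how many top-ranked relations to keep."""
    probs = ordinal_class_probs(features, weights, thresholds)
    return max(range(len(probs)), key=probs.__getitem__) + 1


# Hypothetical parameters: high 2nd/3rd candidate scores push towards keeping more relations.
WEIGHTS = [0.0, 4.0, 4.0]
THRESHOLDS = [1.0, 4.0]

if __name__ == "__main__":
    ranked = [("capital", 0.9), ("population", 0.8), ("highest point", 0.1)]
    k = predict_num_relations([s for _, s in ranked], WEIGHTS, THRESHOLDS)
    print(k, [r for r, _ in ranked[:k]])  # prints: 2 ['capital', 'population']
```

The appeal of the ordinal formulation is that the classes "keep 1", "keep 2", "keep 3" are ordered, so a single score with increasing thresholds is enough, instead of an unordered classifier with one weight vector per class.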
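
The DPR objective with negative relations could look roughly as follows. This sketch assumes that questions and relations are already embedded as vectors, uses the dot product as similarity (as DPR does), and samples negatives uniformly from the non-gold relations; the helper names and the toy embeddings are hypothetical, and DPR itself additionally relies on in-batch and BM25-retrieved hard negatives.

```python
import math
import random
from typing import List, Sequence, Set

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sample_negative_relations(gold: Set[str], all_relations: Sequence[str], k: int,
                              rng: random.Random) -> List[str]:
    """Hypothetical strategy: draw k distinct relations that are not gold relations."""
    pool = [r for r in all_relations if r not in gold]
    return rng.sample(pool, k)


def dpr_nll(question: Vector, positive: Vector, negatives: Sequence[Vector]) -> float:
    """Negative log-likelihood of the positive relation, DPR-style:
    -log( exp(s+) / (exp(s+) + sum_j exp(s-_j)) ), where s = dot(question, relation).
    """
    scores = [dot(question, positive)] + [dot(question, n) for n in negatives]
    m = max(scores)  # log-sum-exp trick for numerical stability
    log_partition = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_partition - scores[0]


if __name__ == "__main__":
    # Hypothetical toy embeddings; a real model would learn question and relation encoders.
    relation_emb = {
        "highest point": [1.0, 0.0, 0.0],
        "population": [0.0, 1.0, 0.0],
        "capital": [0.0, 0.0, 1.0],
    }
    question_emb = [2.0, 0.0, 0.0]  # "What is the highest mountain in Germany?"
    negatives = sample_negative_relations({"highest point"}, list(relation_emb), 2, random.Random(0))
    loss = dpr_nll(question_emb, relation_emb["highest point"], [relation_emb[r] for r in negatives])
    print(negatives, round(loss, 4))
```

Minimizing this loss pushes the question embedding towards the gold relation and away from the sampled negative relations at the same time, which is exactly the signal that training on positive relations alone does not provide.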

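For complex questions, the key idea behind MDR is iterative retrieval: the query for the next hop is formed by encoding the question together with what was retrieved in the previous hop. The sketch below transfers this idea to relations with a small beam search. A bag-of-words cosine similarity stands in for MDR's trained dense encoder, path scores are simply the sum of per-hop similarities, and the relation descriptions and function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def encode(text: str) -> Counter:
    """Hypothetical stand-in for a trained dense encoder: a bag-of-words vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def multi_hop_retrieve(question: str, relations: Dict[str, str], hops: int = 2,
                       beam_size: int = 2) -> List[Tuple[List[str], float]]:
    """Beam search over relation paths; each hop re-encodes the question plus the path so far."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]
    for _ in range(hops):
        candidates = []
        for path, score in beams:
            # MDR-style query update: question concatenated with previously retrieved items.
            query = encode(" ".join([question] + [relations[r] for r in path]))
            for name, description in relations.items():
                if name not in path:
                    candidates.append((path + [name], score + similarity(query, encode(description))))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams


# Hypothetical relation descriptions (Wikidata-style labels with short glosses).
RELATIONS = {
    "capital": "capital city of a country",
    "population": "number of people living in a place",
    "highest point": "highest mountain or point of a place",
}

if __name__ == "__main__":
    for path, score in multi_hop_retrieve("How many people live in the capital of Germany?", RELATIONS):
        print(path, round(score, 3))
```

With these toy descriptions, the best path contains both "capital" and "population". The bag-of-words encoder happens to rank "population" first, so the retrieval order in this toy example should not be over-interpreted.
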
Supervisors