Mixed Resolution Schemes for Efficient and Effective Knowledge Graph Embeddings

This thesis investigates biased sampling regimes, based on entity and relation label frequencies, for making knowledge graph embedding training more efficient and effective. The regimes are compared against random sampling on baseline models using standard evaluation metrics.

Running Master Thesis

Description

A knowledge graph (KG) is a structured semantic knowledge base that describes real-world entities and the relationships between them. The basic unit of a KG is the (head entity, relation label, tail entity) triple. Knowledge graph embedding maps the entities and relation labels into a continuous, low-dimensional vector space while preserving the inherent relational structure of the KG. Such embeddings can benefit various downstream tasks, such as link prediction.

Training a knowledge graph embedding usually relies on random sampling, which might not be optimal. To enhance the efficiency and effectiveness of training, this thesis investigates several biased sampling regimes. In the first, entities and relation labels are stratified according to how frequently they occur in the KG, and the high-frequency group is sampled with higher probability. In the second, a segmented training process, sampling begins with the high-frequency group and then progresses to the low-frequency group. TransE, RotatE, and DistMult are used as the baseline models, with mean reciprocal rank (MRR) and Hits@N serving as the evaluation criteria. The results of the biased sampling regimes are compared with those of random sampling to analyze whether the new regimes improve the training efficiency and effectiveness of knowledge graph embedding methods.
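The two sampling regimes and the evaluation criteria described above can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis. The function names, the `threshold` and `p_high` parameters, and the rule that a triple counts as high-frequency when both of its entities occur at least `threshold` times are all assumptions made for illustration. Relation label frequencies are ignored here for brevity.

```python
import random
from collections import Counter


def entity_frequencies(triples):
    """Count how often each entity occurs as head or tail."""
    counts = Counter()
    for head, _relation, tail in triples:
        counts[head] += 1
        counts[tail] += 1
    return counts


def stratify_triples(triples, threshold):
    """Split triples into (high, low) frequency groups.

    Assumption: a triple is high-frequency when both of its entities
    occur at least `threshold` times. Other stratification rules are possible.
    """
    counts = entity_frequencies(triples)
    high, low = [], []
    for triple in triples:
        head, _relation, tail = triple
        if min(counts[head], counts[tail]) >= threshold:
            high.append(triple)
        else:
            low.append(triple)
    return high, low


def biased_batch(high, low, batch_size, p_high, rng):
    """Draw a batch; each slot comes from the high group with probability p_high.

    Falls back to the other group when one group is empty.
    """
    batch = []
    for _ in range(batch_size):
        use_high = high and (not low or rng.random() < p_high)
        batch.append(rng.choice(high if use_high else low))
    return batch


def segmented_schedule(high, low, epochs_high, epochs_low):
    """Yield (epoch, triples): the high-frequency group first, then the low one."""
    for epoch in range(epochs_high):
        yield epoch, high
    for epoch in range(epochs_high, epochs_high + epochs_low):
        yield epoch, low


def mrr_and_hits(ranks, n):
    """Mean reciprocal rank and Hits@n from 1-based ranks of the true entities."""
    mrr = sum(1.0 / rank for rank in ranks) / len(ranks)
    hits = sum(1 for rank in ranks if rank <= n) / len(ranks)
    return mrr, hits
```

With `p_high = 0.5` and equally sized groups, `biased_batch` behaves like random sampling over the whole KG, so the same code can produce the random-sampling baseline for comparison. A fixed `random.Random(seed)` instance keeps the runs reproducible.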

Supervisors
