Byte pair encodings for knowledge graph embeddings

This thesis applies byte pair encoding, the tokenization technique used by modern large language models, to knowledge graph embeddings. The resulting reduction in dimensionality is expected to significantly improve embedding quality.

Description

Modern large language models work with vocabularies of roughly 30,000 tokens. Current knowledge graph embedding methods, in contrast, learn over millions of tokens, typically one per entity or relation. We expect that adopting the tokenization used in large language models will improve knowledge graph embeddings, because it reduces the dimensionality of the problem.
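To give a rough feeling for what the reduced dimensionality means, the following sketch compares the size of the embedding table for the two vocabulary sizes. The embedding dimension of 200 and the figure of 5 million entity tokens are assumptions chosen only for illustration; the 30,000 is the vocabulary size mentioned above.

```python
def embedding_table_size(vocab_size: int, dim: int) -> int:
    """Number of learned parameters in a vocab_size x dim embedding table."""
    return vocab_size * dim


DIM = 200  # assumed embedding dimension, for illustration only

# Assumed: a knowledge graph with 5 million entities, one token per entity.
entity_level = embedding_table_size(5_000_000, DIM)

# Vocabulary size in the range used by large language models.
subword_level = embedding_table_size(30_000, DIM)

print(entity_level)   # 1000000000
print(subword_level)  # 6000000
```

Under these assumptions the subword vocabulary needs well over a hundred times fewer embedding parameters, and each remaining parameter is updated by far more training examples.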

In this thesis, you will build on a simple knowledge graph embedding method, RDF2vec (Paulheim et al. 2023), which samples sequences from knowledge graphs for learning embeddings. You will modify this method to use byte pair encodings and compare the original and the modified method on node clustering and link prediction.
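As a starting point, the idea can be sketched in a few lines: sample walks from a graph, learn byte pair merges over the node and edge labels, and re-encode each walk as a sequence of subword tokens. This is a minimal sketch and not the RDF2vec implementation. The toy triples, the function names, and the number of merges are all hypothetical, and the byte pair encoding is simplified (no word-boundary marker).

```python
import random
from collections import Counter

# Toy knowledge graph as (subject, predicate, object) triples; all names are made up.
TRIPLES = [
    ("Berlin", "capitalOf", "Germany"),
    ("Germany", "memberOf", "EuropeanUnion"),
    ("Paris", "capitalOf", "France"),
    ("France", "memberOf", "EuropeanUnion"),
]


def random_walks(triples, walks_per_node=2, depth=2, seed=0):
    """Sample node/edge label sequences by following outgoing edges."""
    rng = random.Random(seed)
    out_edges = {}
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))
    walks = []
    for start in sorted(out_edges):
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(depth):
                if node not in out_edges:
                    break  # dead end: node has no outgoing edges
                p, o = rng.choice(out_edges[node])
                walk += [p, o]
                node = o
            walks.append(walk)
    return walks


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` by the concatenated symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(word_counts, num_merges):
    """Greedily learn up to `num_merges` merges of the most frequent symbol pair."""
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # every label is already a single symbol
        best = max(pairs, key=lambda p: (pairs[p], p))  # deterministic tie-break
        merges.append(best)
        vocab = {tuple(apply_merge(list(s), best)): c for s, c in vocab.items()}
    return merges


def encode(word, merges):
    """Split a label into characters, then apply the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


walks = random_walks(TRIPLES)
word_counts = Counter(token for walk in walks for token in walk)
merges = learn_bpe(word_counts, num_merges=20)
subword_walks = [
    [piece for token in walk for piece in encode(token, merges)] for walk in walks
]
print(subword_walks[0])
```

The subword sequences could then be passed to a word2vec-style model in place of the original walks. How to combine the subword vectors back into one embedding per node is an open design question of the thesis.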

We expect that the use of byte pair encodings can tremendously improve the quality of knowledge graph embeddings.

References

H. Paulheim, P. Ristoski, J. Portisch. Embedding knowledge graphs with RDF2vec. Springer, 2023. https://link.springer.com/book/10.1007/978-3-031-30387-6

Contact

Bo Xiong

Researcher

Phone: +49 711 685 88110

Steffen Staab

Managing Director

Phone: +49 711 685 88100
