eSPARQL : design and implementation of a query language for epistemic queries on knowledge graphs

This thesis presents eSPARQL, a query language for managing and analyzing epistemic RDF-star metadata, extending SPARQL-star with a FROM BELIEF clause for handling multiple, conflicting beliefs. While slightly slower than manually written SPARQL-star queries, eSPARQL is simpler and more intuitive, enabling efficient belief-based querying.


Completed Master's Thesis

In recent years, large-scale knowledge graphs have emerged, integrating data from various sources. Often, this data includes assertions about other assertions, establishing contexts in which these assertions hold. A recent enhancement to RDF, known as RDF-star, allows for statements about statements and is currently under consideration as a W3C standard. However, RDF-star lacks a defined semantics for such statements and lacks intrinsic mechanisms to operate on them. This thesis describes and implements a novel query language, termed eSPARQL, tailored for epistemic RDF-star metadata and grounded in four-valued logic. Our language builds on SPARQL-star, the query language for RDF-star, by incorporating an expanded FROM clause, called FROM BELIEF, designed to manage multiple, and occasionally conflicting, beliefs. eSPARQL’s capabilities are demonstrated through four example queries, showcasing its ability to (i) retrieve individual beliefs, (ii) aggregate beliefs, (iii) identify conflicts between individuals, and (iv) handle nested beliefs (beliefs about beliefs). The implementation of eSPARQL developed in this thesis is built on top of an existing SPARQL-star query engine. In this implementation, the execution of an eSPARQL query consists of two phases.

First, the expression in the FROM BELIEF clause, called the belief query, is translated into a SPARQL-star CONSTRUCT query that generates an intermediary graph containing the beliefs of the subjects described in the belief query. Second, this intermediary graph is processed with the graph pattern of the eSPARQL query, which is translated into a graph pattern that a standard SPARQL-star engine can process. In this second phase, the implementation translates eSPARQL operations to SPARQL-star and checks whether the pattern contains nested eSPARQL queries, which are processed recursively. We study two research questions: (RQ1) Does the eSPARQL implementation scale? and (RQ2) How do the execution times of the eSPARQL implementation compare with those of manually written SPARQL-star queries? To answer these research questions, we use the four example eSPARQL queries that showcase the abilities of eSPARQL and create a synthetic dataset generator that generates graphs of multiple sizes. Additionally, for research question RQ2, we manually write SPARQL-star queries that are equivalent to the example eSPARQL queries. Regarding research question RQ1, our results show that the execution time of eSPARQL is proportional to the data size. Regarding research question RQ2, except for one query, the manually written SPARQL-star queries are clearly faster than our implementation.
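The two-phase execution described above can be illustrated with a small toy sketch. It is not the thesis implementation, which translates queries to SPARQL-star and runs them on an existing engine. Here triples are plain Python tuples, a quoted triple is a tuple nested in the object position, and the predicate name "believes", the function names, and all data are hypothetical choices made only for illustration.

```python
# Toy analogue of the two phases; all names and data are hypothetical.
# A triple is a 3-tuple; a quoted (RDF-star style) triple is a tuple
# nested in the object position of another triple.

def construct_belief_graph(graph, agent):
    """Phase 1 analogue: build the intermediary graph of what `agent` believes."""
    return {
        o for (s, p, o) in graph
        if s == agent and p == "believes" and isinstance(o, tuple)
    }

def match(graph, pattern):
    """Phase 2 analogue: match one triple pattern; None acts as a wildcard."""
    hits = [
        t for t in graph
        if all(q is None or q == v for q, v in zip(pattern, t))
    ]
    return sorted(hits, key=repr)  # stable order for display

def run(graph, agents, pattern):
    """Apply phase 1 once per agent (nested beliefs), then phase 2."""
    for agent in agents:
        graph = construct_belief_graph(graph, agent)
    return match(graph, pattern)

graph = {
    ("alice", "believes", ("sky", "color", "blue")),
    ("bob", "believes", ("sky", "color", "green")),
    ("alice", "believes", ("bob", "believes", ("sky", "color", "green"))),
}

# What does alice believe about the sky's color?
alice_view = run(graph, ["alice"], ("sky", "color", None))

# What does alice believe that bob believes (a belief about a belief)?
nested_view = run(graph, ["alice", "bob"], ("sky", "color", None))
```

In this sketch, nesting is handled by repeating the first phase on the intermediary graph, which loosely mirrors the recursive processing of nested queries mentioned above.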

Although the implementation proved slower than the manually written SPARQL-star queries, the eSPARQL queries are shorter and easier to understand. This advantage of eSPARQL can motivate further studies on how to optimize its implementation.

Supervisor
