Generating Random Knowledge Graphs from Rules

The thesis aims to address the challenge of generating synthetic data for knowledge graphs by leveraging the subset of first-order logic that includes only Horn clauses, acknowledging that real-world data often deviates from strict rule adherence due to incompleteness or contradictions. Instead of enforcing perfect compliance, it aims to produce synthetic data that follows specified levels of body support and confidence, enhancing realism for method validation.

Running Bachelor's Thesis

Knowledge graphs play a crucial role in representing knowledge within organizations and communities, in several application scenarios, and in integrating data at a large scale [1, 2]. Developing methods to learn models from knowledge graphs requires example knowledge graphs to validate the methods. Using real-world knowledge graphs on this validation provides a more realistic insight into the performance of these methods. However, using real-world knowledge graphs can be difficult. Lim et al. [3] state that real-world datasets often have restricted access because they contain private information. Moreover, the available real-world datasets may not include interesting properties for evaluating a method, or their complexity can complicate understanding the method’s performance. Therefore, there are use cases for which it is worth generating synthetic data.

Real knowledge graphs follow clauses like “people having the same parents are siblings” or “a professor is a person.” To this end, we consider the subset of first-order logic that includes only Horn clauses. The satisfiability problem, Horn-satisfiability, is decidable and tractable (P-complete) [4]. Also, Horn clauses are the base of logical languages such as Datalog [5] and have been used to describe the expressive power of knowledge graph embedding models in the knowledge graph embedding literature [6].

A Horn clause has form ∀𝑥∀𝑦(parent(𝑥, 𝑦) → child(𝑦, 𝑥)), which in the context of a graph means that if there is an edge from node Alice to node Bob labeled with the relation name “parent,” then there must be an inverse edge labeled with the relation name “child.” These clauses can be implicit in real knowledge graphs, and extracting them is a problem for which several approaches and tools have been implemented [7].

In real knowledge graphs, rules are not always satisfied because real data is usually incomplete and sometimes may even be contradictory. Rule miners annotate the extracted rules with statistics as the body support, the number of cases where the premise of the rule is satisfied, and the confidence, the proportion of these cases where the conclusion is also satisfied.

Thus, this thesis aims not to produce a knowledge graph that fulfills all given rules but, at a number of times, follows a given body of support and confidence.

References

  • [1] A. Hogan, E. Blomqvist, M. Cochez, C. d’Amato, G. D. Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo, R. Navigli, S. Neumaier, et al. “Knowledge graphs.” In: ACM Computing Surveys (Csur) 54.4 (2021), pp. 1–37.
  • [2] T. Thanapalasingam, E. van Krieken, P. Bloem, P. Groth. “IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation”. In: arXiv preprint arXiv:2307.06698 (2023).
  • [3] S.- H. Lim, S. M. Lee, S. Powers, M. Shankar, N. Imam. “Survey of approaches to generate realistic synthetic graphs.” In: Oak Ridge National Laboratory (2015).
  • [4] S. Cook, P. Nguyen. Logical Foundations of Proof Complexity. Cambridge, United Kingdom: Cambridge University Press, Apr. 2010. isbn: 9780511686146.
  • [5] S. Ceri, G. Gottlob, L. Tanca. “What you Always Wanted to Know About Datalog (And Never Dared to Ask)”. In: IEEE Trans. Knowl. Data Eng. 1.1 (1989), pp. 146–166. doi: 10.1109/69.43410.
  • [6] C. Gregucci, M. Nayyeri, D. Hernández, S. Staab. “Link Prediction with Attention Applied on Multiple Knowledge Graph Embedding Models”. In: Proceedings of the ACM Web Conference 2023, WWW 2023, Austin, TX, USA, 30 April 2023 - 4 May 2023. ACM, 2023, pp. 2600–2610. doi: 10 . 1145 / 3543507 . 3583358.
  • [7] L. A. Galárraga, C. Teflioudi, K. Hose, F. Suchanek. “AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases.” In: Proceedings of the 22nd International Conference on World Wide Web. WWW ’13. Rio de Janeiro, Brazil: Association for Computing Machinery, 2013, pp. 413–422. isbn: 9781450320351. doi: 10.1145/2488388.2488425.

Contact

To the top of the page