Embedding Knowledge Graphs with RDF2vec

The Book

The book Embedding Knowledge Graphs with RDF2vec, co-authored by Heiko Paulheim, Jan Portisch, and Petar Ristoski, is published by Springer Nature.

The book explains the ideas behind RDF2vec, one of the best-known methods for knowledge graph embedding, i.e., for transforming a graph into vector representations. The authors describe its usage in practice, from reusing pre-trained knowledge graph embeddings to training tailored vectors for a knowledge graph at hand. They also demonstrate different extensions of RDF2vec and show how these affect not only downstream performance but also the expressivity of the resulting vector representations. Finally, they analyze the resulting vector spaces and the semantic properties those spaces encode.
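At its core, RDF2vec turns a graph into sequences of tokens by running random walks that start at each entity. It then trains a word2vec model on those sequences as if they were sentences. The sketch below covers only the walk-extraction stage, in plain Python. The toy triples, the function names `build_adjacency` and `random_walks`, and their parameters (`depth`, `walks_per_entity`, `seed`) are hypothetical and chosen for illustration. The word2vec training step, which would normally be done with an existing library, is not shown.

```python
import random
from collections import defaultdict

# Hypothetical toy graph: (subject, predicate, object) triples.
TRIPLES = [
    ("dbr:Madonna", "dbo:genre", "dbr:Pop_music"),
    ("dbr:Madonna", "dbo:birthPlace", "dbr:Bay_City"),
    ("dbr:Bay_City", "dbo:country", "dbr:United_States"),
    ("dbr:Prince", "dbo:genre", "dbr:Pop_music"),
]


def build_adjacency(triples):
    """Map each subject to its list of outgoing (predicate, object) edges."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj


def random_walks(triples, entities, depth=2, walks_per_entity=3, seed=42):
    """Generate walks of the form [e0, p1, e1, p2, e2, ...].

    depth is the maximum number of hops (edges) per walk. A walk stops
    early when it reaches a node without outgoing edges.
    """
    rng = random.Random(seed)  # fixed seed for reproducible walks
    adj = build_adjacency(triples)
    walks = []
    for entity in entities:
        for _ in range(walks_per_entity):
            walk = [entity]
            node = entity
            for _ in range(depth):
                edges = adj.get(node)  # .get() avoids inserting empty keys
                if not edges:
                    break
                p, o = rng.choice(edges)
                walk.extend([p, o])
                node = o
            walks.append(walk)
    return walks


if __name__ == "__main__":
    for walk in random_walks(TRIPLES, ["dbr:Madonna", "dbr:Prince"]):
        print(" ".join(walk))
```

Each walk is a list of tokens that could be passed to a word2vec implementation as one sentence. The vector learned for a token such as `dbr:Madonna` would then serve as that entity's embedding. This sketch follows edges in the outgoing direction only, which is one simplifying choice among several possible walk strategies.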

Datasets

Different datasets are used for evaluation purposes in the book.

The running example uses a subgraph of DBpedia containing a set of artists together with their related entities. The dataset consists of four files:

The RDF graph
The same graph in PyKEEN format
The RDF graph with inferences materialized, as used in chapter 4.4
The ground truth labels used as classification target
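Materializing inferences means adding the triples a reasoner could derive to the graph as explicit statements. Which inference rules were applied to this dataset is not stated here. The sketch below therefore assumes just two RDFS entailment rules: transitivity of `rdfs:subClassOf` (rdfs11) and propagation of `rdf:type` along `rdfs:subClassOf` (rdfs9). The toy triples and the function name `materialize` are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize(triples):
    """Return the closure of `triples` under two RDFS rules (an assumption):

    rdfs11: (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       -> (x type B)

    Rules are applied repeatedly until no new triples appear (a fixpoint).
    """
    closed = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closed if p == SUBCLASS]
        new = set()
        # rdfs11: chain subclass edges
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # rdfs9: propagate types to superclasses
        for s, p, o in closed:
            if p == RDF_TYPE:
                for a, b in sub:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        if new <= closed:  # nothing new was derived
            return closed
        closed |= new


# Hypothetical toy input.
TOY = [
    ("dbr:Madonna", RDF_TYPE, "dbo:MusicalArtist"),
    ("dbo:MusicalArtist", SUBCLASS, "dbo:Artist"),
    ("dbo:Artist", SUBCLASS, "dbo:Person"),
]
```

On the toy input, `materialize(TOY)` adds three triples, including `("dbr:Madonna", "rdf:type", "dbo:Person")`. This naive loop re-derives everything in each round, which is acceptable for illustration. For a graph of realistic size, an existing RDFS/OWL reasoner would be the more practical choice.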

In addition, several existing evaluation benchmarks are used in different experiments, in particular:

The SWML benchmark for machine learning on RDF datasets,
The GEval benchmark for comparing knowledge graph embeddings on different ground truths, and
The DLCC benchmark for analyzing the expressivity of knowledge graph embeddings
