RDF2vec.org


The hitchhiker's guide to RDF2vec.

About RDF2vec

RDF2vec is a tool for creating vector representations of RDF graphs. In essence, RDF2vec creates a numeric vector for each node in an RDF graph.

RDF2vec was developed by Petar Ristoski as a key contribution of his PhD thesis Exploiting Semantic Web Knowledge Graphs in Data Mining [Ristoski, 2019], which he defended in January 2018 at the Data and Web Science Group at the University of Mannheim, supervised by Heiko Paulheim. In 2019, he was awarded the SWSA Distinguished Dissertation Award for this outstanding contribution to the field.

RDF2vec was inspired by the word2vec approach [Mikolov et al., 2013] for representing words in a numeric vector space. word2vec takes as input a set of sentences and trains a neural network using one of the following two variants: predicting a word given its context words (continuous bag of words, or CBOW), or predicting the context words given a word (skip-gram, or SG).

This approach can be applied to RDF graphs as well. In the original version presented at ISWC 2016 [Ristoski and Paulheim, 2016], random walks on the RDF graph are used to create sequences of RDF nodes, which are then used as input for the word2vec algorithm. It has been shown that such a representation can be utilized in many application scenarios, such as using knowledge graphs as background knowledge in data mining tasks, or for building content-based recommender systems [Ristoski et al., 2019].

Consider the following example graph:

From this graph, a set of random walks that could be extracted may look as follows:

Hamburg -> country -> Germany            -> leader     -> Angela_Merkel
Germany -> leader  -> Angela_Merkel      -> birthPlace -> Hamburg
Hamburg -> leader  -> Peter_Tschentscher -> residence  -> Hamburg

For those random walks, we consider each element (i.e., an entity or a predicate) as a word when running word2vec. As a result, we obtain vectors for all entities (and all predicates) in the graph.
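The walk extraction can be sketched in a few lines of Python. This is a minimal illustration using the example graph above, not one of the actual RDF2vec implementations:

```python
import random

# The example graph from above, as adjacency lists of (predicate, object) pairs.
GRAPH = {
    "Hamburg": [("country", "Germany"), ("leader", "Peter_Tschentscher")],
    "Germany": [("leader", "Angela_Merkel")],
    "Angela_Merkel": [("birthPlace", "Hamburg")],
    "Peter_Tschentscher": [("residence", "Hamburg")],
}

def random_walk(graph, start, depth, rng=random):
    """Extract one random walk of up to `depth` hops, alternating entities and predicates."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = graph.get(node)
        if not edges:  # dead end: stop the walk early
            break
        predicate, node = rng.choice(edges)
        walk += [predicate, node]
    return walk

# A corpus of walks, starting a few walks from every entity.
walks = [random_walk(GRAPH, entity, depth=2) for entity in GRAPH for _ in range(4)]
```

Each walk is then treated as a sentence whose words are entities and predicates; such a corpus can be passed to a word2vec implementation (e.g., gensim's Word2Vec) to obtain the embedding vectors.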

The resulting vectors have properties similar to those of word2vec embeddings. In particular, similar entities are closer in the vector space than dissimilar ones, which makes these representations ideal for learning patterns about the entities. In the example below, showing embeddings for DBpedia and Wikidata, countries and cities are grouped together, and European and Asian cities and countries form clusters:
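Closeness in the vector space is typically measured with cosine similarity. A toy example with made-up three-dimensional vectors (real RDF2vec embeddings usually have a few hundred dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors for illustration only, not taken from a trained model.
vectors = {
    "Berlin":  [0.9, 0.1, 0.2],
    "Hamburg": [0.8, 0.2, 0.1],
    "Germany": [0.2, 0.9, 0.1],
}

# A city ends up closer to another city than to a country:
assert cosine(vectors["Berlin"], vectors["Hamburg"]) > cosine(vectors["Berlin"], vectors["Germany"])
```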

The two figures above indicate that classes (in the example: countries and cities) can be separated well in the projected vector space, as indicated by the dashed lines. [Zouaq and Martel, 2020] have compared different knowledge graph embedding methods with respect to their suitability for separating classes in a knowledge graph. They have shown that RDF2vec outperforms other embedding methods like TransE, TransH, TransD, ComplEx, and DistMult, in particular on smaller classes. On the task of entity classification, RDF2vec shows results which are competitive with more recent graph convolutional neural networks [Schlichtkrull et al., 2018].

RDF2vec has been tailored to RDF graphs by respecting the type of edges (i.e., the predicates). Related variants, like node2vec [Grover and Leskovec, 2016] or DeepWalk [Perozzi et al., 2014], are defined for graphs with just one type of edges. They create sequences of nodes, while RDF2vec creates alternating sequences of entities and predicates.

This video by Petar Ristoski introduces the main ideas of RDF2vec:

Trans* etc. vs. RDF2vec, Similarity vs. Relatedness

A lot of approaches have been proposed for link prediction in knowledge graphs, from classic approaches like TransE [Bordes et al., 2013] and RESCAL [Nickel et al., 2011] to countless variants. The key difference is that those approaches are trained to optimize a loss function on link prediction, which yields a projection of similar entities close together in the vector space as a by-product. Conversely, the capability to predict links is a by-product in RDF2vec, in particular in variants like order-aware RDF2vec. A detailed comparison of the commonalities and differences of those families of approaches can be found in [Portisch et al., 2022].


Implementations

There are a few different implementations of RDF2vec out there:

  • The original implementation from the 2016 paper. Not well documented. Uses Java for walk generation, and Python/gensim for the embedding training.
  • jRDF2vec is a more versatile and better performing Java-based implementation. Like the original one, it uses Java to generate the walks, and Python/gensim for training the embedding. There is also a Docker image available here. jRDF2vec is the best performing end-to-end implementation for RDF2vec. It also implements many variants, such as RDF2vec Light, as well as p-walks and e-walks (see below).
  • pyRDF2vec [Vandewiele et al., 2022] is a pure Python-based implementation. It implements multiple strategies to generate the walks, not only random walks, and also has an implementation of RDF2vec light (see below).
  • ataweel55's implementation is another pure Python-based implementation. It includes all strategies for biasing the walks described in [Cochez et al., 2017a] and [Al Taweel and Paulheim, 2020].
  • There is a high-performance C++-based implementation for creating walks (also with different weighting mechanisms [Cochez et al., 2017a]), which can be considered the fastest implementation for walk extraction from RDF files.
  • While all of the above approaches use the word2vec implementation in gensim, there is also a PyTorch-based implementation, which implements the word2vec part itself instead of relying on gensim.

Models and Services

Training RDF2vec from scratch can take quite a bit of time. Here is a list of pre-trained models we know of:

There is also an alternative to downloading and processing an entire knowledge graph embedding (which may consume several GB):

  • KGvec2go provides a REST API for retrieving pre-computed embedding vectors for selected entities one by one, as well as further functions, such as computing the vector space similarity of two concepts, and retrieving the n closest concepts. There is also a service for RDF2vec Light (see below) [Portisch et al., 2020].

Extensions and Variants

There are quite a few variants of RDF2vec which have been examined in the past.

  • Walking RDF and OWL pursues exactly the same idea as RDF2vec, and the two can be considered identical. It uses random walks and skip-gram embeddings. The approach was developed at the same time as RDF2vec. [Alshahrani et al., 2017]
  • KG2vec pursues a similar idea as RDF2vec by first transforming the directed, labeled RDF graph into an undirected, unlabeled graph (using nodes for the relations) and then extracting walks from that transformed graph. [Wang et al., 2021] Although no direct comparison is available, we assume that the embeddings are comparable.
  • Wembedder is a simplified version of RDF2vec which uses the raw triples of a knowledge graph as input to the word2vec implementation, instead of random walks. It serves pre-computed vectors for Wikidata. [Nielsen, 2017]
  • KG2vec (not to be confused with the aforementioned approach also named KG2vec) follows the same idea of using triples as input to a Skip-Gram algorithm. [Soru et al., 2018]
  • Triple2Vec follows a similar idea of walk-based embedding generation, but embeds entire triples instead of nodes. [Fionda and Pirrò, 2020]

Initially, word2vec was created for natural language, which shows some variety with respect to word ordering: the same words may appear in different orders without a change in meaning. Walks extracted from graphs behave differently.

Consider, for example, the case of creating embedding vectors for bread in sentences such as Tom ate bread yesterday morning and Yesterday morning, Tom ate bread. For walks extracted from graphs, however, it makes a difference whether a predicate appears before or after the entity at hand. Consider the example above, where all three entities in the middle (Angela_Merkel, Peter_Tschentscher, and Germany) share the same context items (i.e., Hamburg and leader). However, for the semantics of an entity, it makes a difference whether that entity is or has a leader.
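This difference can be made concrete: with an unordered context window, as in classic word2vec, an entity that has a leader and an entity that is a leader look alike, while a position-aware context (the mechanism used by order-aware RDF2vec) keeps them apart. A small sketch:

```python
def bow_context(walk, i, window=1):
    """Unordered (classic word2vec) context: just a set of nearby tokens."""
    lo, hi = max(0, i - window), min(len(walk), i + window + 1)
    return {walk[j] for j in range(lo, hi) if j != i}

def positional_context(walk, i, window=1):
    """Order-aware context: (offset, token) pairs, as in structured word2vec."""
    lo, hi = max(0, i - window), min(len(walk), i + window + 1)
    return {(j - i, walk[j]) for j in range(lo, hi) if j != i}

walk_a = ["Hamburg", "leader", "Peter_Tschentscher"]  # Hamburg *has* a leader
walk_b = ["Germany", "leader", "Angela_Merkel"]       # Angela_Merkel *is* a leader

# Classic word2vec sees identical contexts for Hamburg and Angela_Merkel ...
assert bow_context(walk_a, 0) == bow_context(walk_b, 2) == {"leader"}
# ... while the order-aware variant can tell "has leader" from "is leader":
assert positional_context(walk_a, 0) == {(1, "leader")}
assert positional_context(walk_b, 2) == {(-1, "leader")}
```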

RDF2vec always generates embedding vectors for an entire knowledge graph. In many practical cases, however, we only need vectors for a small set of target entities. In such cases, generating vectors for an entire large graph like DBpedia would not be a practical solution.

  • RDF2vec Light is an alternative which can be used in such scenarios. It only creates random walks on a subset of the knowledge graph and can produce embedding vectors for a target subset of entities fast. In many cases, the results are competitive with those achieved with embeddings of the full graph. [Portisch et al., 2020] Details about the implementation are found here.
  • LODVec uses the same mechanism as RDF2vec Light, but creates sequences across different datasets by exploiting owl:sameAs links, and unifying classes and predicates by exploiting owl:equivalentClass and owl:equivalentProperty definitions. [Mountantonakis and Tzitzikas, 2021]

RDF2vec can explicitly trade off similarity and relatedness.

One of the key findings of the comparison of RDF2vec to embedding approaches for link prediction, such as TransE, is that while embedding approaches for link prediction create an embedding space in which the distance metric encodes similarity of entities, the distance metric in the RDF2vec embedding space mixes similarity and relatedness [Portisch et al., 2022]. This behavior can be influenced by changing the walk strategy, thereby creating embedding spaces which explicitly emphasize similarity or relatedness. The corresponding walk strategies are called p-walks and e-walks. In the above example, a set of p-walks would be:

birthPlace -> country -> Germany            -> leader     -> birthPlace
country    -> leader  -> Angela_Merkel      -> birthPlace -> leader
residence  -> leader  -> Peter_Tschentscher -> residence  -> leader

Likewise, the set of e-walks would be:

Peter_Tschentscher -> Hamburg -> Germany            -> Angela_Merkel -> Hamburg
Hamburg            -> Germany -> Angela_Merkel      -> Hamburg       -> Peter_Tschentscher
Angela_Merkel      -> Hamburg -> Peter_Tschentscher -> Hamburg       -> Germany

It has been shown that embedding vectors computed based on e-walks create a vector space encoding relatedness, while embedding vectors computed based on p-walks create a vector space encoding similarity. [Portisch and Paulheim, 2022]
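For walks in the usual alternating shape with the focal entity in the middle, both variants can be derived by simple position-based filtering. A minimal sketch (shown for a short walk; the example walks above stem from longer base walks, and actual implementations generate p-walks and e-walks directly during extraction):

```python
def split_walk(walk):
    """Split an alternating entity/predicate walk into its e-walk and p-walk.

    Assumes the shape e0 -p0-> e1 -p1-> ... with the focal entity in the
    middle: the e-walk keeps only entities, the p-walk keeps the predicates
    plus the focal entity.
    """
    entities = walk[0::2]    # even positions: entities
    predicates = walk[1::2]  # odd positions: predicates
    focus = entities[len(entities) // 2]
    mid = len(predicates) // 2
    p_walk = predicates[:mid] + [focus] + predicates[mid:]
    return entities, p_walk

walk = ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"]
e_walk, p_walk = split_walk(walk)
assert e_walk == ["Hamburg", "Germany", "Angela_Merkel"]
assert p_walk == ["country", "Germany", "leader"]
```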

Besides e-walks and p-walks, the creation of walks is the aspect of RDF2vec which has undergone the most extensive research so far. While the original implementation uses random walks, alternatives that have been explored include:

  • The use of different heuristics for biasing the walks, e.g., preferring edges with more/less frequent predicates, preferring links to nodes with higher/lower PageRank, etc. An extensive study is available in [Cochez et al., 2017a].
  • Zhang et al. also propose a different weighting scheme based on Metropolis-Hastings random walks, which reduces the probability of transitioning to a node with high degree and aims at a more balanced distribution of nodes in the walks. [Zhang et al., 2022]
  • A similar approach is analyzed in [Al Taweel and Paulheim, 2020], where embeddings for DBpedia are trained with external edge weights derived from page transition probabilities in Wikipedia.
  • In [Vandewiele et al., 2020], we have analyzed different alternatives to using random walks, such as walk strategies with teleportation within communities. While random walks are usually a good choice, there are scenarios in which other walking strategies are superior.
  • In [Saeed and Prasanna, 2018], the identification of specific properties for groups of entities is discussed as a means to find task-specific edge weights.
  • Similarly, NESP computes semantic similarities between relations in order to create semantically coherent walks. Moreover, the approach foresees refining an existing embedding space by bringing more closely related entities closer together. [Chekol and Pirrò, 2020]
  • Mukherjee et al. [Mukherjee et al., 2019] also observe that biasing the walks with prior knowledge on relevant properties and classes for a domain can improve the results obtained with RDF2vec.
  • The ontowalk2vec approach [Gkotse, 2020] combines the random walk strategies of RDF2vec and node2vec, and trains a language model on the union of both walk sets.
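Most of the biasing strategies above boil down to replacing the uniform choice of the next edge by a weighted one. A minimal sketch, with a hypothetical weighting function that down-weights edges leading to high-degree nodes, in the spirit of the Metropolis-Hastings variant (all names and numbers below are made up):

```python
import random

def biased_step(edges, weight, rng=random):
    """Pick the next (predicate, node) edge with probability proportional
    to weight(predicate, node) instead of uniformly at random."""
    weights = [weight(p, n) for p, n in edges]
    return rng.choices(edges, weights=weights, k=1)[0]

# Hypothetical node degrees and outgoing edges, for illustration only.
degree = {"Germany": 50, "Village_X": 2}
edges = [("country", "Germany"), ("locatedIn", "Village_X")]

# Down-weight hub nodes: the edge to Village_X becomes 25x more likely.
step = biased_step(edges, lambda p, n: 1.0 / degree[n])
```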

Besides changing the walk creation itself, there are also approaches for incorporating additional information in the walks:

  • [Bachhofner et al., 2021] discuss the inclusion of metadata, such as provenance information, in the walks in order to improve the resulting embeddings.

RDF2vec relies on the word2vec embedding mechanism. However, other word embedding approaches have also been discussed. For example, KGlove adapts the GloVe algorithm [Pennington et al., 2014] for creating the embedding vectors [Cochez et al., 2017b].

While the original RDF2vec approach is agnostic to the type of knowledge encoded in RDF, it is also possible to extend the approach to specific types of datasets.

To materialize or not to materialize? While it might look like a good idea to enrich the knowledge graph with implicit knowledge before training the embeddings, experimental results show that materializing implicit knowledge actually makes the resulting embeddings worse, not better.

  • In [Iana and Paulheim, 2020], we have conducted a series of experiments training embeddings on DBpedia as is, vs. training embeddings on DBpedia with implicit knowledge materialized. In most settings, the results on downstream tasks get worse when adding implicit knowledge. Our hypothesis is that missing information in many knowledge graphs is not missing at random, but a signal of lesser importance, and that signal is canceled out by materialization. A similar observation was made by [Alshahrani et al., 2017].

Other Resources

Other useful resources for working with RDF2vec:


Applications

RDF2vec has been used in a variety of applications. In the following, we list a number of them, organized by field of application.

Knowledge Graph Refinement

Knowledge Graph Refinement subsumes the usage of embeddings for adding information to a knowledge graph (e.g., link/relation or type prediction), for extending its schema/ontology, or for identifying (and potentially correcting) erroneous facts in the graph [Paulheim, 2017]. In most of these applications, RDF2vec embedding vectors are used as representations for training a machine learning classifier for the task at hand, e.g., a predictive model for entity types. Applications in this area include:
  • TIEmb is an approach for learning subsumption relations using RDF2vec embeddings. [Ristoski et al., 2017]
  • Kejriwal and Szekely discuss the use of RDF2vec embeddings for entity type prediction in knowledge graphs. [Kejriwal and Szekely, 2017] Another approach in that direction is proposed by Sofronova et al., who contrast supervised and unsupervised methods for exploiting RDF2vec embeddings for type prediction. [Sofronova et al., 2020] Furthermore, the usage of RDF2vec for type prediction in knowledge graphs is discussed in [Weller, 2021] and [Jain et al., 2021]. [Cutrona et al., 2021] report that using RDF2vec embeddings for type prediction yields results comparable to using BERT embeddings trained on the entities' textual abstracts.
  • Daga and Groth also use RDF2vec to classify nodes in a knowledge graph extracted from Python notebooks on Kaggle. They show that the classification using RDF2vec significantly outperforms the usage of the pre-trained CodeBERTa model. [Daga and Groth, 2022]
  • GraphEmbeddings4DDI utilizes RDF2vec for predicting drug-drug interactions [Çelebi et al., 2018]. A similar system is introduced by Karim et al., using a complex LSTM on top of the entity embeddings generated with RDF2vec [Karim et al., 2019]. Since drug-drug interactions are modeled as a relation in the knowledge graphs used for the experiments, this task is essentially a relation prediction task. [Zhang et al., 2022] also target the prediction of drug-drug and drug-target interactions, using a combination of a CNN and a BiLSTM as a downstream prediction model.
  • Ammar and Celebi showcase the use of RDF2vec embeddings for the fact validation task at the 2019 edition of the Semantic Web Challenge. [Ammar and Celebi, 2019]. A similar approach is pursued by Pister and Atemezing [Pister and Atemezing, 2019].
  • Chen et al. show that RDF2vec embeddings can be used for relation prediction and yields results competitive with TransE and DistMult [Chen et al., 2020].
  • Yao and Barbosa combine RDF2vec and outlier detection for detecting wrong type assertions in knowledge graphs [Yao and Barbosa, 2021].
  • Egami et al. utilize RDF2vec for clustering activities in a knowledge graph of daily living activities, and discuss the use of those clusters for refining the underlying ontology. [Egami et al., 2021]
  • [Heilig et al., 2022] use RDF2vec embedding on a biomedical knowledge graph for refining rules for medical diagnosis. This gives an interesting example for combining embeddings with explainable artificial intelligence: the embeddings are not used directly for prediction, but rather to refine interpretable rules, which are reviewed by medical experts.
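The pattern shared by most of these works, i.e., embedding vectors used as features for a downstream model, can be illustrated with a tiny nearest-centroid type predictor. All vectors and names below are made up; in practice, the vectors come from a trained RDF2vec model, and the downstream model is usually a random forest, an SVM, or similar:

```python
# Toy 2-dimensional embedding vectors and known types for some entities.
vectors = {
    "Berlin":  [0.9, 0.1], "Hamburg": [0.8, 0.2],
    "Germany": [0.1, 0.9], "France":  [0.2, 0.8],
    "Munich":  [0.85, 0.15],  # type unknown, to be predicted
}
labeled = {"Berlin": "City", "Hamburg": "City", "Germany": "Country", "France": "Country"}

def predict_type(entity):
    """Nearest-centroid classification in the embedding space: average the
    vectors per type, then assign the type of the closest centroid."""
    by_type = {}
    for e, t in labeled.items():
        by_type.setdefault(t, []).append(vectors[e])
    def centroid(vs):
        return [sum(component) / len(vs) for component in zip(*vs)]
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    centroids = {t: centroid(vs) for t, vs in by_type.items()}
    return min(centroids, key=lambda t: dist2(vectors[entity], centroids[t]))

assert predict_type("Munich") == "City"
```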

Knowledge Matching and Integration

In knowledge matching and integration, entity embedding vectors are mostly utilized to determine whether two entities in two datasets are similar enough to be merged into one. Different approaches using RDF2vec for matching and integration have been proposed, both on the schema and on the instance level:
  • MERGILO is a tool for merging structured knowledge extracted from text. A refinement of MERGILO using RDF2vec embeddings on FrameNet is discussed in [Alam et al., 2017].
  • EARL is a named entity linking tool which uses pre-trained RDF2vec embeddings. [Dubey et al., 2018]
  • ALOD2vec Matcher is an ontology matching system which uses pre-trained embeddings on the WebIsALOD knowledge graph to determine the similarity of two concepts. [Portisch and Paulheim, 2018]. The approach has later been extended to DBpedia, WordNet, Wikidata, Wiktionary, and BabelNet in [Portisch et al., 2021]. A similar approach is pursued by the DESKMatcher system, which uses domain specific embeddings from the business domain, e.g., the FIBO ontology [Monych et al., 2020].
  • AnyGraphMatcher is another ontology matching system which leverages RDF2vec embeddings trained on the two input ontologies to match [Lütke, 2019].
  • Azmy et al. use RDF2vec for entity matching across knowledge graphs, and show a large-scale study for matching DBpedia and Wikidata [Azmy et al., 2019]. A similar approach is introduced by Aghaei and Fensel, who combine RDF2vec embeddings with clustering and BERT sentence embeddings to identify related entities in two knowledge graphs [Aghaei and Fensel, 2021].
  • DELV is an entity matching approach for matching multiple knowledge graphs built on top of RDF2vec. It first embeds a central knowledge graph using RDF2vec, and then performs an RDF2vec embedding of satellite knowledge graphs with a slightly modified word2vec loss function, taking the minimization of the distance of already matched anchors into account. [Ruppen, 2018]
  • In a showcase for the MELT ontology matching framework, Hertling et al. show that by learning a non-linear mapping between RDF2vec embeddings of different ontologies, ontology matching can be performed at least for structurally similar ontologies [Hertling et al., 2020]. [Portisch et al., 2022] show that this can also be achieved by a rotation of the embedding spaces.
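A standard way to compute such a rotation between two embedding spaces is orthogonal Procrustes analysis; this sketch on synthetic data illustrates the idea and is not claimed to be the exact procedure of the cited work. It assumes, for simplicity, that the second space is an exact orthogonal transformation of the first:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy "embedding spaces" for the same five anchor entities: space B is
# space A under an unknown orthogonal transformation (a simplifying
# assumption made for this sketch).
A = rng.normal(size=(5, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
B = A @ R_true

# Orthogonal Procrustes: the orthogonal R minimizing ||A R - B|| is U V^T,
# where U S V^T is the singular value decomposition of A^T B.
U, _, Vt = np.linalg.svd(A.T @ B)
R = U @ Vt

assert np.allclose(A @ R, B)  # the recovered transformation aligns the spaces
```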

Applications in NLP

In natural language processing, knowledge graph embeddings are particularly handy in setups that already exploit knowledge graphs, for example, for linking entities in text to a knowledge graph using named entity linking and named entity disambiguation. Applications of RDF2vec in the NLP field include:
  • TREC CAR is a benchmark for complex answer retrieval. The authors use pre-trained RDF2vec embeddings as one means to represent queries and answers, and for matching them onto each other. [Nanni et al., 2017a]
  • Inan and Dikenelli demonstrate the usage of RDF2vec embeddings in named entity disambiguation in the entity disambiguation frameworks DoSeR and AGDISTIS. [Inan and Dikenelli, 2017]
  • In a later work, Inan and Dikenelli propose the use of RDF2vec embeddings together with a BiLSTM and a CRF layer for entity disambiguation. [Inan and Dikenelli, 2018]
  • Wang et al. have used RDF2vec embeddings for analyzing entity co-occurrence in tweets [Wang et al., 2017].
  • Nanni et al. showcase the use of RDF2vec embeddings for entity aspect linking in [Nanni et al., 2018].
  • Nizzoli et al. use RDF2vec, among other features, to perform named entity linking of geographic entities, in particular for scoring candidates. [Nizzoli et al., 2020]
  • KGA-CGM is a system for describing images with captions. It uses RDF2vec embeddings for handling out-of-training entities [Mogadala et al., 2018].
  • Türker discusses the use of RDF2vec for text categorization by embedding both texts and categories [Türker, 2019].
  • Vakulenko demonstrates the use of RDF2vec in dialogue systems [Vakulenko, 2019].
  • G-Rex is a tool for relation extraction from text which leverages RDF2vec entity embeddings [Ristoski et al., 2020].
  • El Vaigh et al. show that using cosine similarity in the RDF2vec space creates a strong baseline for collective entity linking [El Vaigh et al., 2020]. This is particularly remarkable since that metric measures similarity, not relatedness, which is actually needed for the task at hand.
  • Yamada et al. also use RDF2vec for measuring entity relatedness, and contrast the results of RDF2vec trained on DBpedia to their model Wikipedia2vec. The results are close, with Wikipedia2vec yielding slightly better results, but also based on a model which is significantly larger than RDF2vec. [Yamada et al., 2020]
  • FinMatcher is a tool for named entity classification in the financial domain, developed for the FinSim-2 shared task. It uses pre-trained RDF2vec embeddings on WebIsALOD [Portisch et al., 2021]
  • [Engleitner et al., 2021] use RDF2vec embeddings to provide semantic tags for news articles.

Information Retrieval

In information retrieval, similarity and relatedness of entities can be utilized to retrieve and/or rank results for queries for a given entity. Examples for the use of RDF2vec in the information retrieval field include:

Predictive Modeling

Predictive modeling was the original use case for which RDF2vec was developed. Here, external variables (which might be continuous or categorical) are predicted for a set of entities. When these entities are linked to a knowledge graph, entity embeddings have been shown to be suitable representations for downstream predictive modeling tools. Examples in this field include:
  • Hascoet et al. show how to use RDF2vec for image classification, especially for classes of images for which no training data is available, i.e., zero-shot learning. [Hascoet et al., 2017]
  • evoKGsim* combines similarity metrics and genetic programming for predicting out-of-KG relations. The framework implements RDF2vec as one source of similarity metrics. [Sousa et al., 2021]
  • Biswas et al. discuss the use of RDF2vec as a signal for predicting infobox types in Wikipedia articles [Biswas et al., 2018].
  • Egami et al. show the use case of geospatial data analytics in urban spaces by constructing a geospatial knowledge graph and computing RDF2vec embeddings thereon [Egami et al., 2018].
  • Hees discusses the use of pre-trained RDF2vec models for predicting human associations of terms [Hees, 2018].
  • The utilization of RDF2vec for content-based recommender systems is discussed in [Saeed and Prasanna, 2018], [Ristoski et al., 2019], and [Voit and Paulheim, 2021]. [Palumbo et al., 2019] report that RDF2vec performs better in terms of recommending novel items than other competitors.
  • Jurgovsky demonstrates the use of RDF2vec for data augmentation on the task of credit card fraud detection [Jurgovsky, 2019].
  • Hoppe et al. demonstrate the use of RDF2vec embeddings on DBpedia for improving the classification of scientific articles [Hoppe et al., 2021].
  • [Nunes et al., 2021] show how graph embeddings on biomedical ontologies can be utilized for predicting drug-gene-interactions. They train classifiers such as random forests over the concatenated embedding vectors of the drugs and genes.
  • [Sousa et al., 2021] use embeddings on the Gene ontology for various predictive modeling tasks in the biomedical domain, including the prediction of proteins and the interaction of diseases and genes.
  • Wang et al. use embeddings, including RDF2vec, to assess the similarity of proteins in the Gene Ontology [Wang et al., 2022].
  • Ramezani et al. represent essays by knowledge graphs, and use embeddings of the concepts in those graphs to predict the author's personality in the big 5 model based on their written essay. [Ramezani et al., 2022]
  • [Carvalho et al., 2022] use RDF2vec embeddings on an ontology-enriched variant of the MIMIC III dataset, a database of hospital patient data, to predict patient readmission to intensive care units.

Other Applications

No matter how sophisticated your categorization schema is, you always end up with a category called "other" or "misc.". Here are examples of applications of RDF2vec in that category:
  • REMES is an entity summarization approach which uses RDF2vec to select a suitable subset of statements for describing an entity. [Gunaratna et al., 2017] Another work proposing the usage of RDF2vec for entity summarization is discussed in [Li et al., 2020].
  • Similar to that, Shi et al. propose an approach for extracting semantically coherent subgraphs from a knowledge graph, which uses RDF2vec as a measure for semantic distance to guarantee semantic coherence. [Shi et al., 2021]
  • Jurisch and Igler demonstrate the utilization of RDF2vec embeddings for detecting changes in ontologies [Jurisch and Igler, 2018].


These are the core publications of RDF2vec:

  1. Petar Ristoski, Heiko Paulheim: RDF2Vec: RDF Graph Embeddings for Data Mining. International Semantic Web Conference, 2016
  2. Petar Ristoski, Jessica Rosati, Tommaso Di Noia, Renato De Leone, Heiko Paulheim: RDF2Vec: RDF Graph Embeddings and Their Applications. Semantic Web Journal 10(4), 2019

Further references used above:

  1. Sareh Aghaei, Anna Fensel: Finding Similar Entities Across Knowledge Graphs. International Conference on Advances in Computer Science and Information Technology, 2021.
  2. Mehwish Alam, Diego Reforgiato Recupero, Misael Mongiovi, Aldo Gangemi, Petar Ristoski: Event-based knowledge reconciliation using frame embeddings and frame similarity. Knowledge-based Systems (135), 2017
  3. Mona Alshahrani, Mohammad Asif Khan, Omar Maddouri, Akira R Kinjo, Núria Queralt-Rosinach, Robert Hoehndorf: Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics 33(17), 2017.
  4. Faisal Alshargi, Saeedeh Shekarpour, Tommaso Soru, Amit Sheth: Concept2vec: Metrics for Evaluating Quality of Embeddings for Ontological Concepts. Spring Symposium on Combining Machine Learning with Knowledge Engineering, 2019
  5. Ahmad Al Taweel, Heiko Paulheim: Towards Exploiting Implicit Human Feedback for Improving RDF2vec Embeddings. Deep Learning for Knowledge Graphs Workshop, 2020
  6. Ammar Ammar, Remzi Celebi: Fact Validation with Knowledge Graph Embeddings. International Semantic Web Conference, 2019
  7. Michael Azmy, Peng Shi, Jimmy Lin, Ihab F. Ilyas: Matching Entities Across Different Knowledge Graphs with Graph Embeddings. arxiv.org, 2019
  8. Stefan Bachhofner, Peb Ruswono Aryan, Bernhard Krabina, Robert David: Embedding Metadata-Enriched Graphs. International Semantic Web Conference, Posters and Demos, 2021.
  9. Russa Biswas, Rima Türker, Farshad Bakhshandegan-Moghaddam, Maria Koutraki, Harald Sack: Wikipedia Infobox Type Prediction Using Embeddings. Workshop on Deep Learning for Knowledge Graphs and Semantic Technologies, 2018
  10. Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, Oksana Yakhnenko: Translating embeddings for modeling multi-relational data. Advances in neural information processing systems 26 (2013).
  11. Ricardo Miguel Serafim Carvalho, Catia Pesquita, Daniela Oliveira: Knowledge Graph Embeddings for ICU readmission prediction. Research Square, 2022.
  12. Remzi Çelebi, Erkan Yaşar, Hüseyin Uyar, Özgür Gümüş, Oguz Dikenelli, Michel Dumontier: Evaluation of knowledge graph embedding approaches for drug-drug interaction prediction using Linked Open Data. International Conference Semantic Web Applications and Tools for Life Sciences, 2018
  13. Melisachew Wudage Chekol, Giuseppe Pirrò: Refining Node Embeddings via Semantic Proximity. In: International Semantic Web Conference, 2020.
  14. Jiaoyan Chen, Xi Chen, Ian Horrocks, Erik B. Myklebust, Ernesto Jiménez-Ruiz: Correcting Knowledge Base Assertions. The Web Conference, 2020
  15. Jiaoyan Chen, Pan Hu, Ernesto Jimenez-Ruiz, Ole Magnus Holter, Denvar Antonyrajah, Ian Horrocks: OWL2Vec*: Embedding of OWL Ontologies. arxiv.org, 2020
  16. Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, Heiko Paulheim: Biased Graph Walks for RDF Graph Embeddings. International Conference on Web Intelligence, Mining, and Semantics, 2017
  17. Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, Heiko Paulheim: Global RDF Vector Space Embeddings. International Semantic Web Conference, 2017
  18. Vincenzo Cutrona, Gianluca Puleri, Federico Bianchi, Matteo Palmonari: NEST: Neural Soft Type Constraints to Improve Entity Linking in Tables. Semantics, 2021.
  19. Enrico Daga, Paul Groth: Data journeys: knowledge representation and extraction. Under review at Semantic Web Journal.
  20. Mohnish Dubey, Debayan Banerjee, Debanjan Chaudhuri, Jens Lehmann: EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs. International Semantic Web Conference, 2018
  21. Shusaku Egami, Takahiro Kawamura, Akihiko Ohsuga: Predicting Urban Problems: A Comparison of Graph-based and Image-based Methods. Joint International Semantic Technology Conference, 2018
  22. Shusaku Egami, Satoshi Nishimura, Ken Fukuda: A Framework for Constructing and Augmenting Knowledge Graphs using Virtual Space: Towards Analysis of Daily Activities. In: International Conference on Tools for Artificial Intelligence, 2021
  23. Cheikh-Brahim El Vaigh, François Goasdoué, Guillaume Gravier, Pascale Sébillot: A Novel Path-based Entity Relatedness Measure for Efficient Collective Entity Linking. In: International Semantic Web Conference, 2020
  24. Nora Engleitner, Werner Kreiner, Nicole Schwarz, Theodorich Kopetzky, Lisa Ehrlinger: Knowledge Graph Embeddings for News Article Tag Recommendation. Semantics, 2021.
  25. Michael Färber: The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data. International Semantic Web Conference, 2019
  26. Michael Färber, David Lamprecht: The Data Set Knowledge Graph: Creating a Linked Open Data Source for Data Sets. Quantitative Science Studies, 2021.
  27. Valeria Fionda, Giuseppe Pirrò: Triple2Vec: Learning Triple Embeddings from Knowledge Graphs. AAAI Conference on Artificial Intelligence, 2020.
  28. Blerina Gkotse: Ontology-based Generation of Personalised Data Management Systems: an Application to Experimental Particle Physics. PhD Thesis at MINES ParisTech, 2020.
  29. Aditya Grover, Jure Leskovec: node2vec: Scalable Feature Learning for Networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.
  30. Kalpa Gunaratna, Amir Hossein Yazdavar, Krishnaprasad Thirunarayan, Amit Sheth, Gong Cheng: Relatedness-based Multi-Entity Summarization. International Joint Conference on Artificial Intelligence, 2017
  31. Tristan Hascoet, Yasuo Ariki, Tetsuya Takiguchi: Semantic Web and Zero-Shot Learning of Large Scale Visual Classes. International Workshop on Symbolic-Neural Learning, 2017
  32. Jörn Hees: Simulating Human Associations with Linked Data. University of Kaiserslautern, 2018
  33. Niclas Heilig, Jan Kirchhoff, Florian Stumpe, Joan Plepi, Lucie Flek, Heiko Paulheim: Refining Diagnosis Paths for Medical Diagnosis based on an Augmented Knowledge Graph. Workshop on Semantic Web solutions for large-scale biomedical data analytics, 2022.
  34. Sven Hertling, Jan Portisch, Heiko Paulheim: Supervised Ontology and Instance Matching with MELT. Ontology Matching, 2020.
  35. Ole Magnus Holter, Erik B. Myklebust, Jiaoyan Chen, Ernesto Jimenez-Ruiz: Embedding OWL Ontologies with OWL2Vec. International Semantic Web Conference, 2019
  36. Fabian Hoppe, Danilo Dessì, Harald Sack: Deep Learning meets Knowledge Graphs for Scholarly Data Classification. Companion Proceedings of the Web Conference, 2021.
  37. Andreea Iana, Heiko Paulheim: More is not Always Better: The Negative Impact of A-box Materialization on RDF2vec Knowledge Graph Embeddings. Combining Symbolic and Sub-symbolic methods and their Applications (CSSA), 2020
  38. Emrah Inan, Oguz Dikenelli: Effect of Enriched Ontology Structures on RDF Embedding-Based Entity Linking. Metadata and Semantic Research, 2017
  39. Emrah Inan, Oguz Dikenelli: A Sequence Learning Method for Domain-Specific Entity Linking. Named Entities Workshop, 2018.
  40. Nitisha Jain, Jan-Christoph Kalo, Wolf-Tilo Balke, Ralf Krestel: Do Embeddings Actually Capture Knowledge Graph Semantics?. Extended Semantic Web Conference, 2021
  41. Johannes Jurgovsky: Context-Aware Credit Card Fraud Detection. University of Passau, 2019
  42. Matthias Jurisch, Bodo Igler: RDF2Vec-based Classification of Ontology Alignment Changes. Workshop on Deep Learning for Knowledge Graphs and Semantic Technologies, 2018
  43. Md Rezaul Karim, Michael Cochez, Joao Bosco Jares, Mamtaz Uddin, Oya Beyan, Stefan Decker: Drug-drug interaction prediction based on knowledge graph embeddings and convolutional-LSTM network. ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2019
  44. Mayank Kejriwal, Pedro Szekely: Supervised Typing of Big Graphs using Semantic Embeddings. International Workshop on Semantic Big Data, 2017
  45. Sang-Min Kim, So-yeon Jin, Woo-sin Lee: A study on the Extraction of Similar Information using Knowledge Base Embedding for Battlefield Awareness. Journal of The Korea Society of Computer and Information, 2021
  46. Junyou Li, Gong Cheng, Qingxia Liu, Wen Zhang, Evgeny Kharlamov, Kalpa Gunaratna, Huajun Chen: Neural Entity Summarization with Joint Encoding and Weak Supervision. International Joint Conference on Artificial Intelligence, 2020.
  47. Wang Ling, Chris Dyer, Alan W. Black, Isabel Trancoso: Two/Too Simple Adaptations of Word2Vec for Syntax Problems. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015.
  48. Julie Loesch, Louis Meeckers, Ilse van Lier, Alie de Boer, Michel Dumontier, Remzi Celebi: Automated Identification of Food Substitutions Using Knowledge Graph Embeddings. International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, 2022.
  49. Alexander Lütke: AnyGraphMatcher Submission to the OAEI Knowledge Graph Challenge 2019. International Workshop on Ontology Matching, 2019
  50. Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space. International Conference on Learning Representations, 2013
  51. Sudip Mittal, Anupam Joshi, Tim Finin: Cyber-All-Intel: An AI for Security related Threat Intelligence. arxiv.org, 2019
  52. Aditya Mogadala, Umanga Bista, Lexing Xie, Achim Rettinger: Knowledge Guided Attention and Inference for Describing Images Containing Unseen Objects. Extended Semantic Web Conference, 2018
  53. Michael Monych, Jan Portisch, Michael Hladik, Heiko Paulheim: DESKMatcher. Ontology Matching, 2020
  54. Michalis Mountantonakis, Yannis Tzitzikas: Applying Cross-Dataset Identity Reasoning for Producing URI Embeddings over Hundreds of RDF Datasets. International Journal of Metadata, Semantics and Ontologies, 2021
  55. Sourav Mukherjee, Tim Oates, Ryan Wright: Graph Node Embeddings using Domain-Aware Biased Random Walks. arxiv.org, 2019
  56. Federico Nanni, Bhaskar Mitra, Matt Magnusson, Laura Dietz: Benchmark for Complex Answer Retrieval. ACM International Conference on the Theory of Information Retrieval, 2017
  57. Federico Nanni, Simone Paolo Ponzetto, Laura Dietz: Building Entity-Centric Event Collections. ACM/IEEE Joint Conference on Digital Libraries, 2017
  58. Federico Nanni, Simone Paolo Ponzetto, Laura Dietz: Entity-aspect linking: providing fine-grained semantics of entities in context. International Joint Conference on Digital Libraries, 2018
  59. Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel: A Three-Way Model for Collective Learning on Multi-Relational Data. In: International Conference on Machine Learning, 2011.
  60. Finn Årup Nielsen: Wembedder: Wikidata entity embedding web service. arxiv.org, 2017
  61. Leonardo Nizzoli, Marco Avvenuti, Maurizio Tesconi, Stefano Cresci: Geo-Semantic-Parsing: AI-powered geoparsing by traversing semantic knowledge graphs. Decision Support Systems, Volume 136, September 2020.
  62. Susana Nunes, Rita T. Sousa, Catia Pesquita: Predicting Gene-Disease Associations with Knowledge Graph Embeddings over Multiple Ontologies. arxiv.org, 2021
  63. Enrico Palumbo, Alberto Buzio, Andrea Gaiardo, Giuseppe Rizzo, Raphael Troncy, Elena Baralis: Tinderbook: Fall in Love with Culture. Extended Semantic Web Conference, 2019.
  64. Heiko Paulheim: Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods. Semantic Web 8(3), 2017.
  65. Maria Angela Pellegrino, Michael Cochez, Martina Garofalo, Petar Ristoski: A Configurable Evaluation Framework for Node Embedding Techniques. Extended Semantic Web Conference, 2019
  66. Maria Angela Pellegrino, Abdulrahman Altabba, Martina Garofalo, Petar Ristoski, Michael Cochez: GEval: A Modular and Extensible Evaluation Framework for Graph Embedding Techniques. Extended Semantic Web Conference, 2020
  67. Jeffrey Pennington, Richard Socher, Christopher D. Manning: GloVe: Global Vectors for Word Representation. Empirical Methods in Natural Language Processing, 2014
  68. Bryan Perozzi, Rami Al-Rfou, Steven Skiena: DeepWalk: Online Learning of Social Representations. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
  69. Alexis Pister, Ghislain Atemezing: Knowledge Graph Embedding for Triples Fact Validation. International Semantic Web Conference, 2019
  70. Jan Portisch and Heiko Paulheim: ALOD2vec Matcher. International Workshop on Ontology Matching, 2018
  71. Jan Portisch, Michael Hladik, Heiko Paulheim: KGvec2go - Knowledge Graph Embeddings as a Service. International Conference on Language Resources and Evaluation, 2020
  72. Jan Portisch, Michael Hladik, Heiko Paulheim: RDF2Vec Light – A Lightweight Approach for Knowledge Graph Embeddings. International Semantic Web Conference, Posters and Demos, 2020.
  73. Jan Portisch, Michael Hladik, Heiko Paulheim: FinMatcher at FinSim-2: Hypernym Detection in the Financial Services Domain using Knowledge Graphs. Workshop on Financial Technology on the Web (FinWeb), 2021.
  74. Jan Portisch, Michael Hladik, Heiko Paulheim: Background Knowledge in Schema Matching: Strategy vs. Data. International Semantic Web Conference, 2021.
  75. Jan Portisch, Heiko Paulheim: Putting RDF2vec in Order. International Semantic Web Conference, Posters and Demos, 2021.
  76. Jan Portisch, Nicolas Heist, Heiko Paulheim: Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the same Coin?. Semantic Web Journal 13(3), 2022.
  77. Jan Portisch, Heiko Paulheim: Walk this Way! Entity Walks and Property Walks for RDF2vec. ESWC Posters and Demos, 2022.
  78. Jan Portisch, Guilherme Costa, Karolin Stefani, Katharina Kreplin, Michael Hladik, Heiko Paulheim: Ontology Matching Through Absolute Orientation of Embedding Spaces. ESWC Posters and Demos, 2022.
  79. Majid Ramezani, Mohammad-Reza Feizi-Derakhshi, Mohammad-Ali Balafar: Knowledge Graph-Enabled Text-Based Automatic Personality Prediction. arxiv.org, 2022.
  80. Petar Ristoski, Stefano Faralli, Simone Paolo Ponzetto, Heiko Paulheim: Large-scale taxonomy induction using entity and word embeddings. International Conference on Web Intelligence, 2017
  81. Petar Ristoski: Exploiting Semantic Web Knowledge Graphs in Data Mining. IOS Press, Studies on the Semantic Web (38), 2019
  82. Petar Ristoski, Anna Lisa Gentile, Alfredo Alba, Daniel Gruhl, Steven Welch: Large-scale relation extraction from web documents and knowledge graphs with human-in-the-loop. Semantic Web Journal (60), 2020
  83. Leon Ruppen: Dependent Learning of Entity Vectors for Entity Alignment on Knowledge Graphs. Master's Thesis at ETH Zurich, 2018.
  84. Muhammad Rizwan Saeed, Viktor K. Prasanna: Extracting Entity-Specific Substructures for RDF Graph Embedding. IEEE International Conference on Information Reuse and Integration, 2018
  85. Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, Max Welling: Modeling Relational Data with Graph Convolutional Networks. Extended Semantic Web Conference, 2018.
  86. Yuxuan Shi, Gong Cheng, Trung-Kien Tran, Evgeny Kharlamov, Yulin Shen: Efficient Computation of Semantically Cohesive Subgraphs for Keyword-Based Knowledge Graph Exploration. The Web Conference, 2021.
  87. Alexey Shigarov, Nikita Dorodnykh, Alexander Yurin, Andrey Mikhailov, Viacheslav Paramonov: From Web-Tables to a Knowledge Graph: Prospects of an End-to-End Solution. Scientific-practical Workshop Information Technologies: Algorithms, Models, Systems, 2021
  88. Fatima Zohra Smaili, Xin Gao, Robert Hoehndorf: Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations. Bioinformatics 34(13), 2018.
  89. Radina Sofronova, Russa Biswas, Mehwish Alam, Harald Sack: Entity Typing based on RDF2Vec using Supervised and Unsupervised Methods. Extended Semantic Web Conference, 2020.
  90. Tommaso Soru, Stefano Ruberto, Diego Moussallem, Edgard Marx, Diego Esteves, Axel-Cyrille Ngonga Ngomo: Expeditious Generation of Knowledge Graph Embeddings. European Conference on Data Analysis, 2018
  91. Rita T. Sousa, Sara Silva, Catia Pesquita: evoKGsim*: A Framework for Tailoring Knowledge Graph-based Similarity for Supervised Learning. OpenReview, 2021.
  92. Rita T. Sousa, Sara Silva, Catia Pesquita: Supervised Semantic Similarity. bioRxiv, 2021.
  93. Bram Steenwinckel, Gilles Vandewiele, Ilja Rausch, Pieter Heyvaert, Ruben Taelman, Pieter Colpaert, Pieter Simoens, Anastasia Dimou, Filip De Turck, Femke Ongenae: Facilitating the Analysis of COVID-19 Literature Through a Knowledge Graph. International Semantic Web Conference, 2020.
  94. Rima Türker: Knowledge-Based Dataless Text Categorization. Extended Semantic Web Conference, 2019
  95. Gilles Vandewiele, Bram Steenwinckel, Pieter Bonte, Michael Weyns, Heiko Paulheim, Petar Ristoski, Filip De Turck, Femke Ongenae: Walk Extraction Strategies for Node Embeddings with RDF2Vec in Knowledge Graphs. arxiv.org, 2020.
  96. Gilles Vandewiele, Bram Steenwinckel, Terencio Agozzino, Femke Ongenae: pyRDF2Vec: A Python Implementation and Extension of RDF2Vec. arxiv.org, 2022.
  97. Svitlana Vakulenko: Knowledge-based Conversational Search. TU Wien, 2019.
  98. Michael Matthias Voit, Heiko Paulheim: Bias in Knowledge Graphs - an Empirical Study with Movie Recommendation and Different Language Editions of DBpedia. Conference on Language, Data and Knowledge, 2021
  99. Yiwei Wang, Mark James Carman, Yuan Fang Li: Using Knowledge Graphs to Explain Entity Co-occurrence in Twitter. ACM Conference on Information and Knowledge Management, 2017
  100. YueQun Wang, LiYan Dong, XiaoQuan Jiang, XinTao Ma, YongLi Li, Hao Zhang: KG2Vec: A node2vec-based vectorization model for knowledge graph. PLOS ONE, 2021
  101. Hongxiao Wang, Hao Zheng, Danny Z. Chen: TANGO: A GO-term Embedding Based Method for Protein Semantic Similarity Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2022
  102. Tobias Weller: Learning Latent Features using Stochastic Neural Networks on Graph Structured Data. KIT, 2021.
  103. Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto: Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia. Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020
  104. Peiran Yao and Denilson Barbosa: Typing Errors in Factual Knowledge Graphs: Severity and Possible Ways Out. The Web Conference, 2021
  105. Shuo Zhang and Krisztian Balog: Ad Hoc Table Retrieval using Semantic Similarity. The Web Conference, 2018
  106. Shuo Zhang and Krisztian Balog: Semantic Table Retrieval using Keyword and Table Queries. arxiv.org, 2021
  107. Shuo Zhang, Xiaoli Lin, Xiaolong Zhang: Discovering DTI and DDI by Knowledge Graph with MHRW and Improved Neural Network. IEEE International Conference on Bioinformatics and Biomedicine, 2021
  108. Amal Zouaq and Felix Martel: What is the schema of your knowledge graph?: leveraging knowledge graph embeddings and clustering for expressive taxonomy learning. International Workshop on Semantic Big Data, 2020.


The original development of RDF2vec was funded in the project Mine@LOD by the Deutsche Forschungsgemeinschaft (DFG) under grant number PA 2373/1-1 from 2013 to 2018.


If you are aware of any implementations, extensions, pre-trained models, or applications of RDF2vec not listed on this Web page, please get in touch with Heiko Paulheim.