RDF2vec.org

The hitchhiker's guide to RDF2vec.

About RDF2vec

RDF2vec is a tool for creating vector representations of RDF graphs. In essence, RDF2vec creates a numeric vector for each node in an RDF graph.

RDF2vec was developed by Petar Ristoski as a key contribution of his PhD thesis Exploiting Semantic Web Knowledge Graphs in Data Mining [Ristoski, 2019], which he defended in January 2018 at the Data and Web Science Group at the University of Mannheim, supervised by Heiko Paulheim. In 2019, he was awarded the SWSA Distinguished Dissertation Award for this outstanding contribution to the field.

RDF2vec was inspired by the word2vec approach [Mikolov et al., 2013] for representing words in a numeric vector space. word2vec takes a set of sentences as input and trains a neural network using one of two variants: predicting a word given its context words (continuous bag of words, or CBOW), or predicting the context words given a word (skip-gram, or SG).
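
Both variants are available, for example, in the gensim implementation of word2vec via its sg parameter. A minimal sketch on a toy corpus (parameter names refer to gensim 4.x; the corpus and hyperparameters are illustrative only):

from gensim.models import Word2Vec

# Toy corpus: each "sentence" is a list of tokens.
sentences = [
    ["hamburg", "country", "germany", "leader", "angela_merkel"],
    ["germany", "leader", "angela_merkel", "birthplace", "hamburg"],
]

# CBOW: predict a word from its context words (sg=0).
cbow_model = Word2Vec(sentences, vector_size=100, window=5, sg=0, min_count=1)

# Skip-gram: predict the context words from a word (sg=1).
sg_model = Word2Vec(sentences, vector_size=100, window=5, sg=1, min_count=1)

print(sg_model.wv["germany"][:5])  # first dimensions of the vector for "germany"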

This approach can be applied to RDF graphs as well. In the original version presented at ISWC 2016 [Ristoski and Paulheim, 2016], random walks on the RDF graph are used to create sequences of RDF nodes, which are then used as input for the word2vec algorithm. It has been shown that such a representation can be utilized in many application scenarios, such as using knowledge graphs as background knowledge in data mining tasks, or for building content-based recommender systems [Ristoski et al., 2019].

Consider the following example graph:

From this graph, a set of extracted random walks may look as follows:

Hamburg -> country -> Germany            -> leader     -> Angela_Merkel
Germany -> leader  -> Angela_Merkel      -> birthPlace -> Hamburg
Hamburg -> leader  -> Peter_Tschentscher -> residence  -> Hamburg

For those random walks, we consider each element (i.e., an entity or a predicate) as a word when running word2vec. As a result, we obtain vectors for all entities (and all predicates) in the graph.
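
The following sketch illustrates this pipeline end to end on the toy graph above, using rdflib for the graph and gensim for training (URIs, walk depth, and hyperparameters are illustrative only; the actual implementations listed further below are far more efficient):

import random
from rdflib import Graph, URIRef, Namespace
from gensim.models import Word2Vec

EX = Namespace("http://example.org/")

# Build the toy graph from the example above.
g = Graph()
g.add((EX.Hamburg, EX.country, EX.Germany))
g.add((EX.Germany, EX.leader, EX.Angela_Merkel))
g.add((EX.Angela_Merkel, EX.birthPlace, EX.Hamburg))
g.add((EX.Hamburg, EX.leader, EX.Peter_Tschentscher))
g.add((EX.Peter_Tschentscher, EX.residence, EX.Hamburg))

def random_walk(graph, start, depth=4):
    """Extract one random walk as an alternating entity/predicate sequence."""
    walk, node = [start], start
    for _ in range(depth):
        out_edges = list(graph.predicate_objects(subject=node))
        if not out_edges:
            break
        predicate, obj = random.choice(out_edges)
        walk += [predicate, obj]
        node = obj
    return [str(x) for x in walk]

entities = {s for s in g.subjects() if isinstance(s, URIRef)}
walks = [random_walk(g, e) for e in entities for _ in range(10)]

# Each walk is treated as a "sentence"; every entity and predicate is a "word".
model = Word2Vec(walks, vector_size=50, window=5, sg=1, min_count=1, epochs=10)
vector = model.wv[str(EX.Germany)]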

The resulting vectors have similar properties to word2vec embeddings. In particular, similar entities are closer in the vector space than dissimilar ones (see [Hubert et al., 2023]), which makes those representations ideal for learning patterns about those entities. In the example below, showing embeddings for DBpedia and Wikidata, countries and cities are grouped together, and European and Asian cities and countries form clusters:

The two figures above indicate that classes (in the example: countries and cities) can be separated well in the projected vector space, indicated by the dashed lines. [Zouaq and Martel, 2020] have compared different knowledge graph embedding methods with respect to their suitability for separating classes in a knowledge graph. They have shown that RDF2vec outperforms other embedding methods like TransE, TransH, TransD, ComplEx, and DistMult, in particular on smaller classes. On the task of entity classification, RDF2vec shows results that are competitive with more recent graph convolutional neural networks [Schlichtkrull et al., 2018].

RDF2vec has been tailored to RDF graphs by respecting the type of edges (i.e., the predicates). Related variants, like node2vec [Grover and Leskovec, 2016] or DeepWalk [Perozzi et al., 2014], are defined for graphs with just a single type of edge. They create sequences of nodes, while RDF2vec creates alternating sequences of entities and predicates.

This video by Petar Ristoski introduces the main ideas of RDF2vec:

Trans* etc. vs. RDF2vec, Similarity vs. Relatedness

A lot of approaches have been proposed for link prediction in knowledge graphs, from classic approaches like TransE [Bordes et al., 2013] and RESCAL [Nickel et al., 2011] to countless variants. The key difference is that those approaches are trained to optimize a loss function for link prediction, which yields, as a by-product, a projection in which similar entities are located closely together in the vector space. On the other hand, the capability to predict links is a by-product in RDF2vec, in particular in variants like order-aware RDF2vec. A detailed comparison of the commonalities and differences of those families of approaches can be found in [Portisch et al., 2022].

Implementations

There are a few different implementations of RDF2vec out there:

  • The original implementation from the 2016 paper. Not well documented. Uses Java for walk generation, and Python/gensim for the embedding training.
  • jRDF2vec is a more versatile and better performing Java-based implementation. Like the original one, it uses Java to generate the walks, and Python/gensim for training the embedding. There is also a Docker image available here. jRDF2vec is the best performing end-to-end implementation for RDF2vec. It also implements many variants, such as RDF2vec Light, as well as p-walks and e-walks (see below).
  • pyRDF2vec [Vandewiele et al., 2022] is a pure Python-based implementation. It implements multiple strategies to generate the walks, not only random walks, and also has an implementation of RDF2vec Light (see below). A usage sketch follows after this list.
  • ataweel55's implementation is another pure Python-based implementation. It includes all strategies for biasing the walks described in [Cochez et al., 2017a] and [Al Taweel and Paulheim, 2020].
  • There is a high-performance C++-based implementation for creating walks (also with different weighting mechanisms [Cochez et al., 2017a]), which can be considered the fastest implementation for walk extraction from RDF files.
  • While all of those approaches use the word2vec implementation in gensim, there is also a PyTorch-based implementation, which also implements the word2vec part in pure Python.
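
As an example of how such an implementation is typically used, here is a sketch following the pyRDF2vec documentation (class names, argument order, and return values may differ between versions; the DBpedia endpoint and entities are just examples):

from pyrdf2vec import RDF2VecTransformer
from pyrdf2vec.embedders import Word2Vec
from pyrdf2vec.graphs import KG
from pyrdf2vec.walkers import RandomWalker

# Remote knowledge graph, accessed through its SPARQL endpoint.
kg = KG("https://dbpedia.org/sparql")

transformer = RDF2VecTransformer(
    Word2Vec(epochs=10),            # gensim-based word2vec embedder
    walkers=[RandomWalker(4, 10)],  # walk depth and number of walks per entity
)

entities = [
    "http://dbpedia.org/resource/Hamburg",
    "http://dbpedia.org/resource/Germany",
]
embeddings, literals = transformer.fit_transform(kg, entities)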

Models and Services

Training RDF2vec from scratch can take quite a bit of time. Here is a list of pre-trained models we know of:

There is also an alternative to downloading and processing an entire knowledge graph embedding (which may consume several GB):

  • KGvec2go provides a REST API for retrieving pre-computed embedding vectors for selected entities one by one, as well as further functions, such as computing the vector space similarity of two concepts, and retrieving the n closest concepts. There is also a service for RDF2vec Light (see below) [Portisch et al., 2020].

Extensions and Variants

There are quite a few variants of RDF2vec which have been examined in the past.

  • Walking RDF and OWL pursues exactly the same idea as RDF2vec, and the two can be considered identical. It uses random walks and Skip-Gram embeddings. The approach was developed at the same time as RDF2vec. [Alshahrani et al., 2017]
  • KG2vec pursues a similar idea to RDF2vec by first transforming the directed, labeled RDF graph into an undirected, unlabeled graph (using nodes for the relations) and then extracting walks from that transformed graph. [Wang et al., 2021] Although no direct comparison is available, we assume that the embeddings are comparable.
  • Wembedder is a simplified version of RDF2vec which uses the raw triples of a knowledge graph as input to the word2vec implementation, instead of random walks. It serves pre-computed vectors for Wikidata. [Nielsen, 2017]
  • KG2vec (not to be confused with the aforementioned approach also named KG2vec) follows the same idea of using triples as input to a Skip-Gram algorithm. [Soru et al., 2018]
  • Triple2Vec follows a similar idea of walk-based embedding generation, but embeds entire triples instead of nodes. [Fionda and Pirrò, 2020]
  • [Van and Lee, 2023] propose different extensions to walk creation for RDF2vec, including the use of text literals by creating new graph nodes for similar text literals, as well as the introduction of latent walks, which capture relations that are not explicit in the knowledge graph.
  • RDFstar2vec is an extension of RDF2vec which works on RDF-star graphs. It defines additional walk strategies for quoted triples. [Egami et al., 2023]

Natively, RDF2vec does not incorporate literals. However, they can be incorporated with a few tricks:

  • pyRDF2vec (see above) has an option which adds literals of entities as direct features, creating a heterogeneous feature vector consisting of the embedding dimensions and additional features from the literal values.
  • There are quite a few graph preprocessing operators which can be utilized to incorporate literals by representing their information in the form of entities and relations, so that it can be processed by RDF2vec (and other embedding methods). Even simple baselines which are efficient and do not increase the graph size can boost the performance of RDF2vec. [Preisner and Paulheim, 2023] A generic sketch of such a preprocessing step follows after this list.
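
As a generic illustration of such a preprocessing step (not the specific operators from [Preisner and Paulheim, 2023]), numeric literals can, for example, be replaced by bucket entities so that walk-based methods can pick them up. The predicate and bin boundaries below are hypothetical:

from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

def bin_numeric_literals(graph, predicate, bins):
    """Replace numeric literal objects of the given predicate by bucket
    entities, so that the information becomes visible to walk extraction."""
    out = Graph()
    for s, p, o in graph:
        if p == predicate and isinstance(o, Literal):
            try:
                value = float(o)
            except (TypeError, ValueError):
                out.add((s, p, o))
                continue
            # map the value to the first bin boundary it falls under
            label = next((f"below_{b}" for b in bins if value < b), "above_max")
            out.add((s, p, EX[f"bucket_{label}"]))
        else:
            out.add((s, p, o))
    return out

# Hypothetical usage: bin a numeric :population predicate into three buckets.
# preprocessed = bin_numeric_literals(g, EX.population, bins=[100_000, 1_000_000])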

Initially, word2vec was created for natural language, which tolerates some variation in word order. Walks extracted from graphs behave differently in that respect.

Consider, for example, the case of creating embedding vectors for bread in sentences such as Tom ate bread yesterday morning and Yesterday morning, Tom ate bread. For walks extracted from graphs, however, it makes a difference whether a predicate appears before or after the entity at hand. Consider the example above, where all three entities in the middle (Angela_Merkel, Peter_Tschentscher, and Germany) share the same context items (i.e., Hamburg and leader). However, for the semantics of an entity, it makes a difference whether that entity is or has a leader. This observation motivates order-aware RDF2vec, which replaces classic word2vec with an order-aware variant [Ling et al., 2015] that respects the positions of context words.

RDF2vec always generates embedding vectors for an entire knowledge graph. In many practical cases, however, we only need vectors for a small set of target entities. In such cases, generating vectors for an entire large graph like DBpedia would not be a practical solution.

  • RDF2vec Light is an alternative which can be used in such scenarios. It only creates random walks on a subset of the knowledge graph and can quickly produce embedding vectors for a target subset of entities. In many cases, the results are competitive with those achieved with embeddings of the full graph. [Portisch et al., 2020] Details about the implementation are found here. A rough sketch of the idea follows after this list.
  • LODVec uses the same mechanism as RDF2vec Light, but creates sequences across different datasets by exploiting owl:sameAs links, and unifying classes and predicates by exploiting owl:equivalentClass and owl:equivalentProperty definitions. [Mountantonakis and Tzitzikas, 2021]
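
A rough sketch of the idea behind RDF2vec Light: walks are only started from the entities of interest instead of from every entity in the graph (this is an illustration only, not the full RDF2vec Light algorithm):

import random

def light_walks(graph, targets, walks_per_entity=10, depth=4):
    """Extract walks only for a given set of target entities.
    `graph` is assumed to be an rdflib.Graph, as in the earlier sketch."""
    walks = []
    for start in targets:
        for _ in range(walks_per_entity):
            node, walk = start, [start]
            for _ in range(depth):
                edges = list(graph.predicate_objects(subject=node))
                if not edges:
                    break
                predicate, obj = random.choice(edges)
                walk += [predicate, obj]
                node = obj
            walks.append([str(x) for x in walk])
    return walks  # feed these into word2vec as in the earlier sketch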

RDF2vec can explicitly trade off similarity and relatedness.

One of the key findings of the comparison of RDF2vec to embedding approaches for link prediction, such as TransE, is that while embedding approaches for link prediction create an embedding space in which the distance metric encodes similarity of entities, the distance metric in the RDF2vec embedding space mixes similarity and relatedness [Portisch et al., 2022]. This behavior can be influenced by changing the walk strategy, thereby creating embedding spaces which explicitly emphasize similarity or relatedness. The corresponding walk strategies are called p-walks and e-walks. In the above example, a set of p-walks would be:

birthPlace -> country -> Germany            -> leader     -> birthPlace
country    -> leader  -> Angela_Merkel      -> birthPlace -> leader
residence  -> leader  -> Peter_Tschentscher -> residence  -> leader

Likewise, the set of e-walks would be:

Peter_Tschentscher -> Hamburg -> Germany            -> Angela_Merkel -> Hamburg
Hamburg            -> Germany -> Angela_Merkel      -> Hamburg       -> Peter_Tschentscher
Angela_Merkel      -> Hamburg -> Peter_Tschentscher -> Hamburg       -> Germany

It has been shown that embedding vectors computed based on e-walks create a vector space encoding relatedness, while embedding vectors computed based on p-walks create a vector space encoding similarity. [Portisch and Paulheim, 2022]
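
Since a full walk is an alternating sequence of entities and predicates, p-walks and e-walks can be derived from it by simple filtering. A minimal sketch (assuming the focus entity sits in the middle of the walk, as in the examples above; the implementations listed above may generate them differently):

def split_walk(walk):
    """Split a full walk [e0, p1, e1, p2, e2, ...] into entities (even
    positions) and predicates (odd positions)."""
    return walk[0::2], walk[1::2]

def e_walk(walk):
    """e-walk: keep only the entities of the walk (captures relatedness)."""
    entities, _ = split_walk(walk)
    return entities

def p_walk(walk):
    """p-walk: keep the predicates plus the focus entity in the middle of the
    walk (captures similarity)."""
    entities, predicates = split_walk(walk)
    focus_index = len(entities) // 2
    return predicates[:focus_index] + [entities[focus_index]] + predicates[focus_index:]

full_walk = ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"]
print(e_walk(full_walk))  # ['Hamburg', 'Germany', 'Angela_Merkel']
print(p_walk(full_walk))  # ['country', 'Germany', 'leader']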

Besides e-walks and p-walks, walk creation is the aspect of RDF2vec that has undergone the most extensive research so far. While the original implementation uses uniform random walks, alternatives that have been explored include (a sketch of how such a bias enters the walk generation follows after this list):

  • The use of different heuristics for biasing the walks, e.g., preferring edges with more/less frequent predicates, preferring links to nodes with higher/lower PageRank, etc. An extensive study is available in [Cochez et al., 2017a].
  • Zhang et al. also propose a different weighting scheme based on Metropolis-Hastings random walks, which reduces the probability of transitioning to a node with high degree and aims at a more balanced distribution of nodes in the walks. [Zhang et al., 2022]
  • A similar approach is analyzed in [Al Taweel and Paulheim, 2020], where embeddings for DBpedia are trained with external edge weights derived from page transition probabilities in Wikipedia.
  • In [Vandewiele et al., 2020], we have analyzed different alternatives to using random walks, such as walk strategies with teleportation within communities. While random walks are usually a good choice, there are scenarios in which other walking strategies are superior.
  • In [Saeed and Prasanna, 2018], the identification of specific properties for groups of entities is discussed as a means to find task-specific edge weights.
  • Similarly, NESP computes semantic similarities between relations in order to create semantically coherent walks. Moreover, the approach foresees refining an existing embedding space by bringing more closely related entities closer together. [Chekol and Pirrò, 2020]
  • Mukherjee et al. [Mukherjee et al., 2019] also observe that biasing the walks with prior knowledge on relevant properties and classes for a domain can improve the results obtained with RDF2vec.
  • The ontowalk2vec approach [Gkotse, 2020] combines the random walk strategies of RDF2vec and node2vec, and trains a language model on the union of both walk sets.
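
As an illustration of how such a bias enters the walk generation, the following sketch replaces the uniform choice of the next edge by a weighted choice over per-predicate weights (the weights are placeholders; the approaches cited above derive them, e.g., from predicate frequencies, PageRank scores, or page transition probabilities):

import random

def biased_step(graph, node, predicate_weights, default_weight=1.0):
    """Pick the next (predicate, object) pair with probability proportional to
    a per-predicate weight instead of uniformly at random. `graph` is assumed
    to be an rdflib.Graph, as in the earlier sketches."""
    out_edges = list(graph.predicate_objects(subject=node))
    if not out_edges:
        return None
    weights = [predicate_weights.get(p, default_weight) for p, _ in out_edges]
    return random.choices(out_edges, weights=weights, k=1)[0]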

Besides changing the walk creation itself, there are also approaches for incorporating additional information in the walks:

  • [Bachhofner et al., 2021] discuss the inclusion of metadata, such as provenance information, in the walks in order to improve the resulting embeddings.
  • [Pietrasik and Reformat, 2023] introduce a heuristic reduction based on probabilistic properties of the knowledge graph as a preprocessing step, so that a first version of the embedding can be computed on a reduced knowledge graph.

RDF2vec relies on the word2vec embedding mechanism. However, other word embedding approaches have also been discussed.

  • In his master's thesis, Agozzino discusses the usage of FastText and BERT in RDF2vec as an alternative to word2vec. His preliminary experiments suggest that FastText might be a superior alternative to word2vec. [Agozzino, 2021] The FastText variant is also available in the pyRDF2vec implementation. A minimal sketch of swapping the embedding model follows after this list.
  • KGloVe adapts the GloVe algorithm [Pennington et al., 2014] for creating the embedding vectors [Cochez et al., 2017b]. However, KGloVe does not use random walks, but derives the co-occurrence matrix directly from the knowledge graph.
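
Since the walks are plain token sequences, swapping the embedding model is straightforward. For instance, gensim's FastText can be trained on the same walks that would otherwise be fed to word2vec (a minimal sketch; walks and hyperparameters are illustrative only):

from gensim.models import FastText

# `walks` is a list of token lists, e.g. produced by the walk extraction above.
walks = [
    ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"],
    ["Germany", "leader", "Angela_Merkel", "birthPlace", "Hamburg"],
]

model = FastText(walks, vector_size=50, window=5, sg=1, min_count=1, epochs=10)
vector = model.wv["Germany"]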

While the original RDF2vec approach is agnostic to the type of knowledge encoded in RDF, it is also possible to extend the approach to specific types of datasets.

To materialize or not to materialize? While it might look like a good idea to enrich the knowledge graph with implicit knowledge before training the embeddings, experimental results show that materializing implicit knowledge actually makes the resulting embeddings worse, not better.

  • In [Iana and Paulheim, 2020], we have conducted a series of experiments training embeddings on DBpedia as is, vs. training embeddings on DBpedia with implicit knowledge materialized. In most settings, the results on downstream tasks get worse when adding implicit knowledge. Our hypothesis is that missing information in many knowledge graphs is not missing at random, but a signal of lesser importance, and that signal is canceled out by materialization. A similar observation was made by [Alshahrani et al., 2017].

RDF2vec can only learn that two entities are similar based on signals that co-occur in a graph walk. For that reason, it is, for example, impossible to learn that two entities are similar because they have an ingoing edge from an entity of the same type (see also the results on the DLCC node classification benchmark [Portisch and Paulheim, 2022]). Consider the following triples:

:Germany	rdf:type	:EuropeanCountry .
:Germany	:capital	:Berlin .
:France		rdf:type	:EuropeanCountry .
:France		:capital	:Paris .
:Thailand	rdf:type	:AsianCountry .
:Thailand	:capital	:Bangkok .
		  
In this example, it is impossible for RDF2vec to learn that Berlin is more similar to Paris than to Bangkok, since the entities EuropeanCountry and AsianCountry never co-occur in any walk with the city entities. Therefore, injecting structural information into RDF2vec may improve the results.

  • Liang et al. have proposed an approach for using such structural information by injecting them in the loss function of the downstream task (not the one used for training the embeddings per se). Their results show that the performance of entity classification with RDF2vec can be improved by adding a loss term based on structural similarities.

Knowledge Graphs usually do not contain negative statements. However, in cases where negative statements are present, there are different ways of handling them in the embedding creation.

  • One variant is the encoding of negative statements with specific relations, an approach which can be used with arbitrary embedding methods. When dealing with walk-based methods on large hierarchies, it is possible to encode the negative statements in the direction of walks along the hierarchy, as demonstrated in the TrueWalks approach in [Sousa et al., 2023].

Other Resources

Other useful resources for working with RDF2vec:

  • GEval is a Python-based framework to run evaluations of RDF2vec in the manner of the papers mentioned above [Pellegrino et al., 2019, Pellegrino et al., 2020].
  • Concept2vec provides a test benchmark for analyzing how well RDF2vec embeddings encode ontological (i.e., schema-level) properties of a knowledge graph [Alshargi et al., 2019].
  • DLCC is another benchmark for analyzing which schema constructs can be learned by embedding models. It comes in two flavours, one based on the real-world knowledge graph DBpedia, another one based on synthetic data [Portisch and Paulheim, 2022].

Applications

RDF2vec has been used in a variety of applications. In the following, we list a number of those, organized by field of application.

Knowledge Graph Refinement

Knowledge Graph Refinement subsumes the usage of embeddings for adding additional information to a knowledge graph (e.g., link/relation or type prediction), for extending its schema/ontology, or for identifying (and potentially correcting) erroneous facts in the graph [Paulheim, 2017]. In most of the applications, RDF2vec embedding vectors are used as representations for training a machine learning classifier for the task at hand, e.g., a predictive model for entity types (a minimal sketch of this pattern follows after this list). Applications in this area include:
  • TIEmb is an approach for learning subsumption relations using RDF2vec embeddings. [Ristoski et al., 2017] The use of RDF2vec for learning subsumptions is also discussed in [Gosselin and Zouaq, 2023] and [Shiraishi and Kaneiwa, 2024].
  • Kejriwal and Szekely discuss the use of RDF2vec embeddings for entity type prediction in knowledge graphs. [Kejriwal and Szekely, 2017] Another approach in that direction is proposed by Sofronova et al., who contrast supervised and unsupervised methods for exploiting RDF2vec embeddings for type prediction. [Sofronova et al., 2020] Furthermore, the usage of RDF2vec for type prediction in knowledge graphs is discussed in [Weller, 2021], [Jain et al., 2021], and [Ugai, 2023]. [Cutrona et al., 2021] report that using RDF2vec embeddings for type prediction yields results comparable to using BERT embeddings [Devlin et al., 2018] trained on the entities' textual abstracts. The combination of textual entity information, encoded with BERT, and graph information, encoded with RDF2vec, is discussed in [Biswas et al., 2022].
  • Daga and Groth also use RDF2vec to classify nodes in a knowledge graph extracted from Python notebooks on Kaggle. They show that the classification using RDF2vec significantly outperforms the usage of the pre-trained CodeBERTa model. [Daga and Groth, 2022]
  • Shahinmoghadam et al. discuss the use of RDF2vec embeddings for node classification in the building information modeling field. They show that the combination of dimensionality reduction of the embedding space using Kernel PCA and a downstream classifier yields the best results. [Shahinmoghadam et al., 2022]
  • GraphEmbeddings4DDI utilizes RDF2vec for predicting drug-drug interactions [Çelebi et al., 2018]. A similar system is introduced by Karim et al., using a complex LSTM on top of the entity embeddings generated with RDF2vec [Karim et al., 2019]. Since the drug-drug interactions are modeled as relations in the knowledge graphs used for the experiments, this task is essentially a relation prediction task. [Zhang et al., 2022] also target the prediction of drug-drug and drug-target interactions, using a combination of CNN and BiLSTM as a downstream prediction model.
  • Ammar and Celebi showcase the use of RDF2vec embeddings for the fact validation task at the 2019 edition of the Semantic Web Challenge. [Ammar and Celebi, 2019]. A similar approach is pursued by Pister and Atemezing [Pister and Atemezing, 2019]. Qudus et al. discuss a hybrid fact checking approach using text and knowledge graph embeddings. They show that hybrid approaches built with RDF2vec outperform those built on most other embedding techniques. [Qudus et al., 2022]
  • Chen et al. show that RDF2vec embeddings can be used for relation prediction and yield results competitive with TransE and DistMult [Chen et al., 2020].
  • Yao and Barbosa combine RDF2vec and outlier detection for detecting wrong type assertions in knowledge graphs [Yao and Barbosa, 2021].
  • Egami et al. utilize RDF2vec for clustering activities in a knowledge graph of daily living activities, and discuss the use of those clusters for refining the underlying ontology. [Egami et al., 2021]
  • [Heilig et al., 2022] use RDF2vec embedding on a biomedical knowledge graph for refining rules for medical diagnosis. This gives an interesting example for combining embeddings with explainable artificial intelligence: the embeddings are not used directly for prediction, but rather to refine interpretable rules, which are reviewed by medical experts.
  • [Gonzalez-Hevia and Gayo-Avello, 2022] propose the extraction of knowledge graphs containing not only the information about the entity itself, but also its edit history, to improve type prediction. One of their approaches uses RDF2vec embeddings on those extracted knowledge graphs.
  • [Silva Neto, 2023] uses RDF2vec, among other representations, to cluster entities for class learning in knowledge graphs.
  • [Potoniec, 2020] uses RDF2vec representations of triples (by concatenating subject, predicate, and object vectors), together with an RNN, to predict characteristics of object properties, such as symmetry and transitivity.
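
As a minimal sketch of the pattern underlying most of the approaches above, a standard classifier is trained over the embedding vectors (the vectors and labels below are random placeholders; in practice, the vectors come from a trained RDF2vec model and the labels, e.g., from entity type assertions):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(200, 100)           # 200 entities, 100-dimensional embeddings
y = np.random.randint(0, 3, size=200)  # three hypothetical entity types

clf = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())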

Knowledge Matching and Integration

In knowledge matching and integration, entity embedding vectors are mostly utilized to determine whether two entities in two datasets are similar enough to each other to merge them into one. Different approaches have been proposed using RDF2vec for matching and integration, both on the schema and on the instance level:
  • MERGILO is a tool for merging structured knowledge extracted from text. A refinement of MERGILO using RDF2vec embeddings on FrameNet is discussed in [Alam et al., 2017].
  • EARL is a named entity linking tool which uses pre-trained RDF2vec embeddings. [Dubey et al., 2018]
  • ALOD2vec Matcher is an ontology matching system which uses pre-trained embeddings on the WebIsALOD knowledge graph to determine the similarity of two concepts. [Portisch and Paulheim, 2018]. The approach has later been extended to DBpedia, WordNet, Wikidata, Wiktionary, and BabelNet in [Portisch et al., 2021]. A similar approach is pursued by the DESKMatcher system, which uses domain specific embeddings from the business domain, e.g., the FIBO ontology [Monych et al., 2020].
  • AnyGraphMatcher is another ontology matching system which leverages RDF2vec embeddings trained on the two input ontologies to match [Lütke, 2019].
  • [Kardos and Farkas, 2022] use RDF2vec for knowledge graph matching, exploiting a linear transformation learned between the embedding spaces of the source and the target knowledge graph (a minimal sketch of such a mapping follows after this list). [Happi et al., 2024] propose modeling the entity matching problem as a binary classification problem, concatenating two entity embeddings and predicting match/non-match as a target. Similarly, Azmy et al. use RDF2vec for entity matching across knowledge graphs, and show a large-scale study for matching DBpedia and Wikidata [Azmy et al., 2019]. A similar approach is introduced by Aghaei and Fensel, who combine RDF2vec embeddings with clustering and BERT sentence embeddings to identify related entities in two knowledge graphs [Aghaei and Fensel, 2021]. The combination of textual embeddings with BERT and RDF2vec embeddings is also discussed for ontology matching by [Mijalcheva et al., 2022].
  • DELV is an entity matching approach for matching multiple knowledge graphs built on top of RDF2vec. It first embeds a central knowledge graph using RDF2vec, and then performs an RDF2vec embedding of satellite knowledge graphs with a slightly modified word2vec loss function, taking the minimization of the distance of already matched anchors into account. [Ruppen, 2018]
  • In a showcase for the MELT ontology matching framework, Hertling et al. show that by learning a non-linear mapping between RDF2vec embeddings of different ontologies, ontology matching can be performed at least for structurally similar ontologies [Hertling et al., 2020]. [Portisch et al., 2022] show that this can also be achieved by rotation of embedding spaces.
  • [Cvetkov-Iliev et al., 2022] use knowledge graph embeddings to perform table augmentation. They represent a set of tables as a knowledge graph and perform embeddings on top; RDF2vec is one of the methods tested for this task.
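
A minimal sketch of learning such a mapping between two embedding spaces, under the assumption that a set of already matched anchor pairs is available (the data below are random placeholders):

import numpy as np

# S and T hold the embeddings of already matched entity pairs in the source
# and target knowledge graph (n anchor pairs, d dimensions each).
rng = np.random.default_rng(0)
S = rng.normal(size=(50, 100))
T = rng.normal(size=(50, 100))

# Least-squares linear map W with S @ W approximately equal to T.
W, *_ = np.linalg.lstsq(S, T, rcond=None)

def match_score(source_vec, target_vec):
    """Cosine similarity between a mapped source vector and a target vector."""
    mapped = source_vec @ W
    return float(mapped @ target_vec /
                 (np.linalg.norm(mapped) * np.linalg.norm(target_vec)))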

Applications in NLP

In natural language processing, knowledge graph embeddings are particularly handy in setups that already exploit knowledge graphs, for example, for linking entities in text to a knowledge graph using named entity linking and named entity disambiguation. Applications of RDF2vec in the NLP field include:
  • TREC CAR is a benchmark for complex answer retrieval. The authors use pre-trained RDF2vec embeddings as one means to represent queries and answers, and for matching them onto each other. [Nanni et al., 2017a]
  • Inan and Dikenelli demonstrate the usage of RDF2vec embeddings in named entity disambiguation in the entity disambiguation frameworks DoSeR and AGDISTIS. [Inan and Dikenelli, 2017]
  • In a later work, Inan and Dikenelli propose the use of RDF2vec embeddings together with a BiLSTM and a CRF layer for entity disambiguation. [Inan and Dikenelli, 2018]
  • Wang et al. have used RDF2vec embeddings for analyzing entity co-occurrence in tweets [Wang et al., 2017].
  • [Benitez-Andrades et al., 2022] consider the case of tweet classification, and show that linking entities to Wikidata and using RDF2vec embeddings for those entities leads to better classification results than pure text-based approaches based on different BERT variants.
  • Nanni et al. showcase the use of RDF2vec embeddings for entity aspect linking in [Nanni et al., 2018].
  • Nizzoli et al. use RDF2vec, among other features, to perform named entity linking of geographic entities, in particular for scoring candidates. [Nizzoli et al., 2020]
  • KGA-CGM is a system for describing images with captions. It uses RDF2vec embeddings for handling out-of-training entities [Mogadala et al., 2018].
  • Türker discusses the use of RDF2vec for text categorization by embedding both texts and categories [Türker, 2019].
  • Vakulenko demonstrates the use of RDF2vec in dialogue systems [Vakulenko, 2019].
  • G-Rex is a tool for relation extraction from text which leverages RDF2vec entity embeddings [Ristoski et al., 2020].
  • El Vaigh et al. show that using cosine similarity in the RDF2vec space creates a strong baseline for collective entity linking [El Vaigh et al., 2020]. This is particularly remarkable since that metric measures similarity, not relatedness, which is actually needed for the task at hand.
  • Yamada et al. also use RDF2vec for measuring entity relatedness, and contrast the results of RDF2vec trained on DBpedia to their model Wikipedia2vec. The results are close, with Wikipedia2vec yielding slightly better results, albeit with a model that is significantly larger than RDF2vec. [Yamada et al., 2020]
  • FinMatcher is a tool for named entity classification in the financial domain, developed for the FinSim-2 shared task. It uses pre-trained RDF2vec embeddings on WebIsALOD [Portisch et al., 2021].
  • [Engleitner et al., 2021] use RDF2vec embeddings to provide semantic tags for news articles.
  • LamAPI is a service for entity retrieval as one step in the entity linking process. [Avogadro et al., 2022] use RDF2vec in this service to enhance the set of types for seed entities, and they show that an expansion based on RDF2vec helps in particular with entities with no or just one type assigned.
  • [Bagherzadeh and Bergler, 2022] combine pre-trained RDF2vec embeddings on various general purpose knowledge graphs (DBpedia, WordNet, and ConceptNet) with embeddings on biomedical knowledge graphs and BERT text embeddings on a variety of NLP tasks in the biomedical domain, including text classification and relation extraction. They show that a combination of BERT and knowledge graph embeddings outperforms a pure BERT based approach. Moreover, the paper interestingly demonstrates that embeddings on different knowledge graphs created with different embedding approaches can be combined.
  • [Chen, 2023] proposes the use of RDF2vec embeddings on a general purpose knowledge graph (here: YAGO2) as a signal for entity relatedness in text summarization.
  • [Setty, 2023] uses RDF2vec embeddings for clustering types in knowledge graphs as a preparing step for answer type prediction in question answering.

Information Retrieval

In information retrieval, similarity and relatedness of entities can be utilized to retrieve and/or rank results for queries for a given entity (a minimal retrieval sketch using nearest neighbors in the embedding space follows after this list). Examples for the use of RDF2vec in the information retrieval field include:
  • Nanni et al. describe a system for harvesting event collections from Wikipedia, where RDF2vec is used internally for entity ranking. [Nanni et al., 2017b]
  • Ad Hoc Table Retrieval using Semantic Similarity describes the use of pre-trained RDF2vec embeddings for retrieving Wikipedia tables. [Zhang and Balog, 2018] In a later extension, they distinguish two kinds of retrieval tasks (using either keywords or tables as queries), and show that entity embeddings with RDF2vec can be used for both scenarios. [Zhang and Balog, 2021] Table annotation with RDF2vec is also discussed in [Cutrona et al., 2021], [Shigarov et al., 2021], [Dorodnykh and Yurin, 2023], and [Avogadro, 2024].
  • Cyber-all-intel is an application in the computer security domain. It uses RDF2vec vectors for retrieving information on security alerts [Mittal et al., 2019].
  • The COVID-19 literature knowledge graph is a large citation network of COVID-19 related scientific publications, derived from the CORD-19 dataset. In [Steenwinckel et al., 2020], the authors exploit RDF2vec embeddings on that graph for facilitating the retrieval of related articles, as well as for clustering the large body of literature.
  • In the context of the Data Set Knowledge Graph, the retrieval of similar datasets has been discussed as a use case for RDF2vec. [Färber and Lamprecht, 2021].
  • Kim et al. discuss the use of RDF2vec on top of knowledge graphs created using open information extraction from text, in particular for retrieving similar entities to support situation awareness in combat situations. [Kim et al., 2021]
  • [Loesch et al., 2022] discuss the use of RDF2vec for retrieving substitutes for food ingredients in a food knowledge graph. They show that RDF2vec embeddings outperform TransE and ComplEx on this task.
  • eBay uses RDF2vec embeddings on product graphs for determining product similarity [Ristoski et al., 2023], as well as to retrieve products with similar colors referred to by different names (e.g., "graphite" for "grey"). [Liang et al., 2022]
  • Nordsieck et al. use RDF2vec to retrieve similar processes [Nordsieck et al., 2022] and similar quality characteristics [Nordsieck et al., 2023] from a knowledge graph encoding procedural knowledge in the industrial manufacturing domain.
  • [Schwabe and Acosta, 2023] combine RDF2vec embeddings with a graph neural network approach to estimate the cardinality of queries on a knowledge graph.
  • [Ekaputra et al., 2023] use RDF2vec on a knowledge graph of scientific papers and systems to identify related systems, datasets, or algorithms.
  • [Farzana et al., 2023] discuss the use case of query rewriting for product retrieval, using RDF2vec embeddings on a product knowledge graph, among other building blocks.
  • [Eschauzier et al., 2023] use RDF2vec embeddings to represent predicates in learning to optimize join orderings for SPARQL query execution.
  • [Luo et al., 2023] use RDF2vec embeddings of Wikidata for reranking search results in data set search.
  • Web API composition deals with the complex task of finding a set of APIs that fulfill a goal. In order to combine them, one needs to find matching APIs. [Boustil and Tabet, 2023] use RDF2vec embeddings of different knowledge graphs in order to find APIs with synonymous parameter names.
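
As a minimal sketch of this retrieval pattern, nearest neighbors in the embedding space can be queried directly from a trained gensim model (the toy walks are those from the introductory example; entities and hyperparameters are illustrative only):

from gensim.models import Word2Vec

walks = [
    ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"],
    ["Germany", "leader", "Angela_Merkel", "birthPlace", "Hamburg"],
    ["Hamburg", "leader", "Peter_Tschentscher", "residence", "Hamburg"],
]
model = Word2Vec(walks, vector_size=50, window=5, sg=1, min_count=1, epochs=50)

# Retrieval: rank entities and predicates by cosine similarity to the query.
for neighbor, score in model.wv.most_similar(positive=["Hamburg"], topn=5):
    print(f"{score:.3f}  {neighbor}")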

Predictive Modeling

Predictive modeling was the original use case for which RDF2vec was developed. Here, external variables (which might be continuous or categorical) are predicted for a set of entities. When these entities are linked to a knowledge graph, entity embeddings have been shown to be suitable representations for downstream predictive modeling tools. Examples in this field include:
  • Hascoet et al. show how to use RDF2vec for image classification, especially for classes of images for which no training data is available, i.e., zero-shot-learning. [Hascoet et al., 2017]
  • evoKGsim* combines similarity metrics and genetic programming for predicting out-of-KG relations. The framework implements RDF2vec as one source of similarity metrics. [Sousa et al., 2021]
  • Biswas et al. discuss the use of RDF2vec as a signal for predicting infobox types in Wikipedia articles [Biswas et al., 2018].
  • Egami et al. show the use case of geospatial data analytics in urban spaces by constructing a geospatial knowledge graph and computing RDF2vec embeddings thereon [Egami et al., 2018]. Another example of predictive modeling using geo-spatial knowledge graphs is given by [Böckling et al., 2023], where wildfires are predicted using a geo-spatial knowledge graph integrating various sources, and computing RDF2vec embeddings thereon.
  • Hees discusses the use of pre-trained RDF2vec models for predicting human associations of terms [Hees, 2018].
  • The utilization of RDF2vec for content-based recommender systems is discussed in [Saeed and Prasanna, 2018], [Ristoski et al., 2019], [Voit and Paulheim, 2021], and [Hubert, 2023]. [Palumbo et al., 2019] report that RDF2vec performs better in terms of recommending novel items than other competitors. [Nguyen, 2023] uses RDF2vec for recommending data items and visualizations for creating dashboards. The work is remarkable insofar as different embedding methods (RDF2vec and TransH) are combined for the recommendation.
  • Jurgovsky demonstrates the use of RDF2vec for data augmentation on the task of credit card fraud detection [Jurgovsky, 2019].
  • Hoppe et al. demonstrate the use of RDF2vec embeddings on DBpedia for improving the classification of scientific articles [Hoppe et al., 2021]. The approach was later also applied to classifying Wikipedia abstracts [Hoppe, 2022]. In particular, the authors suggest representing texts as sequences of entities, which are then processed by a BiLSTM.
  • [Nunes et al., 2021] show how graph embeddings on biomedical ontologies can be utilized for predicting gene-disease associations. They train classifiers such as random forests over the concatenated embedding vectors of the genes and diseases. In a follow-up work, they explore different mechanisms of combination beyond concatenation. [Nunes et al., 2023]
  • [Sousa et al., 2021] use embeddings on the Gene Ontology for various predictive modeling tasks in the biomedical domain, including the prediction of proteins and the interaction of diseases and genes, as well as the analysis of protein-protein interactions [Sousa et al., 2024]. In later work [Nunes et al., 2023], they show how using aggregates of embeddings of ancestor nodes can help produce explanations for the embedding-based predictions.
  • Wang et al. use embeddings, including RDF2vec, to assess the similarity of proteins in the Gene Ontology [Wang et al., 2022].
  • Ramezani et al. represent essays by knowledge graphs, and use embeddings of the concepts in those graphs to predict the author's personality in the big 5 model based on their written essay. [Ramezani et al., 2022]
  • [Carvalho et al., 2022] use RDF2vec embeddings on an ontology-enriched variant of the MIMIC III dataset, a database of hospital patient data, to predict patient readmission to intensive care units.
  • [Vliestra et al., 2022] apply RDF2vec on a biomedical knowledge graph to identify genetic markers associated with diseases. They show that RDF2vec not only outperforms other graph embedding methods, but also state-of-the-art reference methods in the field.
  • [Pellegrini, 2021] uses RDF2vec embeddings for classifying nodes in different knowledge graphs, mostly for predicting the gender of humans.
  • [Chiatti and Daga, 2022] show how RDF2vec based embeddings can be used as an additional signal in predicting artwork subjects. They combine image, text, and knowledge graph embeddings and show that those combinations often outperform purely visual classification.
  • [Lazzari, 2022] uses RDF2vec to classify chords in music, which are arranged in a knowledge graph of music chords. The work shows that RDF2vec on that graph outperforms tailored models like chord2vec, intervals2vec, and pitchclass2vec.
  • [Van der Weerdt et al., 2023] discuss the use of RDF2vec for node classification in IoT settings. Since IoT knowledge graphs contain lots of numerical measurements, they also demonstrate an effective way of preprocessing and enriching the graph before embedding.
  • [Tailhardat et al., 2023] use RDF2vec and RandomForests to classify incidents in ICT knowledge graphs. The classification is similar to a node classification task.
  • Ugai et al. build a knowledge graph of daily living activities and propose the use of RDF2vec for detecting hazardous situations. [Ugai et al., 2024]
  • [Katili et al., 2024] integrate various data sources about insects and plants into a knowledge graph, and use RDF2vec embeddings on that graph to predict the transmission of plant viruses by insects.

Other Applications

No matter how sophisticated your categorization schema is, you always end up with a category called "other" or "misc.". Here are examples for applications of RDF2vec in that category:
  • REMES is an entity summarization approach which uses RDF2vec to select a suitable subset of statements for describing an entity. [Gunaratna et al., 2017] Other approaches proposing the usage of RDF2vec for entity summarization are discussed in [Li et al., 2020] and [Hørlyk, 2023].
  • Similar to that, Shi et al. propose an approach for extracting semantically coherent subgraphs from a knowledge graph, which uses RDF2vec as a measure for semantic distance to guarantee semantic coherence. [Shi et al., 2021]
  • Jurisch and Igler demonstrate the utilization of RDF2vec embeddings for detecting changes in ontologies in [Jurisch and Igler, 2018].
  • Niazmand et al. use RDF2vec embeddings for identifying similar predicates when summarizing knowledge graphs [Niazmand et al., 2022]. That approach for identifying similar predicates is also discussed by the authors for improving query processing over Wikidata [Niazmand et al., 2023].
  • Sultana et al. combine RDF2vec with Graph Convolutional Neural Networks to achieve knowledge graph compression. Interestingly, their results indicate that an encoder solely built on RDF2vec (without the convolutional layer) can already achieve state of the art results in knowledge graph compression. [Sultana et al., 2024].
  • Abe et al. propose the use of RDF2vec vectors to identify devices in the physical neighborhood of a user in an IoT scenario [Abe et al., 2022].
  • Wang et al. use RDF2vec to identify semantically similar statements (receiving a statement vector by concatenating subject, predicate, and object vectors) for creating semantically coherent subgraphs of inconsistent ontologies. [Wang et al., 2023].

References

These are the core publications of RDF2vec:

  1. Heiko Paulheim, Petar Ristoski, Jan Portisch: Embedding Knowledge Graphs with RDF2vec. Springer, 2023.
  2. Jan Portisch, Heiko Paulheim: The RDF2vec Family of Knowledge Graph Embedding Methods. Semantic Web Journal, 2023.
  3. Petar Ristoski, Jessica Rosati, Tommaso Di Noia, Renato De Leone, Heiko Paulheim: RDF2Vec: RDF Graph Embeddings and Their Applications. Semantic Web Journal 10(4), 2019.
  4. Petar Ristoski, Heiko Paulheim: RDF2Vec: RDF Graph Embeddings for Data Mining. International Semantic Web Conference, 2016.

Further references used above:

  1. Shinya Abe, Shoko Fujii, Tatsuya Sato, Yuto Komatsu, Satoshi Fujitsu, Hiroshi Fujisawa: Semantic Force-directed Device Selection for Notification. Annual Conference of the Society of Instrument and Control Engineers (SICE), 2022.
  2. Sareh Aghaei, Anna Fensel: Finding Similar Entities Across Knowledge Graphs. International Conference on Advances in Computer Science and Information Technology, 2021.
  3. Terencio Agozzino: A Trip to Sesame Street: Evaluation of BERT and Other Recent Embedding Techniques Within RDF2Vec. Master's thesis, 2021.
  4. Mehwish Alam, Diego Reforgiato Recupero, Misael Mongiovi, Aldo Gangemi, Petar Ristoski: Event-based knowledge reconciliation using frame embeddings and frame similarity. Knowledge-based Systems (135), 2017
  5. Mona Alshahrani, Mohammad Asif Khan, Omar Maddouri, Akira R Kinjo, Núria Queralt-Rosinach, Robert Hoehndorf: Neuro-symbolic representation learning on biological knowledge graphs. Bioinformatics 33(17), 2017.
  6. Faisal Alshargi, Saeedeh Shekarpour, Tommaso Soru, Amit Sheth: Concept2vec: Metrics for Evaluating Quality of Embeddings for Ontological Concepts. Spring Symposium on Combining Machine Learning with Knowledge Engineering, 2019
  7. Ahmad Al Taweel, Heiko Paulheim: Towards Exploiting Implicit Human Feedback for Improving RDF2vec Embeddings. Deep Learning for Knowledge Graphs Workshop, 2020
  8. Ammar Ammar, Remzi Celebi: Fact Validation with Knowledge Graph Embeddings. International Semantic Web Conference, 2019
  9. Roberto Avogadro, Marco Cremaschi, Fabio D’adda, Flavio De Paoli, and Matteo Palmonari: LamAPI: a Comprehensive Tool for String-based Entity Retrieval with Type-base Filters. Workshop on Ontology Matching, 2022.
  10. Roberto Avogadro: Semantic Enrichment of Tabular Data with Machine Learning Techniques. Università degli Studi di Milano Bicocca, 2024.
  11. Michael Azmy, Peng Shi, Jimmy Lin, Ihab F. Ilyas: Matching Entities Across Different Knowledge Graphs with Graph Embeddings. arxiv.org, 2019
  12. Stefan Bachhofner, Peb Ruswono Aryan, Bernhard Krabina, Robert David: Embedding Metadata-Enriched Graphs. International Semantic Web Conference, Posters and Demos, 2021.
  13. Parsa Bagherzadeh and Sabine Bergler: Integration of Heterogeneous Knowledge Sources for Biomedical Text Processing. International Workshop on Health Text Mining and Information Analysis, 2022.
  14. Jose Alberto Benitez-Andrades, Maria Teresa García-Ordás, Mayra Russo, Ahmad Sakor, Luis Daniel Fernandes Rotger, María-Esther Vidal: What Can Tweets and Knowledge Graphs Tell Us About Eating Disorders?. Semantic Web Journal, 2022.
  15. Russa Biswas, Rima Türker, Farshad Bakhshandegan-Moghaddam, Maria Koutraki, Harald Sack: Wikipedia Infobox Type Prediction Using Embeddings. Workshop on Deep Learning for Knowledge Graphs and Semantic Technologies, 2018
  16. Russa Biswas, Jan Portisch, Heiko Paulheim, Harald Sack, Mehwish Alam: Entity Type Prediction Leveraging Graph Walks and Entity Descriptions. In: International Semantic Web Conference, 2022.
  17. Martin Böckling, Heiko Paulheim, Sarah Detzler: Wildfire Prediction Using Spatio-Temporal Knowledge Graphs. Second International Workshop on Linked Data-driven Resilience Research, 2023.
  18. Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, Oksana Yakhnenko: Translating embeddings for modeling multi-relational data. Advances in neural information processing systems 26 (2013).
  19. Amel Boustil, Youcef Tabet: Web APIs composition based on Knowledge Graph Word Embedding and Entity Linking. International Conference on Pattern Analysis and Intelligent Systems (PAIS), 2023.
  20. Remzi Çelebi, Erkan Yaşar, Hüseyin Uyar, Özgür Gümüş, Oguz Dikenelli, Michel Dumontier: Evaluation of knowledge graph embedding approaches for drug-drug interaction prediction using Linked Open Data. International Conference Semantic Web Applications and Tools for Life Sciences, 2018
  21. Ricardo Miguel Serafim Carvalho, Catia Pesquita, Daniela Oliveira: Knowledge Graph Embeddings for ICU readmission prediction. Research Square, 2022.
  22. Melisachew Wudage Chekol, Giuseppe Pirrò: Refining Node Embeddings via Semantic Proximity. In: International Semantic Web Conference, 2020.
  23. Jiaoyan Chen, Xi Chen, Ian Horrocks, Erik B. Myklebust, Ernesto Jiménez-Ruiz: Correcting Knowledge Base Assertions. The Web Conference, 2020
  24. Jiaoyan Chen, Pan Hu, Ernesto Jimenez-Ruiz, Ole Magnus Holter, Denvar Antonyrajah, Ian Horrocks: OWL2Vec*: Embedding of OWL Ontologies. arxiv.org, 2020
  25. Jingqiang Chen: An entity-guided text summarization framework with relational heterogeneous graph neural network. arxiv.org, 2023
  26. Agnese Chiatti and Enrico Daga: Neuro-symbolic learning for dealing with sparsity in cultural heritage image archives: an empirical journey. Workshop on Deep Learning for Knowledge Graphs (DL4KG), 2022
  27. Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, Heiko Paulheim: Biased Graph Walks for RDF Graph Embeddings. International Conference on Web Intelligence, Mining, and Semantics, 2017
  28. Michael Cochez, Petar Ristoski, Simone Paolo Ponzetto, Heiko Paulheim: Global RDF Vector Space Embeddings. International Semantic Web Conference, 2017
  29. Vincenzo Cutrona, Gianluca Puleri, Federico Bianchi, Matteo Palmonari: NEST: Neural Soft Type Constraints to Improve Entity Linking in Tables. Semantics, 2021.
  30. Alexis Cvetkov-Iliev, Alexandre Allauzen, Gael Varoquaux: Relational Data Embeddings for Feature Enrichment with Background Information. Machine Learning, 2022.
  31. Enrico Daga, Paul Groth: Data journeys: knowledge representation and extraction. Under review at Semantic Web Journal.
  32. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arxiv.org, 2018.
  33. Mohnish Dubey, Debayan Banerjee, Debanjan Chaudhuri, Jens Lehmann: EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs. International Semantic Web Conference, 2018
  34. Nikita Dorodnykh, Aleksandr Yurin: Knowledge Graph Engineering Based on Semantic Annotation of Tables. Computation 11(9), 2023.
  35. Shusaku Egami, Takahiro Kawamura, Akihiko Ohsuga: Predicting Urban Problems: A Comparison of Graph-based and Image-based Methods. Joint International Semantic Technology Conference, 2018
  36. Shusaku Egami, Satoshi Nishimura, Ken Fukuda: A Framework for Constructing and Augmenting Knowledge Graphs using Virtual Space: Towards Analysis of Daily Activities. In: International Conference on Tools for Artificial Intelligence, 2021
  37. Shusaku Egami, Takanori Ugai, Masateru Oota, Kyoumoto Matsushita, Takahiro Kawamura, Kouji Kozaki, Ken Fukuda: RDF-star2Vec: RDF-star Graph Embeddings for Data Mining. In: IEEE Access, 2023
  38. Fajar J. Ekaputra, Majlinda Llugiqi, Marta Sabou, Andreas Ekelhart, Heiko Paulheim, Anna Breit, Artem Revenko, Laura Waltersdorfer, Kheir Eddine Farfar, Sören Auer: Describing and Organizing Semantic Web and Machine Learning Systems in the SWeMLS-KG. Extended Semantic Web Conference, 2023.
  39. Cheikh-Brahim El Vaigh, François Goasdoué, Guillaume Gravier, Pascale Sébillot: A Novel Path-based Entity Relatedness Measure for Efficient Collective Entity Linking. In: International Semantic Web Conference, 2020
  40. Nora Engleitner, Werner Kreiner, Nicole Schwarz, Theodorich Kopetzky, Lisa Ehrlinger: Knowledge Graph Embeddings for News Article Tag Recommendation. Semantics, 2021.
  41. Ruben Eschauzier, Ruben Taelman, Meike Morren and Ruben Verborgh: Reinforcement Learning-based SPARQL Join Ordering Optimizer. Extended Semantic Web Conference, 2023.
  42. Michael Färber: The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data. International Semantic Web Conference, 2019
  43. Michael Färber, David Lamprecht: The Data Set Knowledge Graph: Creating a Linked Open Data Source for Data Sets. Quantitative Science Studies, 2021.
  44. Shahla Farzana, Qunzhi Zhou, Petar Ristoski: Knowledge Graph-Enhanced Neural Query Rewriting. WWW '23 Companion: Companion Proceedings of the ACM Web Conference, 2023.
  45. Valeria Fionda, Giuseppe Pirrò: Triple2Vec: Learning Triple Embeddings from Knowledge Graphs. AAAI Conference on Artificial Intelligence, 2020.
  46. Blerina Gkotse: Ontology-based Generation of Personalised Data Management Systems: an Application to Experimental Particle Physics. PhD Thesis at MINES ParisTech, 2020.
  47. Alejandro Gonzalez-Hevia, Daniel Gayo-Avello: Leveraging Wikidata's edit history in knowledge graph refinement tasks, arxiv.org, 2022.
  48. Francis Gosselin, Amal Zouaq: SORBET: A Siamese Network for Ontology Embeddings Using a Distance-Based Regression Loss and BERT. International Semantic Web Conference, 2023.
  49. Aditya Grover and Jure Leskovec: node2vec: Scalable Feature Learning for Networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.
  50. Kalpa Gunaratna, Amir Hossein Yazdavar, Krishnaprasad Thirunarayan, Amit Sheth, Gong Cheng: Relatedness-based Multi-Entity Summarization. International Joint Conference on Artificial Intelligence, 2017
  51. Bill Gates Happi Happi, Géraud Fokou Pelap, Danai Symeonidou, Pierre Larmande: Pushing the Boundaries: Classification of Entity Alignment from RDF Embeddings. Semantic Web Journal, 2024.
  52. Tristan Hascoet, Yasuo Ariki, Tetsuya Takiguchi: Semantic Web and Zero-Shot Learning of Large Scale Visual Classes. International Workshop on Symbolic-Neural Learning, 2017
  53. Jörn Hees: Simulating Human Associations with Linked Data. University of Kaiserslautern, 2018
  54. Niclas Heilig, Jan Kirchhoff, Florian Stumpe, Joan Plepi, Lucie Flek, Heiko Paulheim: Refining Diagnosis Paths for Medical Diagnosis based on an Augmented Knowledge Graph. Workshop on Semantic Web solutions for large-scale biomedical data analytics, 2022.
  55. Sven Hertling, Jan Portisch, Heiko Paulheim: Supervised Ontology and Instance Matching with MELT. Ontology Matching, 2020.
  56. Ole Magnus Holter, Erik B. Myklebust, Jiaoyan Chen, Ernesto Jimenez-Ruiz: Embedding OWL Ontologies with OWL2Vec. International Semantic Web Conference, 2019
  57. Fabian Hoppe, Danilo Dessì, Harald Sack: Deep Learning meets Knowledge Graphs for Scholarly Data Classification. Companion Proceedings of the Web Conference, 2021.
  58. Fabian Hoppe: Improving Zero-Shot Text Classification with Graph-based Knowledge Representations. Doctoral Consortium at International Semantic Web Conference, 2022
  59. Alva Marie Hørlyk: Snippet Generation with Reasoning and Embedding Techniques. University of Oslo, 2023.
  60. Gustav Hubert: Towards a Universal Recommender System: A Linked Open Data Approach. University of Utrecht, 2023.
  61. Nicolas Hubert, Heiko Paulheim, Armelle Brun, Davy Monticolo: Do Similar Entities have Similar Embeddings? arxiv.org, 2023.
  62. Andreea Iana, Heiko Paulheim: More is not Always Better: The Negative Impact of A-box Materialization on RDF2vec Knowledge Graph Embeddings. Combining Symbolic and Sub-symbolic methods and their Applications (CSSA), 2020
  63. Emrah Inan, Oguz Dikenelli: Effect of Enriched Ontology Structures on RDF Embedding-Based Entity Linking. Metadata and Semantic Research, 2017
  64. Emrah Inan, Oguz Dikenelli: A Sequence Learning Method for Domain-Specific Entity Linking. Named Entities Workshop, 2018.
  65. Nitisha Jain, Jan-Christoph Kalo, Wolf-Tilo Balke, Ralf Krestel: Do Embeddings Actually Capture Knowledge Graph Semantics?. Extended Semantic Web Conference, 2021
  66. Johannes Jurgovsky: Context-Aware Credit Card Fraud Detection. University of Passau, 2019
  67. Matthias Jurisch, Bodo Igler: RDF2Vec-based Classification of Ontology Alignment Changes. Workshop on Deep Learning for Knowledge Graphs and Semantic Technologies, 2018
  68. Péter Kardos, Richárd Farkas: RDF2VEC in the Knowledge Graph matching task. 13th Conference of PhD Students in Computer Science, 2022.
  69. Md Rezaul Karim, Michael Cochez, Joao Bosco Jares, Mamtaz Uddin, Oya Beyan, Stefan Decker: Drug-drug interaction prediction based on knowledge graph embeddings and convolutional-LSTM network. ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2019
  70. Moh. Zulkifli Katili, Yeni Herdiyeni, Medria Kusuma Dewi Hardhienata: Leveraging Biotic Interaction Knowledge Graph and Network Analysis to Uncover Insect Vectors of Plant Virus. Journal of Information Systems Engineering and Business Intelligence, 2024.
  71. Mayank Kejriwal, Pedro Szekely: Supervised Typing of Big Graphs using Semantic Embeddings. International Workshop on Semantic Big Data, 2017
  72. Sang-Min Kim, So-yeon Jin, Woo-sin Lee: A study on the Extraction of Similar Information using Knowledge Base Embedding for Battlefield Awareness. Journal of The Korea Society of Computer and Information, 2021
  73. Nicolas Lazzari: Knowledge-based Chord Embeddings. Dissertation at University of Bologna, 2022
  74. Junyou Li, Gong Cheng, Qingxia Liu, Wen Zhang, Evgeny Kharlamov, Kalpa Gunaratna, Huajun Chen: Neural Entity Summarization with Joint Encoding and Weak Supervision. International Joint Conference on Artificial Intelligence, 2020.
  75. Ke Liang, Yue Liu, Sihang Zhou, Xinwang Liu, Wenxuan Tu: Relational Symmetry based Knowledge Graph Contrastive Learning. arxiv.org, 2022.
  76. Lizzie Liang, Sneha Kamath, Petar Ristoski, Qunzhi Zhou, Zhe Wu: Fifty Shades of Pink: Understanding Color in e-commerce using Knowledge Graphs. International Conference on Information and Knowledge Management, 2022.
  77. Wang Ling, Chris Dyer, Alan W. Black, Isabel Trancoso: Two/Too Simple Adaptations of Word2Vec for Syntax Problems. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015.
  78. Julie Loesch, Louis Meeckers, Ilse van Lier, Alie de Boer, Michel Dumontier, Remzi Celebi: Automated Identification of Food Substitutions Using Knowledge Graph Embeddings. International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, 2022.
  79. Alexander Lütke: AnyGraphMatcher Submission to the OAEI Knowledge Graph Challenge 2019. International Workshop on Ontology Matching, 2019
  80. Weiqing Luo, Qiaosheng Chen, Zhiyang Zhang, Zixian Huang, Gong Cheng: An Empirical Investigation of Implicit and Explicit Knowledge-Enhanced Methods for Ad Hoc Dataset Retrieval. EMNLP Findings, 2023.
  81. Viktorija Mijalcheva, Ana Davcheva, Sasho Gramatikov, Milos Jovanovik, Dimitar Trajanov, Riste Stojanov: Learning Robust Food Ontology Alignment. IEEE International Conference on Big Data, 2022.
  82. Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space. International Conference on Learning Representations, 2013
  83. Sudip Mittal, Anupam Joshi, Tim Finin: Cyber-All-Intel: An AI for Security related Threat Intelligence. arxiv.org, 2019
  84. Aditya Mogadala, Umanga Bista, Lexing Xie, Achim Rettinger: Knowledge Guided Attention and Inference for Describing Images Containing Unseen Objects. Extended Semantic Web Conference, 2018
  85. Michael Monych, Jan Portisch, Michael Hladik, Heiko Paulheim: DESKMatcher. Ontology Matching, 2020
  86. Michalis Mountantonakis, Yannis Tzitzikas: Applying Cross-Dataset Identity Reasoning for Producing URI Embeddings over Hundreds of RDF Datasets. International Journal of Metadata, Semantics and Ontologies, 2021
  87. Sourav Mukherjee, Tim Oates, Ryan Wright: Graph Node Embeddings using Domain-Aware Biased Random Walks. arxiv.org, 2019
  88. Federico Nanni, Bhaskar Mitra, Matt Magnusson, Laura Dietz: Benchmark for Complex Answer Retrieval. ACM International Conference on the Theory of Information Retrieval, 2017
  89. Federico Nanni, Simone Paolo Ponzetto, Laura Dietz: Building Entity-Centric Event Collections. ACM/IEEE Joint Conference on Digital Libraries, 2017
  90. Federico Nanni, Simone Paolo Ponzetto, Laura Dietz: Entity-aspect linking: providing fine-grained semantics of entities in context. International Joint Conference on Digital Libraries, 2018
  91. Chau Ngoc Minh Nguyen: Recommender system for data visualization and automated dashboard generation. Master's thesis at Ghent University, 2023.
  92. Emetis Niazmand, Gezim Sejdiu, Damien Graux, Maria-Esther Vidal: Efficient semantic summary graphs for querying large knowledge graphs. International Journal of Information Management Data Insights 2(1), 2022.
  93. Emetis Niazmand and Maria-Esther Vidal: SAP-KG: Analysis of Synonym Predicates using Wikidata. Semantic Web Journal, 2023.
  94. Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel: A Three-Way Model for Collective Learning on Multi-Relational Data. International Conference on Machine Learning, 2011.
  95. Finn Årup Nielsen: Wembedder: Wikidata entity embedding web service. arxiv.org, 2017
  96. Leonardo Nizzoli, Marco Avvenuti, Maurizio Tesconi, Stefano Cresci: Geo-Semantic-Parsing: AI-powered geoparsing by traversing semantic knowledge graphs. Decision Support Systems, Volume 136, September 2020.
  97. Richard Nordsieck, Michael Heider, Anton Hummel, Jörg Hähner: A Closer Look at Sum-based Embedding Aggregation for Knowledge Graphs Containing Procedural Knowledge. Workshop on Deep Learning for Knowledge Graphs, 2022.
  98. Richard Nordsieck, André Schweizer, Michael Heider, Jörg Hähner: PDPK: A Framework to Synthesise Process Data and Corresponding Procedural Knowledge for Manufacturing. arxiv.org, 2023.
  99. Susana Nunes, Rita T. Sousa, Catia Pesquita: Predicting Gene-Disease Associations with Knowledge Graph Embeddings over Multiple Ontologies. arxiv.org, 2021.
  100. Susana Nunes, Rita T. Sousa, Catia Pesquita: Multi-domain knowledge graph embeddings for gene-disease association prediction. Journal of Biomedical Semantics, 2023.
  101. Enrico Palumbo, Alberto Buzio, Andrea Gaiardo, Giuseppe Rizzo, Raphael Troncy, Elena Baralis: Tinderbook: Fall in Love with Culture. Extended Semantic Web Conference, 2019.
  102. Heiko Paulheim: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3), 2017.
  103. Giovanni Pellegrini: Relational Learning approaches for Recommender Systems. PhD Thesis, University of Trento, 2021.
  104. Maria Angela Pellegrino, Michael Cochez, Martina Garofalo, Petar Ristoski: A Configurable Evaluation Framework for Node Embedding Techniques. Extended Semantic Web Conference, 2019
  105. Maria Angela Pellegrino, Abdulrahman Altabba, Martina Garofalo, Petar Ristoski, Michael Cochez: GEval: A Modular and Extensible Evaluation Framework for Graph Embedding Techniques. Extended Semantic Web Conference, 2020
  106. Jeffrey Pennington, Richard Socher, Christopher D. Manning: GloVe: Global Vectors for Word Representation. Empirical Methods in Natural Language Processing, 2014
  107. Bryan Perozzi, Rami Al-Rfou, Steven Skiena: DeepWalk: Online Learning of Social Representations. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
  108. Marcin Pietrasik, Marek Z. Reformat: Probabilistic Coarsening for Knowledge Graph Embeddings. Axioms, 2023.
  109. Alexis Pister, Ghislain Atemezing: Knowledge Graph Embedding for Triples Fact Validation. International Semantic Web Conference, 2019
  110. Jan Portisch and Heiko Paulheim: ALOD2vec Matcher. International Workshop on Ontology Matching, 2018
  111. Jan Portisch, Michael Hladik, Heiko Paulheim: KGvec2go - Knowledge Graph Embeddings as a Service. International Conference on Language Resources and Evaluation, 2020
  112. Jan Portisch, Michael Hladik, Heiko Paulheim: RDF2Vec Light – A Lightweight Approach for Knowledge Graph Embeddings. International Semantic Web Conference, Posters and Demos, 2020.
  113. Jan Portisch, Michael Hladik, Heiko Paulheim: FinMatcher at FinSim-2: Hypernym Detection in the Financial Services Domain using Knowledge Graphs. Workshop on Financial Technology on the Web (FinWeb), 2021.
  114. Jan Portisch, Michael Hladik, Heiko Paulheim: Background Knowledge in Schema Matching: Strategy vs. Data. International Semantic Web Conference, 2021.
  115. Jan Portisch, Heiko Paulheim: Putting RDF2vec in Order. International Semantic Web Conference, Posters and Demos, 2021.
  116. Jan Portisch, Nicolas Heist, Heiko Paulheim: Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the same Coin?. Semantic Web Journal 13(3), 2022.
  117. Jan Portisch, Heiko Paulheim: Walk this Way! Entity Walks and Property Walks for RDF2vec. ESWC Posters and Demos, 2022.
  118. Jan Portisch, Guilherme Costa, Karolin Stefani, Katharina Kreplin, Michael Hladik, Heiko Paulheim: Ontology Matching Through Absolute Orientation of Embedding Spaces. ESWC Posters and Demos, 2022.
  119. Jan Portisch, Heiko Paulheim: The DLCC Node Classification Benchmark for Analyzing Knowledge Graph Embeddings. International Semantic Web Conference, 2022.
  120. Jędrzej Potoniec: Learning OWL 2 Property Characteristics as an Explanation for an RNN. Bulletin of the Polish Academy of Sciences Technical Sciences, 2020.
  121. Patryk Preisner, Heiko Paulheim: Universal Preprocessing Operators for Embedding Knowledge Graphs with Literals. Workshop on Deep Learning for Knowledge Graphs, 2023.
  122. Umair Qudus, Michael Röder, Muhammad Saleem and Axel-Cyrille Ngonga Ngomo: HybridFC: A hybrid approach to perform Fact Checking over Knowledge Graphs. International Semantic Web Conference, 2022.
  123. Majid Ramezani, Mohammad-Reza Feizi-Derakhshi, Mohammad-Ali Balafar: Knowledge Graph-Enabled Text-Based Automatic Personality Prediction. arxiv.org, 2022.
  124. Petar Ristoski, Stefano Faralli, Simone Paolo Ponzetto, Heiko Paulheim: Large-scale taxonomy induction using entity and word embeddings. International Conference on Web Intelligence, 2017
  125. Petar Ristoski: Exploiting Semantic Web Knowledge Graphs in Data Mining. IOS Press, Studies on the Semantic Web (38), 2019
  126. Petar Ristoski, Anna Lisa Gentile, Alfredo Alba, Daniel Gruhl, Steven Welch: Large-scale relation extraction from web documents and knowledge graphs with human-in-the-loop. Semantic Web Journal (60), 2020
  127. Petar Ristoski, Sathish Kandasamy, Aleksandr Matiushkin, Sneha Kamath, Qunzhi Zhou: Wisdom of the Sellers: Mining Seller Data for eCommerce Knowledge Graph Generation. Extended Semantic Web Conference, 2023.
  128. Leon Ruppen: Dependent Learning of Entity Vectors for Entity Alignment on Knowledge Graphs. Master's Thesis at ETH Zurich, 2018.
  129. Muhammad Rizwan Saeed, Viktor K. Prasanna: Extracting Entity-Specific Substructures for RDF Graph Embedding. IEEE International Conference on Information Reuse and Integration, 2018
  130. Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, Max Welling: Modeling Relational Data with Graph Convolutional Networks. Extended Semantic Web Conference, 2018.
  131. Tim Schwabe, Maribel Acosta: Cardinality Estimation over Knowledge Graphs with Embeddings and Graph Neural Networks. arxiv.org, 2023.
  132. M. Shahinmoghadam, A. Motamedi, M.M. Soltani: Leveraging Textual Information for Knowledge Graph-oriented Machine Learning: A Case Study in the Construction Industry. International Workshop on Intelligent Computing in Engineering (EG-ICE), 2022.
  133. Vinay Setty: Extreme Classification for Answer Type Prediction in Question Answering. arxiv.org, 2023.
  134. Yuxuan Shi, Gong Cheng, Trung-Kien Tran, Evgeny Kharlamov, Yulin Shen: Efficient Computation of Semantically Cohesive Subgraphs for Keyword-Based Knowledge Graph Exploration. The Web Conference, 2021.
  135. Alexey Shigarov, Nikita Dorodnykh, Alexander Yurin, Andrey Mikhailov and Viacheslav Paramonov: From web-tables to a knowledge graph: prospects of an end-to-end solution. Scientific-practical Workshop Information Technologies: Algorithms, Models, Systems, 2021
  136. Yukihiro Shiraishi, Ken Kaneiwa: A Self-matching Training Method with Annotation Embedding Models for Ontology Subsumption Prediction. arxiv.org, 2024.
  137. Everaldo Costa Silva Neto: Discovering a domain-specific schema from general-purpose knowledge base. Universidade Federal de Pernambuco, Recife, 2023.
  138. Fatima Zohra Smaili, Xin Gao, Robert Hoehndorf: Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations. Bioinformatics 34(13), 2018.
  139. Radina Sofronova, Russa Biswas, Mehwish Alam, Harald Sack: Entity Typing based on RDF2Vec using Supervised and Unsupervised Methods. Extended Semantic Web Conference, 2020.
  140. Tommaso Soru, Stefano Ruberto, Diego Moussallem, Edgard Marx, Diego Esteves, Axel-Cyrille Ngonga Ngomo: Expeditious Generation of Knowledge Graph Embeddings. European Conference on Data Analysis, 2018
  141. Rita T. Sousa, Sara Silva, Catia Pesquita: evoKGsim*: a framework for tailoring Knowledge Graph-based similarity for supervised learning. OpenReview, 2021.
  142. Rita T. Sousa, Sara Silva, Catia Pesquita: Supervised Semantic Similarity. bioRxiv, 2021.
  143. Rita T. Sousa, Sara Silva, Catia Pesquita: Explainable Representations for Relation Prediction in Knowledge Graphs. arxiv.org, 2023.
  144. Rita T. Sousa, Sara Silva, Heiko Paulheim, Catia Pesquita: Biomedical Knowledge Graph Embeddings with Negative Statements. International Semantic Web Conference, 2023.
  145. Rita T. Sousa, Sara Silva, Catia Pesquita: Explaining protein–protein interactions with knowledge graph-based semantic similarity. Computers in Biology and Medicine, 2024.
  146. Bram Steenwinckel, Gilles Vandewiele, Ilja Rausch, Pieter Heyvaert, Ruben Taelman, Pieter Colpaert, Pieter Simoens, Anastasia Dimou, Filip De Turck, Femke Ongenae: Facilitating the Analysis of COVID-19 Literature Through a Knowledge Graph. International Semantic Web Conference, 2020.
  147. Tangina Sultana, Md. Delowar Hossain, Md Golam Morshed, Tariq Habib Afridi, Young-Koo Lee: Inductive Autoencoder for Efficiently Compressing RDF Graphs. Information Sciences, Vol. 662, 2024.
  148. Lionel Tailhardat, Raphaël Troncy, Yoan Chabot: Leveraging Knowledge Graphs For Classifying Incident Situations in ICT Systems. Proceedings of the 18th International Conference on Availability, Reliability and Security, 2023.
  149. Rima Türker: Knowledge-Based Dataless Text Categorization. Extended Semantic Web Conference, 2019
  150. Takanori Ugai: Comparison of Walk and Algebraic Model Embeddings in Event-Centric Knowledge Graphs. J-STAGE, 2023.
  151. Takanori Ugai, Shusaku Egami, Swe Nwe Nwe Htun, Kouji Kozaki, Takahiro Kawamura, Ken Fukuda: Synthetic Multimodal Dataset for Empowering Safety and Well-being in Home Environments. arxiv.org, 2024.
  152. Duong Thi Thu Van, Young-Koo Lee: A similar structural and semantic integrated method for RDF entity embedding. Applied Intelligence, 2023.
  153. Gilles Vandewiele, Bram Steenwinckel, Pieter Bonte, Michael Weyns, Heiko Paulheim, Petar Ristoski, Filip De Turck, Femke Ongenae: Walk Extraction Strategies for Node Embeddings with RDF2Vec in Knowledge Graphs. arxiv.org, 2020.
  154. Gilles Vandewiele, Bram Steenwinckel, Terencio Agozzino, Femke Ongenae: pyRDF2Vec: A Python Implementation and Extension of RDF2Vec. arxiv.org, 2022.
  155. Roderick van der Weerdt, Victor de Boer, Laura Daniele, Ronald Siebes, Frank van Harmelen: Evaluating the Effect of Semantic Enrichment on Entity Embeddings of IoT Knowledge Graphs. First International Workshop on Semantic Web on Constrained Things, 2023.
  156. Svitlana Vakulenko: Knowledge-based Conversational Search. TU Wien, 2019.
  157. Wytze J. Vlietstra, Rein Vos, Erik M. van Mulligen, Guido W. Jenster, Jan A. Kors: Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph. PLOS ONE, 2022
  158. Michael Matthias Voit, Heiko Paulheim: Bias in Knowledge Graphs - an Empirical Study with Movie Recommendation and Different Language Editions of DBpedia. Conference on Language, Data and Knowledge, 2021
  159. Yiwei Wang, Mark James Carman, Yuan Fang Li: Using knowledge graphs to explain entity co-occurrence in Twitter. ACM Conference on Knowledge and Information Management, 2017
  160. YueQun Wang, LiYan Dong, XiaoQuan Jiang, XinTao Ma, YongLi Li, Hao Zhang: KG2Vec: A node2vec-based vectorization model for knowledge graph. PLOS ONE, 2021
  161. Hongxiao Wang, Hao Zheng, Danny Z. Chen: TANGO: A GO-term Embedding Based Method for Protein Semantic Similarity Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2022
  162. Keyu Wang, Site Li, Jiaye Li, Guilin Qi, Qiu Ji: An Embedding-based Approach to Inconsistency-tolerant Reasoning with Inconsistent Ontologies. arxiv.org, 2023.
  163. Tobias Weller: Learning Latent Features using Stochastic Neural Networks on Graph Structured Data. KIT, 2021.
  164. Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto: Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia. Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020
  165. Peiran Yao and Denilson Barbosa: Typing Errors in Factual Knowledge Graphs: Severity and Possible Ways Out. The Web Conference, 2021
  166. Shuo Zhang and Krisztian Balog: Ad Hoc Table Retrieval using Semantic Similarity. The Web Conference, 2018
  167. Shuo Zhang and Krisztian Balog: Semantic Table Retrieval using Keyword and Table Queries. arxiv.org, 2021
  168. Shuo Zhang, Xiaoli Lin, Xiaolong Zhang: Discovering DTI and DDI by Knowledge Graph with MHRW and Improved Neural Network. IEEE International Conference on Bioinformatics and Biomedicine, 2021
  169. Amal Zouaq and Felix Martel: What is the schema of your knowledge graph?: leveraging knowledge graph embeddings and clustering for expressive taxonomy learning. International Workshop on Semantic Big Data, 2020.

Acknowledgements

The original development of RDF2vec was funded by the Deutsche Forschungsgemeinschaft (DFG) within the project Mine@LOD under grant number PA 2373/1-1 from 2013 to 2018. Additional developments and extensive experiments were carried out by Jan Portisch, funded by SAP SE.

Contact

If you are aware of any implementations, extensions, pre-trained models, or applications of RDF2vec not listed on this Web page, please get in touch with Heiko Paulheim.