What’s Different About RDF?

Learn how global identity and inference distinguish RDF from other graph models.

Seen from a graph perspective, two of the most notable features of the RDF graph model are global identity and inference. Together, these make RDF well suited to data integration and to building out knowledge graphs.

Let’s talk about each of these in turn.

Integrating data at scale

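RDF uses web names, or URIs, to identify the nodes and edges in its data model. What does that mean? It means that every node and every edge has a name that is unique across the whole web, not just within one database, and that changes everything.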

The URI has been a key development in the ongoing integration of the global telecoms network. It builds on the Domain Name System (DNS), which names computers as nodes on the (inter)network. By extending a DNS host name with a network protocol scheme and a local address on the host system, we are able to identify (and retrieve) documents on the web.
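To make those three parts concrete, here is a minimal sketch that uses Python's standard `urllib.parse.urlsplit` to pull a URI apart into its protocol scheme, its DNS host name, and its local address on the host. The URI itself is a hypothetical example.

```python
from urllib.parse import urlsplit

# A hypothetical URI naming a web document; the host part is a DNS name.
uri = "https://www.example.org/people/alice.html"

parts = urlsplit(uri)
print(parts.scheme)  # network protocol scheme: https
print(parts.netloc)  # DNS host name: www.example.org
print(parts.path)    # local address on the host system: /people/alice.html
```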

The same URI pattern that we use for documents can also be used to identify data points within an RDF graph. The thinking here is that descriptions of things (that is, documents) can be sent in place of the actual things themselves, which might not be so easy to transmit without Star Trek transporters to beam them down. So, in principle, information about any resource (be it physical or abstract) can be returned. We can build a global information network.

Namespacing

There’s also another benefit to using URIs: namespacing. Public namespaces give us naming authorities, branding and trust, and guarantees of uniqueness. It follows that, in effect, we have a commons for developing shared semantics.
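The effect of namespacing can be sketched in a few lines of Python. RDF syntaxes such as Turtle let you abbreviate a full IRI as `prefix:localName`; the sketch below mimics that expansion with a plain dictionary. The `foaf` namespace is the real FOAF vocabulary namespace, while the `ex` namespace and the `expand` helper are hypothetical, for illustration only.

```python
# Prefix-to-namespace map, in the style of Turtle/SPARQL prefix declarations.
# "foaf" is the FOAF vocabulary namespace; "ex" is a hypothetical one.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/vocab/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'foaf:name' into a full IRI (hypothetical helper)."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


a = expand("foaf:name")
b = expand("ex:name")
print(a)       # http://xmlns.com/foaf/0.1/name
print(b)       # http://example.org/vocab/name
print(a == b)  # False: same local name, different naming authorities
```

Both vocabularies define a term called `name`, yet the two terms can never collide, because each is qualified by the namespace of the authority that owns it.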

Note: Here are the definitions of URL, URI, and IRI.

URL (Uniform Resource Locator): This is an identifier for the network location of a web document based on the network protocol used to retrieve it.

URI (Uniform Resource Identifier): This is a generic identifier for any resource, whether or not that resource can actually be retrieved over a network.

IRI (Internationalized Resource Identifier): This is an internationalized form of the URI that may contain Unicode characters beyond ASCII.
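The relationship between IRIs and URIs can be illustrated with a simplified sketch: an IRI is turned into a URI by percent-encoding its non-ASCII characters as UTF-8 bytes. The `iri_to_uri` helper below is hypothetical and uses Python's standard `urllib.parse.quote`; it assumes the non-ASCII characters appear only in the path, query, or fragment, since a non-ASCII host name is instead converted with IDNA (Punycode), which this sketch does not handle.

```python
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Hypothetical, simplified IRI-to-URI mapping.

    Percent-encodes non-ASCII characters as UTF-8 bytes, leaving URI
    delimiters and existing %XX escapes untouched. Does not handle
    non-ASCII host names (those need IDNA/Punycode instead).
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


print(iri_to_uri("http://example.org/wiki/München"))
# http://example.org/wiki/M%C3%BCnchen
```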

All this addresses the sharing of data, and of semantics for that data, but it doesn’t directly address data integration. Integration works like this: if user A makes statements about a subject S, and user B also makes statements about S, then the two sets of statements can simply be added together, because the subject S is the same in both. And we know S is the same because both users identify it with the same global name. What we effectively have with RDF are self-joining datasets based on the use of URIs, or global names.
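The "self-joining" behavior can be sketched by modeling each dataset as a set of (subject, predicate, object) triples. This is a toy model, not an RDF library: the IRIs for S and its related resources are hypothetical, and it assumes there are no blank nodes, which a real RDF merge would have to rename apart.

```python
# Triples are (subject, predicate, object) tuples of IRIs or literal strings.
# All example.org IRIs are hypothetical; FOAF is a real vocabulary namespace.
S = "http://example.org/id/alice"
FOAF = "http://xmlns.com/foaf/0.1/"

# User A's statements about S
graph_a = {
    (S, FOAF + "name", "Alice"),
    (S, FOAF + "knows", "http://example.org/id/bob"),
}

# User B's statements about the same S, published independently
graph_b = {
    (S, FOAF + "mbox", "mailto:alice@example.org"),
    (S, FOAF + "name", "Alice"),  # identical statement, deduplicated by the set
}

# Merging is plain set union: no join keys or mapping tables are needed,
# because S is the same global name in both datasets.
merged = graph_a | graph_b

# Everything now known about S, drawn from both sources
about_s = sorted(p for s, p, o in merged if s == S)
print(len(merged))  # 3
for predicate in about_s:
    print(predicate)
```

Because both users chose the same global name for S, the union alone connects their statements; a relational integration would typically need an explicit join or key-mapping step here.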

Extracting knowledge from graphs

Graphs are an excellent choice for representing knowledge bases because they allow easy and arbitrary connections to be set up between the data items. This naturally leads to the notion of knowledge graphs.

But knowledge graphs are more than fixed data stores. They generally follow an open-world model, which allows new data to be added as required, and the shape of the data is not fixed in advance by a schema, as it is in a relational database.

In a sense, they are programmable knowledge stores. New data can be added from the outside, new data can be generated from the inside, and new interpretations over the data can be made. They are more akin to knowledge machines.

RDF builds on common standards for naming, which allows different datasets to be readily mixed together. Formal reasoning systems from the knowledge representation communities have been layered on top of the basic RDF model. RDF datasets can then be modeled according to RDF schemas (or “ontologies” as they are sometimes called), which are also expressed in RDF. These RDF schemas are built on formal semantics and a system of logic. This means we can reason over the data, draw logical inferences, and extract new facts or statements, which can be added to the dataset. We can therefore “grow” the dataset.
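Here is a minimal sketch of how a dataset "grows" through inference. It is a toy forward-chaining loop, not a production reasoner, and it implements only two of the RDFS entailment rules: `rdfs:subClassOf` is transitive (rule rdfs11), and an instance of a subclass is also an instance of its superclass (rule rdfs9). The `rdf:` and `rdfs:` IRIs are the standard W3C namespaces; the `example.org` classes and the `rdfs_closure` helper are hypothetical.

```python
# Standard W3C vocabulary IRIs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace

# Schema statements and data statements live in the same graph
graph = {
    (EX + "Dog", SUBCLASS, EX + "Mammal"),
    (EX + "Mammal", SUBCLASS, EX + "Animal"),
    (EX + "rex", RDF_TYPE, EX + "Dog"),
}


def rdfs_closure(triples):
    """Apply two RDFS rules repeatedly until no new triples appear.

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (A subClassOf B), (x type A)       => (x type B)
    """
    g = set(triples)
    while True:
        sub = [(s, o) for s, p, o in g if p == SUBCLASS]
        new = set()
        # rdfs11: chain subclass links
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        # rdfs9: propagate types up the class hierarchy
        for s, p, o in g:
            if p == RDF_TYPE:
                for a, b in sub:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        if new <= g:  # fixpoint reached: nothing new was derived
            return g
        g |= new


inferred = rdfs_closure(graph)
for triple in sorted(inferred - graph):
    print(triple)
```

From three asserted statements, the loop derives three more: Dog is a subclass of Animal, and rex is both a Mammal and an Animal. None of these was stated explicitly; they follow from the schema.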
