Engage with Graphs

Learn why graphs matter, understand the open world assumption they typically follow, and get a visual overview of the concepts we'll touch upon.

Graphs are those simple data abstractions that lie behind networks of all stripes. The graph provides the basic connection matrix, while the network maps the graph to a given subject domain and uses the wiring plan for its specific business goals. Networks are found wherever distributed processes and systems occur and are used for all kinds of purposes. Indeed, graphs are just everywhere.

Why graphs?

Graphs and graph databases are turning up just about everywhere, which is no big surprise given that graphs are able to address the two main concerns we have in dealing with the huge volumes of data all around us—organization and scale. Unlike the more usual “buckets of data” or relational approaches, graphs can bring both order and growth to data as it scales. And they can do this in an organic and holistic manner. This is what makes graphs such a fascinating field to work with.

As an organizational pattern, graphs operate at all levels, from the smallest static structures, such as chemical compounds—think of a water molecule—to large-scale dynamic structures, such as web-based social networks—think Facebook or Twitter.

Graphs are especially useful in dealing with messy and irregular datasets and hard-to-fit data. They cope particularly well with sparse datasets. Unlike the relational model, with fixed tables optimized for transactional database requests, graphs tend to turn things on their head. Instead of dealing with objects as sets of relations and then attempting joins over these sets, it’s the relationships between objects that become the chief organizing principle. It’s all about the connections rather than the records. Schemas take a backseat—still incredibly useful but not overly restrictive. We have a much more fluid way of relating our data items.
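As a minimal sketch of the “connections first” idea, assume a plain Elixir map stands in for the graph: each key holds a person's outgoing relationships, so a friends-of-friends lookup simply follows edges rather than joining tables. The `Connections` module and its `friends_of_friends/2` function are hypothetical, made up for illustration. A missing key just means no known connections, which keeps sparse data cheap to represent.

```elixir
defmodule Connections do
  # Hypothetical helper: the graph is an adjacency map of
  # person => list of people they are directly connected to.
  def friends_of_friends(graph, person) do
    graph
    # Direct connections; an unknown person simply has none.
    |> Map.get(person, [])
    # Follow one more hop from each direct connection.
    |> Enum.flat_map(&Map.get(graph, &1, []))
    # Drop duplicates and the starting person themselves.
    |> Enum.uniq()
    |> Enum.reject(&(&1 == person))
  end
end
```

For example, with `%{"alice" => ["bob"], "bob" => ["carol", "dave"]}`, asking for Alice's friends of friends walks from `"alice"` to `"bob"` and on to `"carol"` and `"dave"`.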

Open world assumption vs. closed world assumption

With graphs, we are typically working with an open world assumption and therefore with partial knowledge. We can’t conclude anything definite from missing data, because it may arrive at any future time. This is in contrast to more familiar data models, which commonly use a closed world assumption where everything is known ahead of time and locked down. Those data models are predictable and provide solid guarantees about data integrity. The downside is that they are regimented.

The open world assumption assumes incomplete knowledge as opposed to the closed world assumption
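To make the contrast concrete, here is a small sketch, assuming facts are stored as a `MapSet` of subject/predicate/object tuples. The `WorldAssumption` module is hypothetical: a closed world query treats any missing fact as `false`, while an open world query reports it as `:unknown`, since that fact may still arrive later.

```elixir
defmodule WorldAssumption do
  # Hypothetical module for illustration: facts are {subject, predicate, object} tuples.

  # Closed world: anything not stated is taken to be false.
  def closed_query(facts, triple), do: MapSet.member?(facts, triple)

  # Open world: a stated fact is true, but a missing one is only unknown.
  def open_query(facts, triple) do
    if MapSet.member?(facts, triple), do: true, else: :unknown
  end

  # New knowledge can arrive at any time.
  def add_fact(facts, triple), do: MapSet.put(facts, triple)
end
```

Asking whether Alice knows Carol before that fact has been added gives `false` under the closed world reading but `:unknown` under the open world reading; once `add_fact/2` records it, both queries return `true`.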

In a sense, these two approaches to managing data—open world versus closed world—echo developments in modern science, where classical physics dealt with facts and certainties while quantum physics introduced uncertainty. Both are still valid, but each applies in its own regime. One tends toward a mechanistic description, the other toward an organic description.

In short, graphs are very good at gluing pieces of data together.

Now, while graph data structures can connect data items, in practice, graphs themselves tend to be disconnected from each other, both physically in separate graph databases and conceptually in terms of the data model. To move data between graphs, it helps to better understand their respective data models and to see how we can transform one graph model into another.

In later lessons, we’ll get some hands-on experience with the different graph models and take a specific look at graph transformations. The graph-to-graph problem is almost as challenging as the problem of turning structured data (tables or documents) into a graph.

Concept map

The following concept map (and yes, it’s a graph) shows some of the things we’ll touch upon.

Concept map

We’ll deal with graphs as structures for organizing data at large. We’ll see how we can use Elixir to process graphs, both those held in graph databases and those that are distributed. We’ll be looking especially at so-called “semantic” graphs, that is, graphs with an information-bearing capacity. We’ll need to consider different graph models, what each provides, and how they relate to one another, and we’ll need to work with different query languages.

So let’s first check our understanding of what a graph is and also see some common paradigms for graph models. We’re going to work with some different graph packages in Elixir, but first, we can try our hand at building a graph with a library that ships with Elixir. To compare the different graph packages with their respective graph models, we could do with a reference graph model.
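As a taste of what's ahead, here is a minimal sketch using Erlang's `:digraph` module, which is part of OTP and therefore available from any Elixir installation. Whether this is the library the later lessons use is an assumption on our part, and the `TinyGraph` module is a hypothetical wrapper for illustration.

```elixir
defmodule TinyGraph do
  # Hypothetical wrapper: build a directed graph from {from, to} pairs
  # using Erlang's built-in :digraph module.
  def build(edges) do
    g = :digraph.new()

    for {from, to} <- edges do
      # Adding an existing vertex again is harmless.
      :digraph.add_vertex(g, from)
      :digraph.add_vertex(g, to)
      :digraph.add_edge(g, from, to)
    end

    g
  end
end

g = TinyGraph.build([{"alice", "bob"}, {"bob", "carol"}])
# Follow edges out of a vertex, then search for a path between two vertices.
IO.inspect(:digraph.out_neighbours(g, "alice"))
IO.inspect(:digraph.get_path(g, "alice", "carol"))
```

Because `:digraph` keeps its data in ETS tables owned by the calling process, call `:digraph.delete/1` on the graph once you are done with it.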