As data stores have mushroomed, an easy and effective way to find the information you want has been the holy grail. We have always wanted to “connect the dots” across systems and disparate data sources. After 9/11, the search for ways to connect data to prevent terrorist attacks intensified, as we all realized that the automation we had created by then was overwhelming us with data. Since then, anxious people in many disciplines have frequently intoned the plea that “we need a Google for our discipline.” By that they meant the ability to search all the data they had access to, quickly and efficiently, using a natural language query, regardless of the topic of the search. But this magical capability was not widely replicated until 2012, when Google informed the public that its search engine used a version of knowledge graph technology.
Knowledge graphs are a realization of the concept of linked data that Tim Berners-Lee called for in a TED Talk, asking everyone to create links between the data components available through the World Wide Web. He put the burden on government to lead the way, and some forward-thinking agencies began to explore the idea, coupling artificial intelligence with search theory. But progress toward the ultimate capability of finding data with these technologies was slow, owing to the lack of standards, the immaturity of software offerings, and the absence of a theoretical basis for this novel approach to accessing large data stores.
Now, however, knowledge graphs have come of age. With companies offering packaged applications that pair graph databases with knowledge graphs, the dots are being connected in use case after use case. Knowledge graphs and the applications that implement them can automatically ingest data from a large number of sources, automatically create an ontology as the basis of a data fabric showing how the data components are related (linked), support standards-based queries across the assembled data fabric without programming, and return the data almost immediately. Knowledge graphs built with this technology can search large data stores in seconds rather than the hours or days required with conventional relational databases. They are particularly useful for identity resolution and for creating 360-degree data collections about people, places, or events.
But the most critical aspects of knowledge graphs are that: (1) having built a graph database, it is possible to quickly ask questions that were unknown or unknowable when the system was initially created, and (2) the nature of the underlying data, whether structured or unstructured, does not limit what knowledge graphs can do. Data stored in a conventional relational database, textual documents, and images can all be searched to compile the complete knowledge about a person, place, or event, regardless of the form of the underlying data. Yes, we can connect the dots.
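To make that point concrete, the sketch below shows, in Python with the open-source rdflib library, how a fact drawn from a structured source and a fact extracted from an unstructured document can live in one graph and be traversed together. It is only a minimal sketch: every identifier, predicate, and value is hypothetical, and the entity-extraction step is assumed to have already happened.

```python
# A minimal, hypothetical sketch: one person appears in a structured HR table
# and in a free-text incident report. Once both are expressed as triples in a
# single graph, a traversal can "connect the dots" across the two sources.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# Facts drawn from a structured (relational) source.
g.add((EX.person42, EX.name, Literal("J. Rivera")))
g.add((EX.person42, EX.employedBy, EX.TransitAuthority))

# Facts extracted from an unstructured source (e.g., entity extraction on text).
g.add((EX.person42, EX.mentionedIn, EX.report_017))
g.add((EX.report_017, EX.location, Literal("Maple St. garage")))

# Walk the graph: everything known about person42, regardless of origin.
for predicate, obj in g.predicate_objects(subject=EX.person42):
    print(predicate, obj)
```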
Knowledge graphs support creating a data fabric that captures the explicit knowledge inherent in an enterprise. Queries can then be formulated in a context unique to the enterprise. This use of metadata in seeking information has eluded conventional relational database search capabilities. By creating such a data fabric, knowledge graphs also provide a solid capability to resolve data governance issues, which have long been a challenge for organizations integrating many disparate data systems. An underlying premise in constructing a data fabric is that data is distributed, which is the reality in almost every enterprise. Data fabrics and knowledge graphs make this fact an opportunity rather than an impediment to connecting the dots.
As this technology has matured, it has come to rest on standards that further enhance the power of knowledge graphs. The World Wide Web Consortium (W3C) pioneered the development of the Resource Description Framework (RDF) and a query language (SPARQL) specifically for this technology. With such standards as a firm basis for building tools and systems that work together, connecting data through links expressed in definitive standards makes interoperability possible, and even inevitable.
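As a small illustration of those standards, the sketch below states a few facts as RDF triples and asks a question of them in SPARQL, again using the rdflib library in Python; the namespace and the facts themselves are made up for the example.

```python
# A minimal sketch of the W3C standards named above: facts as RDF triples,
# questions as SPARQL. All URIs and facts are illustrative.
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()

g.add((EX.alice, RDF.type, EX.Person))
g.add((EX.alice, EX.worksFor, EX.AcmeCorp))
g.add((EX.bob, RDF.type, EX.Person))
g.add((EX.bob, EX.worksFor, EX.AcmeCorp))

# SPARQL: which pairs of people work for the same organization?
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?a ?b ?org WHERE {
        ?a ex:worksFor ?org .
        ?b ex:worksFor ?org .
        FILTER (?a != ?b)
    }
""")
for row in results:
    print(row.a, row.b, row.org)
```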
Reference vocabularies such as the National Information Exchange Model (NIEM), with its semantic and syntactic elements, fit well with the need to build an ontology quickly and to incorporate both data aggregation and explicit knowledge into enterprise data fabrics.
Today, multiple innovative companies have developed tools that automate the building of knowledge graphs and data fabrics, solve enterprise-level problems, and incorporate advanced analytics. The widespread energy going into application development means that new capabilities will emerge quickly, putting even more power in the hands of users. For a more extensive review of the state of the art, see the paper The Rise of the Knowledge Graph.
Knowledge graph technology is not likely to replace the transactional and administrative systems that collect data on events and operations. Building a data fabric with knowledge graphs is best understood as an overlay on a multitude of heterogeneous data sources. Yes, every enterprise can develop its own version of Google-style search functionality, and we can now connect the dots.
Paul Wormeli
April 2021
While data elements can have great value, and while having more relevant data elements generally makes an application more useful, the aggregation of data elements needs to consider:
• The context within which the data element was collected
• The age of the data element: the older it is, the less its value
• The metric by which the data element was defined (all fruit are not oranges)
• The point at which the volume of data elements becomes noise, making it difficult to identify useful patterns
Knowledge graph technology is a useful step in creating value from that noise.