
Explaining graph databases to a developer

Tuesday, October 17, 2017

Graph Databases and everything a developer needs to know about how to get the most out of them.

Organizations are increasingly grasping the power of graph databases, which help them unlock business value in the connections, influences and relationships within their data. Graph databases enable new applications to adapt to changing business needs and existing applications to scale with the business.

To learn more about how organizations can implement and find value in graph databases, we spoke with Jim Webber, the Chief Scientist at graph technology pioneer Neo4j.

ADM: Can you explain graph databases at a high-level, as they compare to relational databases?

Webber: Traditional relational databases model data as a set of tables and columns and are best suited for data that is predictable and not strongly interrelated. Unlike relational databases, graph databases are adept at working with multiple connected points of information and exploring the relationships between them. Rather than tables and columns, graph databases use graph structures to store and represent data, allowing for the simple and fast retrieval of complex hierarchical connections that are difficult to model and expensive to query in relational systems.

If you were to draw a model of your data on a whiteboard, that is the same model that is stored in a graph database - there is no technical obfuscation of the domain view. By assembling the data into these connected structures, you can build sophisticated models that map closely to your problem domain.
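To make the whiteboard idea concrete, here is a minimal in-memory sketch of a graph model in Python. It is an illustration only and not how Neo4j or any other product stores data; the `Graph` class, the node names and the `KNOWS` and `WORKS_AT` relationship types are all hypothetical.

```python
from collections import defaultdict, deque


class Graph:
    """Tiny in-memory graph: nodes with properties, typed directed relationships."""

    def __init__(self):
        self.nodes = {}                # node id -> property dict
        self.out = defaultdict(list)   # node id -> [(relationship type, target id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, src, rel_type, dst):
        self.out[src].append((rel_type, dst))

    def reachable(self, start, rel_type, max_hops):
        """Breadth-first traversal along one relationship type, up to max_hops."""
        seen = {start}
        frontier = deque([(start, 0)])
        found = []
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue  # do not expand beyond the hop limit
            for kind, target in self.out[node]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    found.append(target)
                    frontier.append((target, depth + 1))
        return found


# The same boxes and arrows you might draw on a whiteboard (hypothetical data).
g = Graph()
g.add_node("alice", label="Person")
g.add_node("bob", label="Person")
g.add_node("carol", label="Person")
g.add_node("acme", label="Company")
g.relate("alice", "KNOWS", "bob")
g.relate("bob", "KNOWS", "carol")
g.relate("carol", "WORKS_AT", "acme")

friends_of_friends = g.reachable("alice", "KNOWS", max_hops=2)
```

The data structure mirrors the drawing directly: each box is a node, each arrow is a relationship, and a question such as "who can Alice reach through people she knows?" becomes a traversal rather than a chain of table joins.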

There is a substantial increase in scalability and performance when going from relational databases to graph databases, and the advantage of utilizing them grows with the complexity of the query. Our users regularly feed back that their most challenging queries go from minutes to milliseconds – orders of magnitude faster.

ADM: What about compared to other types of NoSQL databases?

Webber: Fowler and Sadalage describe the other NoSQL databases as being “aggregate-oriented.” This means they’re good for simple store and retrieve of single (aggregate) values: single values, columns, or documents typically. When your store and retrieve patterns are symmetric, this can work pretty well - store a document, retrieve a document - but when you want to make sense of the discrete data items, when you want to join them, these models fall far short.

Outside of quite primitive query language constructs and indexing, most of the clever things in these databases are left to the user. There is no support for joins and therefore no support for querying a rich model.
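As a rough illustration of what "left to the user" can mean, the sketch below models an aggregate-oriented store as plain Python dictionaries. The stores, keys and the `customers_who_bought` helper are hypothetical and do not represent any particular NoSQL product's API. Fetching a single document is trivial, but relating documents to one another has to be written by hand in application code.

```python
# Hypothetical aggregate stores: each value is a self-contained document.
customers = {
    "c1": {"name": "Ada", "order_ids": ["o1", "o2"]},
    "c2": {"name": "Grace", "order_ids": ["o3"]},
}
orders = {
    "o1": {"product": "keyboard"},
    "o2": {"product": "monitor"},
    "o3": {"product": "keyboard"},
}


def get(store, key):
    """Symmetric access: store a document, retrieve a document."""
    return store[key]


def customers_who_bought(product):
    """Application-side join: scan every customer, then fetch each order by key."""
    names = []
    for customer in customers.values():
        bought = any(
            get(orders, order_id)["product"] == product
            for order_id in customer["order_ids"]
        )
        if bought:
            names.append(customer["name"])
    return names


keyboard_buyers = customers_who_bought("keyboard")
```

Every new question about how the documents relate needs another hand-written loop like this one, which is the gap a query language with joins or traversals is meant to fill.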

What these databases typically do provide is large scale for simple data (with varying degrees of unsafety stemming from their consistency model) and a good level of operational fault tolerance. But in terms of the model they support, they’re a step backwards from the relational model, not a step forwards.

ADM: What are the benefits of native graph databases, versus non-native graph databases?

Webber: Organizations deploying non-native graph databases tend to do so because their operations teams are more familiar with a non-graph backend, like MongoDB or Cassandra. The disconnect between graph data and non-graph storage leads to a number of performance and scalability issues, especially as your data grows. Your team may face a learning curve when familiarizing themselves with a new backend or native graph query language, but the benefits of using a native graph database pay dividends in the long term.

From the ground up, native graph databases are designed specifically for the storage and management of graphs and are the most efficient for querying graph data. In contrast, non-native graph databases are not optimized for the variably connected nature of graph data. Locality is lost, and expensive indexing is used where native technology uses better, cheaper approaches. Worse, the consistency semantics may be inappropriate for graphs and lead to corruption under normal operations.
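The difference between following a direct reference and consulting an index on every hop can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions: the sorted `edge_index` stands in for a non-graph backend and the `Node` objects stand in for native storage. Neither reflects the actual internals of Neo4j, MongoDB or Cassandra.

```python
import bisect

# The same three-hop chain a -> b -> c -> d, stored two ways.
edges = [("a", "b"), ("b", "c"), ("c", "d")]

# Assumed non-native layout: edges sit in a sorted index keyed by source node,
# so every hop pays for a search over the whole index.
edge_index = sorted(edges)


def neighbours_via_index(node):
    """Binary-search the index for the node, then read off its targets."""
    start = bisect.bisect_left(edge_index, (node, ""))
    result = []
    for src, dst in edge_index[start:]:
        if src != node:
            break
        result.append(dst)
    return result


# Assumed native layout: each node record points directly at its neighbours,
# so the cost of a hop does not depend on the total size of the graph.
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbours = []


nodes = {name: Node(name) for name in "abcd"}
for src, dst in edges:
    nodes[src].neighbours.append(nodes[dst])


def walk_index(start, hops):
    """Follow the first neighbour `hops` times, one index search per hop."""
    current = start
    for _ in range(hops):
        current = neighbours_via_index(current)[0]
    return current


def walk_native(start, hops):
    """One lookup to find the start node, then only pointer hops."""
    current = nodes[start]
    for _ in range(hops):
        current = current.neighbours[0]
    return current.name
```

Both walks return the same answer. The point of the sketch is where the work happens: `walk_index` searches a structure that grows with the whole dataset on every hop, while `walk_native` touches only the nodes it actually visits.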

Regardless of how much data you are currently working with, it’s important to plan for the future, as your data is likely to grow alongside your business. As you continue to accumulate data, many queries will slow in a non-native database and will require much more hardware for equivalent query performance.

ADM: What have been a few of the key milestones that propelled the graph database industry into the mainstream?

Webber: I think a key achievement in the public eye happened in 2016, when the International Consortium of Investigative Journalists (ICIJ) turned to Neo4j to unravel the Panama Papers. The Panama Papers was the largest data leak in history, and it ultimately connected numerous high-profile individuals to tax havens. A graph database was used to make connections within the 2.6 terabytes of data, surfacing links between powerful politicians, business owners, banks, and offshore businesses, which can be used to cover up tax evasion and other financial crimes. This propelled graph database technology into the public eye, demonstrating its ability to quickly identify relationships within massive amounts of data. Today, graph databases are no longer a niche market and are being adopted by many of the top companies across industries like retail, finance, healthcare, manufacturing and security.

ADM: What are some of the most popular use cases for graph databases?

Webber: Graph databases are used across nearly every industry, whether they’re processing data to get NASA to Mars two years faster, unraveling the Panama Papers or helping retailers make personalized product recommendations. eBay’s ShopBot, for example, provides a personalized AI-driven shopping experience via Facebook Messenger, using Neo4j as the underlying technology. A Neo4j graph database is also being used by an international investment bank to manage identity and authentication, allowing its security team to focus on projects with a greater business impact. The bottom line is this: modern businesses bet on graphs.

ADM: What are the top considerations for a novice application developer when it comes to implementing a graph database?

Webber: Relational databases have been the powerhouse of software applications since the 1980s and continue to serve as the database standard to this day. To be successful with implementing a graph database, you have to be willing to step out of your comfort zone and challenge that norm. The freedom provided by graph databases can be scary, even bewildering at first, but once you’re over that curve and grasp the full potential of the technology, you’ll never go back to the implicit complexities of RDBMS or suffer the weak data models of NoSQL.

As a developer myself, I know that a developer’s work is closely tied to business value. Regardless of your experience as an application developer, incorporating elements of computer science into your work will help you multiply business value when working with data, and graph databases are a fundamental part of this initiative.

About Jim Webber

Jim Webber is Chief Scientist at Neo4j, working on next-generation solutions for massively scaling graph data. Prior to joining Neo4j, Jim was a Professional Services Director with ThoughtWorks, where he worked on large-scale computing systems in finance and telecoms. Jim has a Ph.D. in Computing Science from Newcastle University, UK.