Friday 15 August 2014

Data Modeling in Graph Databases: Interview with Jim Webber and Ian Robinson

Graph databases are NoSQL database systems that use a graph data model for storing and processing data.
Matt Aslett from the 451 Group notes that graphs are now emerging from the general NoSQL umbrella as a category in their own right. In the last 12 to 18 months the graph space has shifted, with growth across the category for all things graph.
Data modeling with a graph database follows a different paradigm from the way you usually model data stored in relational databases or in other NoSQL databases such as document databases, key-value stores, or column-family databases. Graph data models can represent rich, highly connected data for real-world use cases and applications.
InfoQ spoke with Jim Webber and Ian Robinson of the Neo Technology team (also co-authors of the Graph Databases book) about data modeling efforts and best practices when using graph databases for data management and analytics.
InfoQ: What type of data is not suitable for storing in a Relational Database but is a good candidate to store in a Graph Database?
Jim & Ian: That’s pretty straightforward to answer: anything that’s interconnected either immediately, because the coding and schema design is complicated, or eventually, because of the join bomb problem [http://blog.neo4j.org/2013/01/demining-join-bomb-with-graph-queries.html] inherent in any practical application of the relational model.
Relational databases are fine things, even for large data sets, up to the point where you have to join. And in every relational database use case that we’ve seen, there’s always a join, and in extreme cases, when an ORM has written and hidden particularly poor SQL, many indiscriminate joins.
The problem with a join is that you never know what intermediate set will be produced, meaning you never quite know the memory use or latency of a query with a join. Multiplying that out with several joins means you have enormous potential for queries to run slowly while consuming lots of (scarce) resources.
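To make the intermediate-set point concrete, here is a minimal Python sketch. It is not from the interview: the FRIEND table, its sizes, and the function names are hypothetical, and the naive nested-loop join stands in for whatever plan a real query optimizer might choose. It only illustrates how the row count of the intermediate set can multiply with each extra join.

```python
# Illustrative sketch only: hypothetical data and names, not from the interview.

def build_friend_rows(num_people, fan_out):
    """Build a FRIEND(person, friend) table where everyone has `fan_out` friends."""
    return [
        (person, (person + step) % num_people)
        for person in range(num_people)
        for step in range(1, fan_out + 1)
    ]


def join_once(paths, friend_rows):
    """Naive nested-loop join: extend each path by every row matching its last person."""
    return [
        path + (friend,)
        for path in paths
        for person, friend in friend_rows
        if path[-1] == person
    ]


def intermediate_sizes(num_people, fan_out, num_joins):
    """Return the row count of the intermediate set after each successive join."""
    friend_rows = build_friend_rows(num_people, fan_out)
    paths = [(person,) for person in range(num_people)]
    sizes = []
    for _ in range(num_joins):
        paths = join_once(paths, friend_rows)
        sizes.append(len(paths))
    return sizes


if __name__ == "__main__":
    # 100 people with 5 friends each: a 500-row table joined to itself repeatedly.
    print(intermediate_sizes(num_people=100, fan_out=5, num_joins=3))
    # prints [500, 2500, 12500]
```

In this toy data set every person has exactly five friends, so each join multiplies the intermediate set by five. With real data the fan-out varies from row to row, which is why the size of the intermediate set, and with it memory use and latency, is hard to predict in advance.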
