Friday, 15 August 2014

NoSQL Database Adoption Trends

NoSQL databases have been getting a lot of attention over the last few years for their performance, scalability, schema flexibility and analytics capabilities. While relational databases are still a good choice for certain use cases - like structured data and applications that require ACID transactions - NoSQL databases are better suited for use cases where:
  1. The data stored is semi-structured or unstructured in nature
  2. The applications that access this data require a certain level of performance and scalability
  3. The applications that access this data are ok with eventual consistency
Non-relational databases typically support the following capabilities:
  • Schema flexibility
  • Shared nothing architecture
  • Sharding as part of the data storage model
  • Asynchronous replication
  • BASE instead of ACID Transactions
InfoQ would like to learn what NoSQL databases you are currently using or planning on using in your applications.

Document Databases

  • MongoDB: MongoDB is an open-source document oriented database.
  • CouchDB: Apache CouchDB is a database that uses JSON for documents, JavaScript for MapReduce queries, and HTTP for an API.
  • Couchbase: NoSQL document database based on JSON model.
  • RavenDB: RavenDB is a document-oriented database built on the .NET platform.
  • MarkLogic: MarkLogic NoSQL database is used to store XML-based, document-centric information. It supports schema flexibility.
  • Other Document Database

Graph Databases 

  • Neo4j: Neo4j is a property graph database; supports ACID transactions.
  • InfiniteGraph: Graph database used to persist and traverse relationships between objects; supports distributed data stores.
  • AllegroGraph: AllegroGraph is a graph database that uses memory utilization in combination with disk-based storage for scalability, supports SPARQL, RDFS++, and Prolog reasoning.
  • Other Graph Database

Key Value Data Stores 

  • Riak: Riak is an open source, distributed key value database, supports data replication and fault-tolerance.
  • Redis: Redis is an open source key-value store. It supports master-slave replication, transactions, Pub/Sub, Lua scripting, and keys with a limited time-to-live.
  • Dynamo: Dynamo is a key-value distributed data store. It is directly implemented as Amazon DynamoDB; used in Amazon S3 product.
  • Oracle NoSQL Database: Key-value NoSQL database from Oracle. It supports ACID transactions and JSON.
  • Voldemort: Distributed key-value storage system with data replication and partitioning.
  • Aerospike: Aerospike database is a key-value store; supports hybrid memory architecture and data integrity with strong or tunable consistency.
  • Other Key Value Data Store

Columnar Databases 

  • Cassandra: Cassandra is a column database that supports data replication across multiple data centers. Its data model offers column indexes, log-structured updates, support for denormalization, materialized views, and built-in caching.
  • HBase: Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable. It provides Bigtable-like capabilities on top of Hadoop and HDFS.
  • Amazon SimpleDB: Amazon SimpleDB is a non-relational data store that offloads the work of database administration. Developers store and query data items using web services requests.
  • Apache Accumulo: Apache Accumulo is a sorted, distributed key/value data store based on Google's BigTable design and built on top of Apache Hadoop, Zookeeper, and Thrift technologies.
  • Hypertable: Hypertable is an open source, scalable database, also modeled after Bigtable; supports sharding.
  • Azure Tables: Windows Azure Table Storage Service offers NoSQL capabilities for applications that require storage of large amounts of unstructured data. Tables can auto-scale to store up to several terabytes of data. They are accessible via REST and managed APIs.
  • Other Columnar Database

In-Memory Data Grids 

  • Hazelcast: Hazelcast CE is an open source data distribution platform. It allows developers to share and partition data across the database cluster.
  • Oracle Coherence: Oracle's in-memory data grid solution that provides fast access to frequently used data. Coherence supports event capabilities and dynamic partitioning of data.
  • Terracotta BigMemory: Distributed in-memory management solution from Terracotta. The product includes an Ehcache interface, Terracotta Management Console and BigMemory-Hadoop Connector (early access).
  • GemFire: VMware vFabric GemFire is a distributed data management platform that provides elastic in-memory data management, replication, partitioning, data-aware routing, and continuous querying.
  • Infinispan: Infinispan is a Java based open source key/value NoSQL datastore and distributed data grid platform. It supports transactions and peer-to-peer as well as client/server architecture.
  • GridGain: Distributed, object-based, in-memory, SQL+NoSQL key-value database. Supports ACID transactions.
  • GigaSpaces: GigaSpaces in-memory data grid (the Space) serves as the system of record for the applications and supports a variety of caching scenarios.
  • Tibco: ActiveSpaces product from Tibco provides an infrastructure to create virtual data caches from the aggregate memory of participating nodes in the cluster and to scale as nodes join and leave. 
  • Other In-Memory Data Grid
Please rank the following NoSQL database products, drawing on YOUR experience working with them or researching them for potential use in production environments, against the following criteria:
  • Value Proposition: The value these databases can potentially bring to the business.
  • Adoption Readiness: How ready they are to be used in real-world applications.

Facebook Announces Apollo, a New NoSQL Database for On-line Low Latency Storage

Speaking at QCon New York on Wednesday, Jeff Johnson, from the core data group at Facebook, announced Apollo, Facebook’s Paxos-like NoSQL database. Written in C++11 on top of the Apache Thrift 2 RPC framework, Apollo is a hierarchical storage system where all the data is split into shards, very much analogous to region servers in HBase. The sweet spot for it, Johnson explained, is on-line low latency storage - in particular Flash and in-memory.
As distinct from a document oriented, or key value store, Apollo is about modifications to data structures, allowing you to represent maps, queues, trees and so on, as well as key values. Within the system individual pieces of data are quite small - a range of between 1 byte and 1MB, with a total size anywhere from 1MB to 10+PB. It supports anything from a minimum of three servers to thousands.
Each shard has four components. The first is a quorum consensus protocol which is based on Raft, a strong leader consensus protocol from Stanford. Johnson explained that one of the things that his team really like about Raft is that the leader failure recovery is really well defined, as is the quorum view change. That said, he suggested, it isn’t really simpler to work with than Multi-Paxos:
“We’ve had to do tons and tons of stuff - everything from allowing you to asynchronous write and read from disk to trying to deal with situations when followers are getting behind because there’s other stuff going on on the server or the disk is slow, corruption detection, and so on.”
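As a rough illustration of what quorum agreement means when some followers lag behind, the sketch below computes the highest log index that a majority of servers has stored. This is a simplified reading of Raft's commit rule, not Apollo's code: the names `majority` and `commit_index` are made up for this example, and the real protocol additionally requires the entry to belong to the leader's current term.

```python
def majority(cluster_size):
    """Smallest number of servers that forms a quorum."""
    return cluster_size // 2 + 1


def commit_index(match_index):
    """Highest log index stored on a majority of servers.

    match_index holds, for every server including the leader, the highest
    log index known to be replicated there. Simplification: the check that
    the entry is from the leader's current term is omitted.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[majority(len(match_index)) - 1]


# Three servers, one slow follower stuck at index 3: index 7 is still committed.
three_node = commit_index([7, 7, 3])

# Five servers, only two have index 9: the committed index is 4.
five_node = commit_index([9, 4, 4, 9, 2])
```

A follower that falls behind does not block progress as long as a majority keeps up, which is why the slow-disk case Johnson mentions is a matter of catch-up rather than correctness.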
The second component is storage. At the time of writing the primary storage is based on RocksDB, a Key/Value store that builds on top of Google’s LevelDB. Whilst it is a Key/Value store, Facebook are using it to emulate other data structures. Apollo is designed to be storage agnostic and the team are also working on adding support for MySQL as an alternative storage engine.
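To give a feel for how a richer data structure can be emulated on a flat key/value store, here is a minimal sketch of a deque laid out over string keys. It is an assumption-laden illustration, not Apollo's actual layout: the class `KVDeque`, the key naming scheme, and the use of a plain Python dict in place of RocksDB are all hypothetical.

```python
class KVDeque:
    """A double-ended queue laid out over a flat key/value mapping.

    `store` is any dict-like object; a plain dict stands in for the
    underlying key/value engine in this sketch.
    """

    def __init__(self, store, name):
        self.store = store
        self.name = name
        # head is the index of the first element, tail is one past the last.
        self.store.setdefault(self._meta("head"), 0)
        self.store.setdefault(self._meta("tail"), 0)

    def _meta(self, field):
        return f"{self.name}/meta/{field}"

    def _item(self, index):
        return f"{self.name}/item/{index}"

    def __len__(self):
        return self.store[self._meta("tail")] - self.store[self._meta("head")]

    def push_back(self, value):
        tail = self.store[self._meta("tail")]
        self.store[self._item(tail)] = value
        self.store[self._meta("tail")] = tail + 1

    def push_front(self, value):
        head = self.store[self._meta("head")] - 1
        self.store[self._item(head)] = value
        self.store[self._meta("head")] = head

    def back(self):
        if len(self) == 0:
            raise IndexError("back() on empty deque")
        return self.store[self._item(self.store[self._meta("tail")] - 1)]

    def pop_front(self):
        if len(self) == 0:
            raise IndexError("pop_front() on empty deque")
        head = self.store[self._meta("head")]
        value = self.store.pop(self._item(head))
        self.store[self._meta("head")] = head + 1
        return value


store = {}
d2 = KVDeque(store, "d2")
d2.push_back("b")
d2.push_back("c")
d2.push_front("a")
```

Each deque operation touches only a couple of keys, so the structure never has to be loaded or rewritten as a whole. A real engine would also need to group the item and metadata updates into one atomic batch.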
The third component is a Client API with read() and write() methods. Every operation that Apollo performs at a shard level is atomic, so you express pre-conditions and, if these are satisfied, it returns the reads or performs the writes. For example, this code:
read(conditions : {map(m1).contains(x)}, 
     reads : {deque(d2).back()})
says “If the map m1 contains the value x then return the value that is on the back of the d2 deque.”
You can combine together any number of conditions and any number of reads.
Writes are very similar and again allow you to express conditions:
write(conditions : {ver(k1) == v}, reads : {}, 
      writes : {val(k1) := x})
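The two calls above can be emulated with a toy in-memory shard to make the conditional semantics concrete. This is only a sketch under stated assumptions: the `Shard` class, its fields, and the use of Python callables for conditions and reads are hypothetical and are not Apollo's client API; a single lock stands in for shard-level atomicity, and `contains(x)` is modelled as key membership.

```python
import threading
from collections import deque


class Shard:
    """Toy in-memory stand-in for a single shard (hypothetical API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.values = {}    # key -> value
        self.versions = {}  # key -> integer version, bumped on every write
        self.maps = {}      # name -> dict
        self.deques = {}    # name -> deque

    def read(self, conditions, reads):
        """Check all conditions and perform all reads as one atomic step."""
        with self._lock:
            if not all(cond(self) for cond in conditions):
                return None  # a precondition failed, nothing is returned
            return [r(self) for r in reads]

    def write(self, conditions, reads, writes):
        """Apply the writes only if every condition holds, atomically."""
        with self._lock:
            if not all(cond(self) for cond in conditions):
                return None
            results = [r(self) for r in reads]
            for key, value in writes.items():
                self.values[key] = value
                self.versions[key] = self.versions.get(key, 0) + 1
            return results


shard = Shard()
shard.maps["m1"] = {"x": 1}
shard.deques["d2"] = deque([10, 20, 30])

# read(conditions : {map(m1).contains(x)}, reads : {deque(d2).back()})
back = shard.read(
    conditions=[lambda s: "x" in s.maps["m1"]],
    reads=[lambda s: s.deques["d2"][-1]],
)

# write(conditions : {ver(k1) == v}, reads : {}, writes : {val(k1) := x})
v = shard.versions.get("k1", 0)
first = shard.write([lambda s: s.versions.get("k1", 0) == v], [], {"k1": "x"})
# A second writer still holding the old version v is rejected.
stale = shard.write([lambda s: s.versions.get("k1", 0) == v], [], {"k1": "y"})
```

The version check in the write gives compare-and-set behaviour: the first write succeeds and bumps the version, so the second attempt with the stale version returns nothing and leaves the value unchanged.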
The last of the four shard components is the Fault Tolerant State Machine (FTSM). FTSMs are primarily used by the system code but can also be used for user code. Each FTSM is owned by a shard so that, for example, in a shard of three machines all of them will be executing the same code at the same time. They are able to access the persistent storage that is local to each machine. Most importantly, if one node dies the code continues to execute in a proper order that all the nodes agree on.
Amongst other things the state machines are used for load balancing, data migration, shard creation and destruction, and co-ordinating cross-shard transactions. State machines can have external side effects, for example they can send RPC requests to remote machines, but whenever they make a change to persistent state they have to submit it to Raft to get all the servers to agree.
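The idea that every machine in a shard runs the same code in an agreed order can be sketched as a deterministic state machine driven by a shared log. The example below is illustrative only: `CounterStateMachine`, `catch_up`, and the command format are invented for this sketch, and the list `agreed_log` simply stands in for whatever sequence the consensus layer has agreed on.

```python
class CounterStateMachine:
    """Deterministic state machine: the same commands applied in the same
    order always produce the same state."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, command):
        op, key, amount = command
        if op == "add":
            self.state[key] = self.state.get(key, 0) + amount
        elif op == "reset":
            self.state[key] = 0
        else:
            raise ValueError(f"unknown op: {op}")
        self.applied += 1


def catch_up(machine, agreed_log):
    """Apply every agreed entry the machine has not seen yet, in log order."""
    for command in agreed_log[machine.applied:]:
        machine.apply(command)


# The log stands in for the sequence of state changes agreed through consensus.
agreed_log = [
    ("add", "queue_depth", 5),
    ("reset", "queue_depth", 0),
    ("add", "queue_depth", 2),
]

replicas = [CounterStateMachine() for _ in range(3)]
catch_up(replicas[0], agreed_log)
catch_up(replicas[1], agreed_log[:1])  # this replica falls behind...
catch_up(replicas[1], agreed_log)      # ...and later catches up
catch_up(replicas[2], agreed_log)
```

Because state changes only ever come from the agreed log, a replica that was slow or restarted ends up in the same state as the others once it has replayed the entries it missed.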
Apollo isn't currently being used in production at Facebook, but the firm is looking at using it to replace some memcached use cases, and Johnson made clear that Facebook makes very significant use of memcached. "More generally," Johnson told InfoQ, "we’re looking into various in-memory storage use cases at Facebook, either new ones or replacing some existing ones, by comparing side-by-side with existing systems."
The company is also looking at using Apollo as a reliable queuing system for outgoing Facebook messages to iOS, Android and carriers via SMS, and also potentially for faster analytics.
Apollo is still in development and hasn’t been open-sourced, though Johnson did state that doing so was something Facebook were looking into and would like to do. Johnson's presentation is currently available to QCon New York attendees and will be published to everyone via InfoQ in due course.