Kalyan Hadoop Training in Hyderabad @ ORIEN IT, Ameerpet, 040 65142345 , 9703202345

Friday, 15 August 2014

Transitioning from RDBMS to NoSQL. Interview with Couchbase’s Dipti Borkar

While relational databases have been used for decades to store data, and they still represent a viable solution for many use cases, NoSQL is being chosen today especially for scalability and performance reasons. This article contains an interview with Dipti Borkar, Director of Product Management at Couchbase, on the challenges, benefits and the process of migrating from RDBMS to NoSQL.

InfoQ: When is it the time to dump SQL for a NoSQL solution?

Dipti Borkar: OK, that title sounds a little harsh – and in truth, in most cases, it's not a matter of dumping SQL for a NoSQL solution, but rather, it’s about making a transition from one to the other, where application and use case dictate the need for a change. In general, this transition will be spurred by the need for flexibility – both in the scaling model and the data model - when building modern web and mobile applications.

Typical web applications are built with a three-tier architecture. To scale out the application, more commodity web servers are simply added behind a load balancer to support more users. The ability to scale out is a core tenet of the increasingly important cloud-computing model, in which virtual machine instances can be easily added or removed to match demand.

However, when it comes to the data layer, relational database (RDBMS) technology does not scale out and does not provide a flexible data model, presenting a number of challenges. Handling more users means adding a bigger server, and big servers are highly complex, proprietary, and disproportionately expensive, unlike the low-cost, commodity hardware in web- and cloud-based architectures. So, as organizations start seeing performance problems with their relational database for existing or new applications, particularly as the number of users grows, they realize the need for a faster, elastic database tier. This is the time to start evaluating NoSQL and adopting it as the database tier in their interactive web applications.

InfoQ: What would be the main steps required to transition from SQL to NoSQL?

Dipti Borkar: Organizations and projects can vary greatly in what they are looking for in a NoSQL database, so much of the transition will depend on your use case. Below are general guidelines for transitioning:

#1 Understand the key requirements for your application:

Some of the requirements that match the need for NoSQL are:

Rapid application development
– Changing market needs
– Changing data needs
Scalability
– Unknown user demand
– Need for constantly growing throughput to access, add and update data
Consistent performance
– Low response time for better user experience
– High throughput to handle viral growth
Operational reliability
– High-availability to handle failures gracefully with minimal impact to the application
– Built-in monitoring APIs for easy ongoing maintenance

#2 Understand the various types of NoSQL offerings:

There is a common myth that all NoSQL databases are created equal; this is not true. Cassandra, for instance, may be the solution you use for analytical use cases given its columnar data model. Neo4j, a graph database, may be the database you use for applications that need access to relationships between entities.

I'll focus specifically on distributed document-oriented NoSQL database technology – with Couchbase and MongoDB being the two most visible and widely adopted examples.

#3 Execute a proof of concept

Once you have narrowed down the potential choices for the database tier, plan a proof of concept integrating the key characteristics of your application. Look for response time and throughput performance and the ability to scale out easily.

#4 Document modeling and development

For document databases, spend some time remodeling your data from fixed tabular schemas into flexible document objects.

#5 Deploying to staging and production

Operational stability is a very important aspect for interactive web applications. Test and stage your application rollout as you would for applications that use traditional RDBMS systems. Ensure your selected database supports monitoring across the cluster, easy online scaling for adding capacity if needed and other database administrative tools.

#6 Stay up to date on newest trends

There are plenty of quality, free courses throughout the US that offer hands-on NoSQL training. The best way to ensure a successful NoSQL implementation is to have an educated developer team that is up to date on the latest server releases and vendor offerings.

Below are links to some of the biggest ones:

-CouchConf

-NoSQL Now

InfoQ: What are the main difficulties migrating from SQL to NoSQL?

Dipti Borkar: The main difficulty basically boils down to understanding the differences between the traditional RDBMS systems and document databases. The most important difference is the data model:

Each record in a relational database conforms to a schema, with a fixed number of fields (columns), each having a specified purpose and data type. Every record has the same structure. Data is normalized across multiple tables. The upside is that there is less duplicated data in the database. The downside is that a change in the schema means performing several expensive “alter table” statements that require locking many tables simultaneously to ensure a change doesn’t leave the database in an inconsistent state.

With document databases on the other hand, each document can have a completely different structure from other documents. No additional management is required on the database to handle changes to document schemas.
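To make the contrast concrete, here is a minimal Java sketch, not tied to any particular database, in which two documents of the same kind carry different fields. Plain maps stand in for JSON documents, and the class name, field names and sample values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: documents modeled as plain maps, standing in for JSON.
class FlexibleDocuments {

    // Builds two "user" documents whose shapes differ.
    static List<Map<String, Object>> sampleUsers() {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("type", "user");
        first.put("name", "Ann");
        first.put("email", "ann@example.com");

        // The second document adds a nested list and omits "email";
        // no schema change is needed to store it next to the first.
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("type", "user");
        second.put("name", "Bob");
        second.put("phones", List.of("555-0100", "555-0101"));

        List<Map<String, Object>> users = new ArrayList<>();
        users.add(first);
        users.add(second);
        return users;
    }

    public static void main(String[] args) {
        // Print the field names of each document to show the differing shapes.
        for (Map<String, Object> user : sampleUsers()) {
            System.out.println(user.keySet());
        }
    }
}
```

In a relational table, the second record would require either a new column or a separate phones table; here the application simply has to cope with fields that may be absent.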

InfoQ: What are the benefits of NoSQL document databases?

Dipti Borkar: The main benefits of document databases are:

Flexible data model
Data can be inserted without a defined schema, and the format of the data being inserted can change at any time—providing extreme flexibility in the application, which ultimately delivers substantial business agility.
Easy scalability
Some NoSQL databases automatically spread data across servers, requiring no participation from the applications. Servers can be added and removed from the data layer without application downtime, with data and I/O spread across servers.
Consistent, high performance
Advanced NoSQL database technologies cache data in system memory, a behavior that is completely transparent to the developer and the operations team.

InfoQ: How do developers react when you tell them about adopting NoSQL?

Dipti Borkar: Developers are extremely excited about NoSQL technologies particularly because of the ease of development some databases bring. Document databases have extremely flexible schemas and are easy to work with.

Developers can iterate over application changes faster without the need to change the schema of the underlying database. This is particularly useful when developers are building applications with sparse data or data that’s constantly changing or data from third-party providers they do not have control over.

InfoQ: Is it OK to work with existing developers and have them learn new skills or should you look for new ones that master NoSQL?

Dipti Borkar: Application developers will find it easy to adopt some NoSQL technologies, particularly those that support JSON as the document format. More and more developers are using JSON to model objects in their applications. Therefore storing the data directly as JSON in the database reduces the impedance mismatch across the stack.

Developers who heavily use SQL may need to adapt and learn about document modeling approaches. Rethinking how data can be structured in a logical way using documents rather than normalizing the data into a fixed database schema becomes an important aspect.

InfoQ: Have you had or heard of unsuccessful attempts to switch to NoSQL? If yes, what went wrong?

Dipti Borkar: Architects and developers should ensure that their key requirements are satisfied by the solution or database selected. For example, choosing a database that’s more suited towards analytical applications may not satisfy your latency and throughput needs for interactive applications. Projects that make a quick choice without investigating all requirements may find that they have slower response times for data access leading to a poor user experience. Users need to plan up front for scalability. Here’s a more drastic example of things going south. In some situations an app has gone viral but the database that was selected couldn’t keep up and scale out.

At the same time, using a database that is more suited towards an OLTP-like use case may not perform well for advanced analysis jobs or complex processing. A big data solution may be more suitable.

InfoQ: What are the key lessons migrating to NoSQL?

Dipti Borkar: There are a lot of benefits developers will see when moving to NoSQL. A more flexible data model and freedom from rigid schemas is a big one. You may also see significantly improved performance and the ability to horizontally scale out the data layer.

But most NoSQL products are in the early stages of the product cycle. While functionality like complex joins or multi-document transactions can be simulated in the app, developers who rely heavily on such features may be better off using a traditional RDBMS. And for some projects, a hybrid approach might be the best choice.

About the Interviewee

Dipti Borkar is the Director of Product Management at Couchbase, where she is responsible for the product roadmap of Couchbase Server, a NoSQL database, and works with customers and users to understand emerging requirements for low-latency, scalable data stores. Dipti has deep technical experience in the database industry, having worked at IBM as a software engineer and Development Manager for the DB2 server team and then at MarkLogic as a Senior Product Manager. Dipti holds a Master’s degree in Computer Science from the University of California, San Diego, with a specialization in databases, and an MBA from the Haas School of Business at the University of California, Berkeley.

What is CouchDB and Why Should I Care?

CouchDB is one of what many are calling NoSQL solutions. Specifically, CouchDB is a document-oriented database and within each document fields are stored as key-value maps. Fields can be either a simple key/value pair, list, or map.

Each document that is stored in the database is given a document-level unique identifier (_id) as well as a revision (_rev) number for each change that is made and saved to the database.
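The role of the revision can be illustrated with a small in-memory sketch. CouchDB rejects an update whose _rev does not match the current revision of the stored document. The class below imitates that check with a plain integer counter (real CouchDB revisions are strings of the form N-hash), and every name in it is hypothetical rather than part of CouchDB or jcouchdb.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store that imitates CouchDB-style revision checking.
class RevisionedStore {
    private final Map<String, Integer> revisions = new HashMap<>();
    private final Map<String, String> bodies = new HashMap<>();

    // Saves a document. expectedRev must be 0 for a new document, or the
    // current revision for an existing one. Returns the new revision.
    int save(String id, int expectedRev, String body) {
        int currentRev = revisions.getOrDefault(id, 0);
        if (currentRev != expectedRev) {
            throw new IllegalStateException(
                    "conflict: expected rev " + expectedRev + " but found " + currentRev);
        }
        int newRev = currentRev + 1;
        revisions.put(id, newRev);
        bodies.put(id, body);
        return newRev;
    }

    // Returns the latest body stored under the id, or null if there is none.
    String get(String id) {
        return bodies.get(id);
    }

    public static void main(String[] args) {
        RevisionedStore store = new RevisionedStore();
        int rev1 = store.save("event-1", 0, "{\"title\":\"Test\"}");
        int rev2 = store.save("event-1", rev1, "{\"title\":\"Updated\"}");
        System.out.println("current revision: " + rev2);
        try {
            // A writer still holding rev1 is rejected instead of overwriting rev2.
            store.save("event-1", rev1, "{\"title\":\"Stale write\"}");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the check is that a client working from an outdated copy cannot silently overwrite someone else's change; it has to fetch the latest revision and retry.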

NoSQL databases represent a shift away from traditional relational databases and can offer many benefits (and their own challenges) as well. CouchDB offers us these features:

Easy replication of a database across multiple server instances
Fast indexing and retrieval
REST-like interface for document insertion, updates, retrieval and deletion
JSON-based document format (easily translatable across different languages)
Multiple libraries for your language of choice
Subscribable data updates on the _changes feed
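As a rough illustration of the REST-like interface, the sketch below builds (but does not send) the HTTP requests for storing, fetching and deleting a document, using the JDK's java.net.http API (Java 11 or later). The localhost URL and the document id are assumptions for the example; couchspring is the database name used later in this article. Because nothing is sent, the code runs without a CouchDB server; a real client would pass each request to HttpClient.send.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: builds CouchDB-style document requests without sending them.
class CouchRestRequests {
    // Assumed base URL of a local CouchDB instance (5984 is the default port).
    static final String BASE = "http://localhost:5984";

    // PUT /{db}/{id} creates or updates a document with the given JSON body.
    static HttpRequest putDocument(String db, String id, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + db + "/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // GET /{db}/{id} retrieves a document.
    static HttpRequest getDocument(String db, String id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + db + "/" + id))
                .GET()
                .build();
    }

    // DELETE /{db}/{id}?rev=... deletes the document at the given revision.
    static HttpRequest deleteDocument(String db, String id, String rev) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + db + "/" + id + "?rev=" + rev))
                .DELETE()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest put = putDocument("couchspring", "event-1", "{\"title\":\"Test\"}");
        System.out.println(put.method() + " " + put.uri());
    }
}
```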

An excellent tool for deciding which data store is right for you is the Visual Guide to NoSQL Systems. This guide describes the three areas of concern that you can use to pick a database (be it NoSQL or relational in nature). For our project we used the guide to weigh these three properties:

Availability
Consistency
Partition Tolerance

CouchDB fell into the AP camp (Availability and Partition Tolerance), which was what we were looking for given our own data concerns (not to mention the ability to replicate data on either a continuous or ad-hoc basis). As a comparison, MongoDB falls into the CP camp (Consistency and Partition Tolerance), and some databases, like Neo4j, offer a unique graph-oriented structure.

Another great tool to use is this blog post which compares Cassandra, MongoDB, CouchDB, Redis, Riak, HBase, and Membase.

It is highly conceivable that you may have more than one tool for a given project - in other words, you need to determine your needs and find the right tool to fit those needs.

How are we going to use CouchDB?

We are going to build a simple local events database to store some events as well as the venues at which they’ll take place. We will be splitting this up into two documents and wiring them together using their document ids. These two documents are:

Event
Place

(We will get into creating the Java classes for these two documents a bit later in this article.)

Jcouchdb

We are going to use jcouchdb to interface with our CouchDB database. This is an extremely well-tested and easy to use Java library that will automatically serialize and deserialize Java objects into the CouchDB database. Another reason why we chose jcouchdb is because of how close it is to the actual API of CouchDB itself.

What alternatives to jcouchdb are there?

If you find that you don’t like jcouchdb or would like to try a different library, there are quite a few to choose from:

A few of these haven’t been updated in quite a while, so be sure to plan some time for programming spikes if you need to do some testing.

Getting Started

Where to get started? We are going to be using Maven 3 to build our sample project. You won’t need to know Maven in order to understand the code, but you will need to have it installed in order to build and run the sample project. You can find Maven 3 on the Maven website.

For this part of the tutorial we assume some Maven 3 knowledge; if you don’t know Maven, you can download the pom.xml file from our repository and use it as-is.

We’re going to skip the initial part of POM creation, but you can download the file from our GitHub repository (https://github.com/r351574nc3/spring-couch-intro/blob/master/pom.xml) if you need the nitty-gritty details of creating a POM or just want to get started coding. The first order of business is to specify the jcouchdb and Spring components we will need.

<properties>
    <spring.framework.version>3.1.0.RELEASE</spring.framework.version>
    <spring-xml.version>2.0.0.RELEASE</spring-xml.version>
    <jcouchdb.version>0.11.0-1</jcouchdb.version>
...
</properties>

One reason to specify the version information at the top of the file is that it makes it easy to quickly update to a new version of a given library (or suite of libraries like Spring) all at once.

<dependencies>
    <dependency>
        <groupId>com.google.code.jcouchdb</groupId>
        <artifactId>jcouchdb</artifactId>
        <version>${jcouchdb.version}</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-context</artifactId>
        <version>${spring.framework.version}</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-aop</artifactId>
        <version>${spring.framework.version}</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-test</artifactId>
        <version>${spring.framework.version}</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.ws</groupId>
        <artifactId>spring-xml</artifactId>
        <version>${spring-xml.version}</version>
    </dependency>
    ...
</dependencies>

Now that we have our initial dependencies set up, we need to set up the rest of the directory structure for our project. We are going to stick with the standard Maven layout:

-src
    -main
        -java
        -resources
        -webapp
    -test
        -java
        -resources

Getting CouchDB set up

Now that our initial setup is complete, we need to set up our CouchDB database. Thankfully, there are some great hosted solutions for getting up and running quickly, such as Cloudant and Iris Couch.

Both of these offer free accounts and are perfect for getting our database set up so we can start developing.

Fig. 1 - Cloudant home screen

Fig. 2 - Cloudant’s Futon screen

Fig. 3 - The signup screen for Iris Couch

Fig. 4 - Iris Couch’s Futon screen

The other option we have is to install CouchDB on a local machine (or host). We won’t walk you through installing on your specific operating system, but there are some excellent instructions on CouchDB’s wiki.

Once you have your account created (or CouchDB up and running), we will need to create a database to play with. For our application we chose couchspring as the database name. Feel free to choose your own but you’ll need to change it when we begin to configure our setup.

To create a database in Cloudant, use their databases screen (Fig. 1); for Iris Couch, you can do this directly in Futon (the user interface for managing your CouchDB instance). More information on Futon can be found on the CouchDB wiki. We won’t be doing much with Futon in this article, but it is a great tool for playing around with views.

Fig. 5 - Create database in Futon Step 1

Fig. 6 - Create database in Futon Step 2

Configuring jcouchdb, Spring and our POJOs

Now that we have a new database set up, we need to:

Create our base POJO objects
Provide a JSON configuration mapping, which will automatically convert between the Java objects and the JSON objects that CouchDB uses
Configure Spring

First, let’s create some objects!

POJOs with some custom annotations

What are the base objects we’ll need to create, then, for our event system?

Event - to store events either from outside sources (like Eventful.com) or using a web interface
Place - to store venues where events are held

We have a few other objects that will be used in conjunction (and do some additional data processing while pulling in data from external sources):

AppDocument - base object used by the json mapping utility to define a document-type differentiator field
Description - used for formatting and filtering out the event’s description
Location - used to record the latitude and longitude of a given place/venue

First things first: we need to create our base class, AppDocument.

AppDocument.java

package com.clearboxmedia.couchspring.domain;

import org.jcouchdb.document.BaseDocument;
import org.svenson.JSONProperty;

public class AppDocument extends BaseDocument {
    /**
     * Returns the simple name of the class as doc type.
     * 
     * The annotation makes it a read-only property and also shortens the JSON name a little.
     *
     * @return document type name
     */
    @JSONProperty(value = "docType", readOnly = true)
    public String getDocumentType()
    {
        return this.getClass().getSimpleName();
    }

}

This object extends jcouchdb’s own BaseDocument object and provides a way to differentiate between different document types. CouchDB doesn’t have a default way to handle this and leaves it up to you, the developer, to implement on your own. We’ve chosen to use the class name as our differentiator; for example, Event objects will output docType as Event and Place objects will output Place.
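The following stand-alone sketch shows the idea in both directions without depending on jcouchdb: each subclass reports its own simple class name, and a small registry turns a docType value back into an instance. The nested classes only mirror the names used in this article, and the TYPES registry and instantiate method are hypothetical helpers, not part of jcouchdb or Svenson.

```java
import java.util.Map;
import java.util.function.Supplier;

// Stand-alone sketch; these classes only mirror the names used in the article.
class DocTypeSketch {
    static class AppDocument {
        // Same idea as the article's getDocumentType(): the simple class name.
        String getDocumentType() {
            return getClass().getSimpleName();
        }
    }

    static class Event extends AppDocument { }

    static class Place extends AppDocument { }

    // Hypothetical registry mapping a docType value back to a constructor.
    static final Map<String, Supplier<AppDocument>> TYPES = Map.of(
            "Event", Event::new,
            "Place", Place::new);

    // Creates an empty instance of the class named by docType.
    static AppDocument instantiate(String docType) {
        Supplier<AppDocument> factory = TYPES.get(docType);
        if (factory == null) {
            throw new IllegalArgumentException("unknown docType: " + docType);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        System.out.println(new Event().getDocumentType());
        System.out.println(instantiate("Place").getClass().getSimpleName());
    }
}
```

Because the discriminator is derived from the class name, renaming a class also changes the docType written to new documents, which is worth keeping in mind once data has been stored.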

Next we need to create our Event class.

Event.java (we have abbreviated some of the fields and methods for brevity)

package com.clearboxmedia.couchspring.domain;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlElement;

import org.svenson.JSONProperty;

public class Event extends AppDocument {
    private String id;
    private String title;
    private String description;
    private String startTime;
    private String stopTime;
    private String venueId;
    private Map<String,String> links;
    private String lastUpdated;
    private String externalUpdatedDate;
    private List<String> tags;

    public Event() {
    }

    public String getId() {
        return this.id;
    }

    public void setId(final String id) {
        this.id = id;
    }

    public String getTitle() {
        return this.title;
    }

    public void setTitle(final String title) {
        this.title = title;
    }

    public String getLastUpdated() {
        return this.lastUpdated;
    }

    public void setLastUpdated(final String lastUpdated) {
        this.lastUpdated = lastUpdated;
    }

    public String getExternalUpdatedDate() {
        return this.externalUpdatedDate;
    }

    public void setExternalUpdatedDate(final String externalUpdatedDate) {
        this.externalUpdatedDate = externalUpdatedDate;
    }

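    @JSONProperty(ignore = true)
    public Place getVenue() {
        // Placeholder: the venue is looked up dynamically from the database
        // using venueId; the lookup itself is omitted here.
        throw new UnsupportedOperationException("venue lookup not shown");
    }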

    public String getVenueId() {
        return this.venueId;
    }

    public void setVenueId(final String venueId) {
        this.venueId = venueId;
    }
    public String getDescription() {
        return this.description;
    }

    public void setDescription(final String description) {
        this.description = description;
    }

    public String getStartTime() {
        return this.startTime;
    }

    public void setStartTime(final String startTime) {
        this.startTime = startTime;
    }


    public String getStopTime() {
        return this.stopTime;
    }

    public void setStopTime(final String stopTime) {
        this.stopTime = stopTime;
    }

    public List<String> getTags() {
        return this.tags;
    }

    public void setTags(final List<String> tags) {
        this.tags = tags;
    }

    public Map<String,String> getLinks() {
        return this.links;
    }

    public void setLinks(final Map<String,String> links) {
        this.links = links;
    }
}

There are a few things of interest going on here. First is the fact that we’re storing the venueId instead of the venue in our object. Why do we do this?

Because CouchDB isn’t a relational database, there isn’t a direct way to define a relationship between two different documents, so we store the id of the venue in the Event object. We could store the venue object embedded in our event object, but it makes more sense to store these separately, especially since you could have multiple events at a given venue. So, instead of storing the relationship, we will provide a dynamic getter that will retrieve the venue object only when we need it. We’ll describe how to do this in the Querying for documents section.
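Until we reach that section, the idea of resolving a reference on demand can be sketched with a plain Map standing in for the database. This is not the jcouchdb-based lookup; the nested classes are simplified stand-ins for our Event and Place, and the ids and names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a Map stands in for the database; all names here are hypothetical.
class VenueLookupSketch {
    static class Place {
        final String id;
        final String name;

        Place(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static class Event {
        final String title;
        final String venueId; // only the reference is stored with the event

        Event(String title, String venueId) {
            this.title = title;
            this.venueId = venueId;
        }

        // Resolves the venue on demand instead of embedding it in the event.
        Place getVenue(Map<String, Place> places) {
            return venueId == null ? null : places.get(venueId);
        }
    }

    public static void main(String[] args) {
        Map<String, Place> places = new HashMap<>();
        places.put("place-1", new Place("place-1", "Town Hall"));

        // Two events share one venue document.
        Event concert = new Event("Concert", "place-1");
        Event lecture = new Event("Lecture", "place-1");
        System.out.println(concert.getVenue(places).name);
        System.out.println(lecture.getVenue(places) == concert.getVenue(places));
    }
}
```

Storing only the id means a change to the venue (a new address, say) is made in one place, at the cost of a second lookup whenever the venue details are needed.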

Now, we need to define our Place class.

Place.java

package com.clearboxmedia.couchspring.domain;
import java.util.LinkedHashMap;
import java.util.List;

public class Place extends AppDocument {
    private String id;
    private String name;
    private String address1;
    private String address2;
    private String city;
    private String state;
    private String postalCode;
    private String lastUpdated;
    private Boolean active;
    private Location location;
    private String venueType;
    private List<String> tags;

    public Place() {
    }

    public String getId() {
        return this.id;
    }

    public void setId(final String id) {
        this.id = id;
    }

    public String getName() {
        return this.name;
    }

    public void setName(final String name) {
        this.name = name;
    }

    public String getAddress1() {
        return this.address1;
    }

    public void setAddress1(final String address1) {
        this.address1 = address1;
    }

    public String getAddress2() {
        return this.address2;
    }

    public void setAddress2(final String address2) {
        this.address2 = address2;
    }

    public String getCity() {
        return this.city;
    }

    public void setCity(final String city) {
        this.city = city;
    }

    public String getState() {
        return this.state;
    }

    public void setState(final String state) {
        this.state = state;
    }

    public Location getLocation() {
        return this.location;
    }

    public void setLocation(final Location location) {
        this.location = location;
    }

    public String getVenueType() {
        return this.venueType;
    }

    public void setVenueType(final String venueType) {
        this.venueType = venueType;
    }

    public String getPostalCode() {
        return this.postalCode;
    }

    public void setPostalCode(final String postalCode) {
        this.postalCode = postalCode;
    }

    public String getLastUpdated() {
        return this.lastUpdated;
    }

    public void setLastUpdated(final String lastUpdated) {
        this.lastUpdated = lastUpdated;
    }

    public Boolean getActive() {
        return this.active;
    }

    public void setActive(final Boolean active) {
        this.active = active;
    }

    public List<String> getTags() {
        return this.tags;
    }

    public void setTags(final List<String> tags) {
        this.tags = tags;
    }
}

We won’t detail the other helper objects Description or Location, as they are fairly simple. If you’re interested, you can check them out from the GitHub repository.

Configuring jcouchdb and the JsonConfigFactory

Before we configure, we need to create a few classes we’ll be using: JsonConfigFactory, for mapping between the JSON data (CouchDB) and the Java classes, and CouchDbServerFactory, for creating a new instance of the server we will be connecting to.

JsonConfigFactory.java

public class JsonConfigFactory {

    /**
     * Factory method for creating a {@link JSONConfig}
     *
     * @return {@link JSONConfig} to create
     */
    JSONConfig createJsonConfig() {
        final DateConverter dateConverter = new DateConverter();
    
        final DefaultTypeConverterRepository typeConverterRepository = new DefaultTypeConverterRepository();
        typeConverterRepository.addTypeConverter(dateConverter);
        // typeConverterRepository.addTypeConverter(new LatLongConverter());

        // we use the new sub type matcher  
        final ClassNameBasedTypeMapper typeMapper = new ClassNameBasedTypeMapper();
        typeMapper.setBasePackage(AppDocument.class.getPackage().getName());
        // we only want to have AppDocument instances
        typeMapper.setEnforcedBaseType(AppDocument.class);
        // we use the docType property of the AppDocument 
        typeMapper.setDiscriminatorField("docType");        
        // we only want to do the expensive look ahead if we're being told to
        // deliver AppDocument instances.        
        typeMapper.setPathMatcher(new SubtypeMatcher(AppDocument.class));

        final JSON generator = new JSON();
        generator.setIgnoredProperties(Arrays.asList("metaClass"));
        generator.setTypeConverterRepository(typeConverterRepository);
        generator.registerTypeConversion(java.util.Date.class, dateConverter);
        generator.registerTypeConversion(java.sql.Date.class, dateConverter);
        generator.registerTypeConversion(java.sql.Timestamp.class, dateConverter);

        final JSONParser parser = new JSONParser();
        parser.setTypeMapper(typeMapper);
        parser.setTypeConverterRepository(typeConverterRepository);
        parser.registerTypeConversion(java.util.Date.class, dateConverter);
        parser.registerTypeConversion(java.sql.Date.class, dateConverter);
        parser.registerTypeConversion(java.sql.Timestamp.class, dateConverter);

        return new JSONConfig(generator, parser);
    }
}

This class creates a generator for converting a Java object (Event or Place) to its JSON equivalent; the parser reverses the process. There are a few key things to look at in the typeMapper (which is set on the parser), specifically the base type and the discriminator field. typeMapper.setEnforcedBaseType(AppDocument.class) will only convert docs that inherit from the AppDocument class. typeMapper.setDiscriminatorField("docType") will use our docType field and value to discriminate between different types of documents. Feel free to change this field to some other name, but you’ll need to change the method and JSON mapping in the AppDocument class. To refresh your memory, here is the method we’re referring to:

@JSONProperty(value = "docType", readOnly = true)
public String getDocumentType()
{
    return this.getClass().getSimpleName();
}

The final item to look at is typeMapper.setPathMatcher(new SubtypeMatcher(AppDocument.class)) which will automatically look at sub-types to make sure that we’re converting between objects that inherit from AppDocument. It is possible to supply your own parser for several of the jcouchdb method calls for retrieving or querying the database, but we won’t be investigating those in this tutorial.

Now that we have the classes we need, it’s time to configure our Spring context. We’ve separated out our CouchDB-specific pieces into couchdb-config.xml.

couchdb-config.xml

<beans 
    xmlns="http://www.springframework.org/schema/beans" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:tx="http://www.springframework.org/schema/tx" 
    xmlns:util="http://www.springframework.org/schema/util"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
        http://www.springframework.org/schema/tx 
        http://www.springframework.org/schema/tx/spring-tx-3.1.xsd
        http://www.springframework.org/schema/util 
        http://www.springframework.org/schema/util/spring-util-3.1.xsd
        http://www.springframework.org/schema/context 
        http://www.springframework.org/schema/context/spring-context-3.1.xsd
        http://www.springframework.org/schema/lang
        http://www.springframework.org/schema/lang/spring-lang-3.1.xsd">


    <context:annotation-config />

    <bean id="properties" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer" />

    <bean id="jsonConfigFactory" class="com.clearboxmedia.couchspring.json.JsonConfigFactory"/>

    <bean id="jsonConfig" factory-bean="jsonConfigFactory" factory-method="createJsonConfig"/>

    <!-- If my db requires username/password, I will need to set up a Principal -->
    <bean id="couchPrincipal" class="org.apache.http.auth.UsernamePasswordCredentials">
      <constructor-arg value="${couchdb.username}" />
      <constructor-arg value="${couchdb.password}" />
    </bean>

    <bean id="serverFactory" class="com.clearboxmedia.couchspring.couch.CouchDbServerFactory" />

    <bean id="couchDbServer" factory-bean="serverFactory" factory-method="createCouchDbServerInstance">
      <constructor-arg value="${couchdb.url}"/>
      <constructor-arg name="credentials" ref="couchPrincipal" />
    </bean>

    <bean id="systemDatabase" class="org.jcouchdb.db.Database">
        <constructor-arg ref="couchDbServer"/>
        <constructor-arg value="couchspring-dev"/>
        <property name="jsonConfig" ref="jsonConfig"/>
    </bean>
</beans>

The first thing we need to do is set up our annotations with <context:annotation-config />, which enables annotation processing in the Spring context. The next two beans set up the jsonConfigFactory and use it to create the jsonConfig for our server instance. Finally, we create our serverFactory, which we use to create an instance of our couchDbServer; this is then fed into the jcouchdb Database instance along with our jsonConfig and the name of the database we want to connect to. All of our properties (username, password and URL) are currently passed in through the command line, but you could just as easily provide a specific property file.
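As a rough, Spring-free illustration of mixing the two sources, the sketch below resolves the couchdb.url, couchdb.username and couchdb.password keys from properties-file text first and falls back to JVM system properties (the -D flags on the command line). The CouchSettings class, its method names and the sample values are hypothetical; in the real project the placeholder configurer bean does this work.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: file properties first, JVM system properties second.
class CouchSettings {

    // Looks up a key in the file properties, then in the system properties.
    static String resolve(Properties fileProps, String key) {
        String value = fileProps.getProperty(key);
        if (value == null) {
            value = System.getProperty(key);
        }
        if (value == null) {
            throw new IllegalStateException("missing setting: " + key);
        }
        return value;
    }

    // Parses properties-file text; a real application would read a file instead.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(
                "couchdb.url=http://localhost:5984\n"
                + "couchdb.username=admin\n");
        // Stands in for passing -Dcouchdb.password=secret on the command line.
        System.setProperty("couchdb.password", "secret");
        System.out.println(resolve(props, "couchdb.url"));
        System.out.println(resolve(props, "couchdb.password"));
    }
}
```

Keeping the password out of the file and supplying it only at launch is one reason to combine both sources.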

Now that we’ve got everything configured it’s time to write some tests.

Create, Save, Retrieve, Update, and Delete

Before we dive into creating views, let’s start with some basics: creating, updating, retrieving and deleting documents. All of our tests share the same setup. Here’s the class definition for CouchSaveTest; the other tests are declared the same way.

CouchSaveTest.java (header)

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("/root-context.xml")
public class CouchSaveTest {

    @Autowired
    protected Database database;
    ...
}

The first annotation, @RunWith, tells JUnit to use the SpringJUnit4ClassRunner to run this test (as opposed to the standard JUnit class runner). This allows our next annotation, @ContextConfiguration("/root-context.xml"), to start up a Spring context for this test. This context loads all of our CouchDB beans, our POJOs with their JSON annotations, and our CouchDBUpdater, which automatically pushes our views to the CouchDB server. We will cover this last one in the section on views below.

Finally, we tell Spring to autowire our database into the test class so that we can use it.

Document creation

One of the first steps in any kind of DB storage system is the ability to create a new record (or in our case, a document). How do we do this using jcouchdb’s API?

CouchSaveTest.java (testEventSave())

@Test
public void testEventSave() {
    Event document = new Event();
    document.setTitle("Test");
    assertTrue(document.getId() == null);

    database.createDocument(document);
    assertTrue(document.getId() != null);
}

Here we create a new Event object, then call the database.createDocument() method and pass in the new event. Our JsonConfigFactory will then map our fields into a CouchDB document.

Document retrieval and update

CouchSaveTest.java (testEventSave_Update())

@Test
public void testEventSave_Update() {
    Event document = database.getDocument(Event.class, "2875977125");
    assertTrue(document != null);

    document.setDescription("Testing out save");

    database.createOrUpdateDocument(document);

    Event newdocument = database.getDocument(Event.class, "2875977125");
    // Check the freshly retrieved copy, not the instance we modified locally
    assertTrue(newdocument != null);
    assertTrue(newdocument.getDescription().equals("Testing out save"));
}

This method actually tests two things for us. The first is retrieving a document, by calling database.getDocument(Event.class, "2875977125") and passing in its document id (“2875977125” in this case). The second is the update method, database.createOrUpdateDocument(document), which will, as its name suggests, either create a new document or update an existing one (meaning that if the document already has an id that matches a document in the database, that document is updated).

CouchSaveTest.java (testEventSave_Exists2())

@Test(expected = IllegalStateException.class)
public void testEventSave_Exists2() {
    Event document = database.getDocument(Event.class, "2875977125");
    assertTrue(document != null);

    database.createDocument(document);
    assertFalse(document.getId().equals("2875977125"));
}

This final test expects an exception when we attempt to create a document that already exists (note that we aren’t using the createOrUpdateDocument() method).

Document deletion

Deleting a document is just as easy as creating and updating.

CouchDeleteTest.java (testEventDelete())

@Test
public void testEventDelete() {
    Event document = database.getDocument(Event.class, "3083848875");
    assertTrue(document != null);
    database.delete(document);

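    try {
        document = database.getDocument(Event.class, "3083848875");
        // Reaching this line means the document was not deleted
        // (fail is assumed to be statically imported from org.junit.Assert)
        fail("Expected NotFoundException after delete");
    }
    catch (Exception e) {
        assertTrue(e instanceof org.jcouchdb.exception.NotFoundException);
    }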
}

@Test(expected = org.jcouchdb.exception.NotFoundException.class)
public void testEventDelete_NotExists() {
    Event document = database.getDocument(Event.class, "-2");
    assertTrue(document != null);
    database.delete(document);
    document = database.getDocument(Event.class, "-2");
    assertTrue(document == null);
}

These two methods test calling the delete() method first on a document that does exist and second on one that doesn’t (which will throw a NotFoundException).

Querying for documents

Now that we have the basic CRUD operations complete, we need to do something a bit more complex: querying our database by more than just the id of the document we’re looking for. For this article we’re only going to delve into views a little bit, as they can be very complex. More on views can be found on the CouchDB wiki as well as in the online version of CouchDB: The Definitive Guide.

With that being said, let’s get started writing some views!

Introduction to Views

First, what exactly are CouchDB views and how do they work?

Views are a way to filter or query the data in your database. Views are typically written in JavaScript. It is possible to write views in other languages, but that is a different topic that we won't be covering here. Each view maps keys to values taken from a document. View indexes are not updated until a view is queried, but you can change this behavior with an external script if you wish. All views in a single design document get updated when one of the views in that design document gets queried.

Design documents

Before we look at creating views, we should discuss how our application automatically uploads our views and keeps them up to date. All views are tied to a design document. We will have two design documents in this instance:

event
place

These two design documents will be created automatically by the org.jcouchdb.util.CouchDBUpdater class. This class is configured in the couchdb-services.xml file.

couchdb-services.xml

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xmlns:util="http://www.springframework.org/schema/util"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
        http://www.springframework.org/schema/tx
        http://www.springframework.org/schema/tx/spring-tx-2.5.xsd
        http://www.springframework.org/schema/util
        http://www.springframework.org/schema/util/spring-util-2.5.xsd
        http://www.springframework.org/schema/context
        http://www.springframework.org/schema/context/spring-context-2.5.xsd">

    <context:annotation-config />

    <bean id="systemUpdater"
          class="org.jcouchdb.util.CouchDBUpdater"
          init-method="updateDesignDocuments">
        <property name="createDatabase" value="true"/>
        <property name="database" ref="systemDatabase"/>
        <property name="designDocumentDir" value="designdocs/"/>
    </bean>
</beans>

The CouchDBUpdater listens for changes in our designdocs directory and automatically pushes those changes up to the configured CouchDB database. What does the designdocs directory actually contain then?

    -designdocs
        -event
            -allByDate.map.js
            -allByParentId.map.js
            -allByVenueId.map.js
            -list.map.js
        -place
            -list.map.js

Each of the directories maps to a design document in CouchDB.

Fig. 7 - DesignDocs Events
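To make the directory-to-design-document mapping more concrete, here is a rough sketch in plain JavaScript. It is not how CouchDBUpdater is implemented; the toDesignDocs helper and the placeholder view sources are made up for illustration. The only things taken as given are the directory layout above and the usual shape of a CouchDB design document, with an _id of _design/<name> and a views object.

```javascript
// Hypothetical helper (not part of jcouchdb): groups view files such as
// "event/list.map.js" into CouchDB-style design documents.
function toDesignDocs(files) {
  const designDocs = {};
  for (const [path, source] of Object.entries(files)) {
    const [dir, fileName] = path.split('/');      // "event", "list.map.js"
    const [viewName, kind] = fileName.split('.'); // "list", "map"
    const id = '_design/' + dir;
    designDocs[id] = designDocs[id] || { _id: id, language: 'javascript', views: {} };
    designDocs[id].views[viewName] = designDocs[id].views[viewName] || {};
    designDocs[id].views[viewName][kind] = source;
  }
  return designDocs;
}

// Placeholder sources; the real files contain the view functions.
const designDocsOut = toDesignDocs({
  'event/list.map.js': 'function(doc) { /* ... */ }',
  'event/allByDate.map.js': 'function(doc) { /* ... */ }',
  'place/list.map.js': 'function(doc) { /* ... */ }',
});

console.log(Object.keys(designDocsOut)); // [ '_design/event', '_design/place' ]
console.log(Object.keys(designDocsOut['_design/event'].views)); // [ 'list', 'allByDate' ]
```

This is why the views are addressed later as event/list or event/allByDate: the first part names the design document and the second part names the view inside it.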

Great, now, let's write those views.

Our first view

Here, then, is a simple view that looks for all documents that are “event” documents:

function(doc) {
    if (doc.docType == 'Event') {
        emit(doc.id, doc);
    }
}

This view emits a row for every document that has a docType field matching the value Event. Let’s examine this a bit to see what it is doing. The first line is a JavaScript function definition which accepts a doc as its sole parameter. We can then examine values stored inside the documents themselves (doc.docType in our case). Finally, we have the emit function, which takes two arguments, key and value, where value can be null. Our key in this case is the doc.id field and our value is the full document.

The keys we emit are what we will actually be querying against in the next few view examples. The other key thing to understand about emit is that the rows it produces are ordered by their key.
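If it helps to see the mechanics, the following is a small in-memory stand-in for what happens when CouchDB runs a map function. It is only a sketch: buildIndex and the sample documents are invented for this example, emit is passed in as a parameter rather than provided globally as CouchDB does, and the sorting only handles simple string keys.

```javascript
// In-memory stand-in for a view index (illustration only, not CouchDB code).
function buildIndex(docs, mapFn) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) mapFn(doc, emit);
  // View rows are kept ordered by key (simplified here to string keys).
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// Same logic as the list view, with emit supplied as a second argument.
const listView = (doc, emit) => {
  if (doc.docType == 'Event') emit(doc.id, doc);
};

// Hypothetical sample documents.
const sampleDocs = [
  { id: 'e3', docType: 'Event', title: 'Concert' },
  { id: 'p1', docType: 'Place', name: 'Town Hall' },
  { id: 'e1', docType: 'Event', title: 'Workshop' },
];

const listRows = buildIndex(sampleDocs, listView);
console.log(listRows.map((row) => row.key)); // [ 'e1', 'e3' ]
```

The Place document never reaches emit, so it does not appear in the index, and the two Event rows come back in key order rather than in insertion order.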

Here's our test case for calling the list view.

CouchViewTest.java (testListingQuery())

    @Test
    public void testListingQuery() {
        String viewName = "event/list";
        ViewResult results = database.queryView(viewName, Event.class, null, null);
        assertNotNull(results);
        assertEquals(27, results.getRows().size());
    }

Retrieving events through venue id

One of the first views we will find handy is one that retrieves a set of events by their associated venueId. To do this we need to write a view that emits the venueId as its key and the document as its value (although the value is not strictly needed with jcouchdb’s functions). So, what does the view look like, then?

function(doc) {
    // Same docType check as the list view
    if (doc.docType == 'Event') {
        // Key: the venue id; value: the full document
        emit(doc.venueId, doc);
    }
}

It looks very similar to the simple view we wrote earlier, except that this time when we call it from our application, we will be passing a venue id to query against.
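As a rough picture of what querying by key means, the sketch below runs a view shaped like the one above over a few made-up documents and keeps only the rows whose key is in the requested list. The queryByKeys function here is a local stand-in written for this example, not the jcouchdb method, and the venue ids are hypothetical.

```javascript
// View keyed on venueId, with emit passed in for the purposes of the sketch.
const venueView = (doc, emit) => {
  if (doc.docType == 'Event') emit(doc.venueId, doc);
};

// Local stand-in for a keyed view query (illustration only).
function queryByKeys(docs, mapFn, keys) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) mapFn(doc, emit);
  return rows.filter((row) => keys.includes(row.key));
}

// Hypothetical sample documents.
const venueDocs = [
  { docType: 'Event', title: 'Concert', venueId: 'v1' },
  { docType: 'Event', title: 'Workshop', venueId: 'v2' },
  { docType: 'Event', title: 'Recital', venueId: 'v1' },
  { docType: 'Place', name: 'Town Hall' },
];

const atVenue = queryByKeys(venueDocs, venueView, ['v1']);
console.log(atVenue.map((row) => row.value.title)); // [ 'Concert', 'Recital' ]
```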

CouchViewTest.java (testQueryByVenueId())

    @Test
    public void testQueryByVenueId() {
        String viewName = "event/allByVenueId";


        ViewAndDocumentsResult

One of the key differences here in how we are calling the view is that we are using the queryViewAndDocumentsByKeys() method to pass in the viewName, the mapping class Event and the keys we are querying on (in this case just one key is queried, that of the venueId).

Retrieving events by date

Both of those views were relatively simple. How do we do something a bit more complex like querying by date? First, we need to define our view.

function(doc) {
    if (doc.docType == 'Event') {
        var startDate = new Date(doc.startTime);
        var startYear = startDate.getFullYear();
        var startMonth = startDate.getMonth();
        var startDay = startDate.getDate();
        emit([
            startYear,
            startMonth,
            startDay
        ]);
    }
}

Now, how do we call this function?

CouchViewTest.java (testQueryByDate())

    @Test
    public void testQueryByDate() {
        String viewName = "event/allByDate";
        Options opts = new Options();
        opts.startKey(Arrays.asList(2013, 7, 11));
        opts.endKey(Arrays.asList(2013, 7, 11));

        ViewAndDocumentsResult

We have a new object here called Options, which allows us to specify the query options we wish to pass to our view. In this instance we are providing a startKey and an endKey to retrieve a set of objects. One thing to be aware of is that the keys you pass in must be of the same type as the keys the view emits. In our case we are dealing with ints, so we must pass int values in our keys. Order is also important: we are passing in year, month and day to match the year, month and day emitted by the view. Note also that JavaScript's getMonth() is zero-based, so a month value of 7 in the key corresponds to August.

What about endKey? Together with startKey, the endKey parameter allows us to specify a range for our query. In this instance we've chosen the same date for both, but we could easily have chosen different values to get more or fewer documents back. CouchDB compares the array keys element by element, in order, and returns the rows whose keys fall between startKey and endKey.
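The range behaviour can be tried out without a database. The sketch below is a simplified stand-in that only handles arrays of numbers; compareKeys, rangeQuery and the sample rows are invented for this example, and CouchDB's real collation rules cover many more types.

```javascript
// Compare two array keys element by element (simplified: numbers only).
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length; // a shorter array sorts first
}

// Keep the rows whose key lies between startKey and endKey, inclusive.
function rangeQuery(rows, startKey, endKey) {
  return rows
    .filter((r) => compareKeys(r.key, startKey) >= 0 && compareKeys(r.key, endKey) <= 0)
    .sort((x, y) => compareKeys(x.key, y.key));
}

// Hypothetical rows, keyed as [year, month, day].
const dateRows = [
  { key: [2013, 7, 12], id: 'c' },
  { key: [2013, 7, 11], id: 'a' },
  { key: [2013, 6, 30], id: 'b' },
  { key: [2013, 8, 1], id: 'd' },
];

// Same start and end key: a single day.
console.log(rangeQuery(dateRows, [2013, 7, 11], [2013, 7, 11]).map((r) => r.id)); // [ 'a' ]
// Wider range: every row whose month value is 7.
console.log(rangeQuery(dateRows, [2013, 7, 1], [2013, 7, 31]).map((r) => r.id)); // [ 'a', 'c' ]
```

Because the comparison walks the array from left to right, the order of the elements in the emitted key decides which kinds of ranges are cheap to ask for: year first, then month, then day makes "everything in one month" a single contiguous range.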

Dynamic query for retrieving venue from an event

What we're doing here is simply applying the same logic that we did for queryByVenueId, except for places by event id.

    @JSONProperty(ignore = true)
    public Place getVenue() {
        String viewName = "place/allByEventId";

        ViewAndDocumentsResult

You just need to write another view similar to the allByVenueId for the place document and that's it.

Where can we go from here?

The view (or map) is just the first part of the map/reduce functionality that CouchDB provides. So, what is the reduce (and re-reduce) functionality and what can we do with it?

Reduce allows us to take a set of results from a previous map and perform additional operations on it to reduce the results into a more compact form.
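As a small taste, here is a sketch of a counting reduce, run by hand in plain JavaScript. The countReduce function uses the (keys, values, rereduce) parameter shape of CouchDB reduce functions, but everything else (reduceByKey, the sample rows, passing null for keys) is a simplification made up for this example.

```javascript
// Reduce function in the (keys, values, rereduce) shape: counts rows.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    // On re-reduce, values are partial counts from earlier reduce calls.
    return values.reduce((total, n) => total + n, 0);
  }
  return values.length;
}

// Hypothetical rows, as a map keyed on venueId might emit them.
const mappedRows = [
  { key: 'v1', value: null },
  { key: 'v2', value: null },
  { key: 'v1', value: null },
];

// Group rows by key and reduce each group (a simplified grouped query).
function reduceByKey(rows, reduceFn) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row.value);
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(null, values, false);
  return result;
}

const perVenue = reduceByKey(mappedRows, countReduce);
console.log(perVenue); // { v1: 2, v2: 1 }

// Re-reduce combines the partial results into one total.
console.log(countReduce(null, Object.values(perVenue), true)); // 3
```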

We will leave reduce and re-reduce for you to explore on your own, but you can do some very interesting things with them. Explore, and have fun with CouchDB!

About the Authors

Leo Pryzbylski is a pillar of technical innovation at ClearBox Media. He wields a giant mallet of creative problem solving in one hand and an enchanted claymore of software architecture experience in melee combat against the horde of software irrelevance. Leo has a broad skillset attributed to his experience as a game developer, QA engineer, release manager, configuration manager, development manager, computer scientist, network intrusion specialist, embedded software engineer, software architect, scientific programmer, and system administrator.

Warner Onstine started his career in the tech industry doing technical support for Intuit in the early '90s. While there, he learned how to develop web applications and left to pursue a career as a software engineer. Since then, he’s worked at a variety of places including Intalio and the University of Arizona, and he now works as a lead developer at rSmart. The seed for ClearBox Media was planted when Warner learned about ARGs and started playing them for fun a few years ago. They are appealing because they are a good blend of playing in the real world, but within a fictional environment. Warner, and others, see the potential for ARGs to change our society for the better, which is one of the guiding principles of “The Human Mosaic Project” – Have Fun. Do Good.
