Sunday 11 October 2015

Setting up Apache ZooKeeper Cluster

This tutorial provides step-by-step instructions to configure and start up an Apache ZooKeeper 3.4.6 multi-node cluster.

Pre-requisites for ZooKeeper

The first thing we need in order to install an Apache ZooKeeper cluster is multiple machines.

In this tutorial, we will be using the following virtual machines to install Apache ZooKeeper -

Parameter Name      Server 1       Server 2       Server 3
Host Name           node1          node2          node3
IP Address          192.168.0.1    192.168.0.2    192.168.0.3
Operating System    Ubuntu         Ubuntu         Ubuntu
No. of CPU Cores    4              4              4
RAM                 6 GB           6 GB           6 GB

Apart from the above machines, please ensure that the following pre-requisites are met so that you can follow this article without any issues -
  1. JDK 6 or higher installed on all the virtual machines
  2. JAVA_HOME variable set to the path where JDK is installed
  3. Root access on all the virtual machines, as all the steps should ideally be performed by the root user
  4. Updated /etc/hosts file on all the Servers with the below details
                   192.168.0.1   node1
                   192.168.0.2   node2
                   192.168.0.3   node3
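
As a rough sketch, the /etc/hosts pre-requisite can be checked on each machine with a small shell script like the one below. The function names has_host_entry and check_cluster_hosts are made up for this illustration; the IP addresses and host names are the ones from the table above.

```shell
#!/usr/bin/env bash
# Check that a hosts file maps each cluster IP to the expected host name.
# Function names are hypothetical; IPs and names come from this tutorial.

# has_host_entry FILE IP NAME
# Succeeds if FILE has a non-comment line whose first field is IP and
# which lists NAME among its host names.
has_host_entry() {
  local file="$1" ip="$2" name="$3"
  awk -v ip="$ip" -v name="$name" '
    /^[[:space:]]*#/ { next }
    $1 == ip {
      for (i = 2; i <= NF; i++) if ($i == name) found = 1
    }
    END { exit found ? 0 : 1 }
  ' "$file"
}

# check_cluster_hosts [FILE]
# Prints one line per node and returns 1 if any entry is missing.
# FILE defaults to /etc/hosts.
check_cluster_hosts() {
  local file="${1:-/etc/hosts}" status=0 pair
  for pair in "192.168.0.1 node1" "192.168.0.2 node2" "192.168.0.3 node3"; do
    # $pair is left unquoted on purpose so it splits into IP and NAME
    if has_host_entry "$file" $pair; then
      echo "ok: $pair"
    else
      echo "missing: $pair"
      status=1
    fi
  done
  return "$status"
}
```

Running check_cluster_hosts without arguments on a node inspects that node's own /etc/hosts file.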

Installing Apache ZooKeeper

The first step to install Apache ZooKeeper is to download its binaries on all the Servers. In this article, we will be installing Apache ZooKeeper 3.4.6 to set up the cluster; it can be downloaded from the Apache ZooKeeper download site.

Once the archive has been downloaded on the Servers, you can extract it to the directory where you would like ZooKeeper to be installed. We will refer to this directory as $ZOOKEEPER_HOME throughout this tutorial.


Configuring Multi-node Cluster
Once Apache ZooKeeper has been extracted on all the Servers, the next step is to configure them.
We don't need to mark any node as the Leader during configuration, as the leader is chosen automatically by the ZooKeeper service. So the zoo.cfg file will be the same on all the nodes.

The first part of the configuration involves creating or updating a configuration file called zoo.cfg in the $ZOOKEEPER_HOME/conf directory with the following contents:


ZooKeeper Configuration - $ZOOKEEPER_HOME/conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
# Where you would like ZooKeeper to save its data
dataDir=$ZOOKEEPER_HOME/data
# Where you would like ZooKeeper to log
dataLogDir=$ZOOKEEPER_HOME/logs
clientPort=2181
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888

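The first thing you need to do in the above zoo.cfg file is to replace the values of dataDir and dataLogDir with the directories where you would like ZooKeeper to save its data and its logs respectively. Write these as absolute paths: zoo.cfg is read as a plain properties file, so a shell variable such as $ZOOKEEPER_HOME is not expanded there. Now, let's talk about some of the important parts of the above configuration.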

The clientPort property, as the name suggests, is the port on which clients connect to the ZooKeeper service.

Next, let's talk about the last three entries, which are in server.x=hostname:port1:port2 format.

Firstly, there are two port numbers, port1 (2888) and port2 (3888). Followers use the first port to connect to the leader, and the second port is used for leader election.
Secondly, x in server.x denotes the id of the node.
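
Since the file is the same on every node, one option is to generate it with a small shell function instead of editing it by hand. The following is only a sketch: render_zoo_cfg is a made-up name, and it assumes that every server uses ports 2888 and 3888 and that ids are assigned in the order the hosts are listed.

```shell
#!/usr/bin/env bash
# render_zoo_cfg DATA_DIR LOG_DIR HOST...
# Prints a zoo.cfg like the one above to standard output.
#   DATA_DIR, LOG_DIR : absolute paths (zoo.cfg does not expand shell variables)
#   HOST...           : ensemble members in id order; the first host gets id 1
# The function name is hypothetical and exists only for this illustration.
render_zoo_cfg() {
  if [ "$#" -lt 3 ]; then
    echo "usage: render_zoo_cfg DATA_DIR LOG_DIR HOST..." >&2
    return 1
  fi
  local data_dir="$1" log_dir="$2"
  shift 2
  cat <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$data_dir
dataLogDir=$log_dir
clientPort=2181
EOF
  local id=1 host
  for host in "$@"; do
    # 2888: followers connect to the leader, 3888: leader election
    echo "server.$id=$host:2888:3888"
    id=$((id + 1))
  done
}

# Example (the /opt/zookeeper paths are placeholders, not part of this tutorial):
#   render_zoo_cfg /opt/zookeeper/data /opt/zookeeper/logs \
#     192.168.0.1 192.168.0.2 192.168.0.3 > "$ZOOKEEPER_HOME/conf/zoo.cfg"
```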

Each server.x row must have a unique id. Each server learns its own id from a file named myid, one per server, which resides in that server's data directory, as specified by the dataDir parameter in the configuration file.

The myid file consists of a single line containing only the text of that machine's id. So myid of server 1 would contain the text 1 and nothing else. 

The id must be unique within the ensemble and should have a value between 1 and 255.
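
The myid file can be created by hand, or with a small helper along the following lines. This is a minimal sketch: write_myid is a made-up name, and the data directory passed to it is assumed to be the same path that dataDir points to in zoo.cfg.

```shell
#!/usr/bin/env bash
# write_myid DATA_DIR ID
# Creates DATA_DIR if needed and writes ID as the only content of
# DATA_DIR/myid. The function name is hypothetical.
write_myid() {
  local data_dir="$1" id="$2"
  # Accept only whole numbers
  case "$id" in
    ''|*[!0-9]*)
      echo "id must be a number: '$id'" >&2
      return 1
      ;;
  esac
  # Enforce the 1..255 range mentioned above
  if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
    echo "id must be between 1 and 255: $id" >&2
    return 1
  fi
  mkdir -p "$data_dir" || return 1
  echo "$id" > "$data_dir/myid"
}

# Example: on node1 run   write_myid /path/to/dataDir 1
#          on node2 run   write_myid /path/to/dataDir 2
#          on node3 run   write_myid /path/to/dataDir 3
```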


Starting Up Multi-node Cluster

Once you are all set up, the next step is to start the cluster.

On all the Servers, go to the bin directory of Apache ZooKeeper and execute the following command -

$ZOOKEEPER_HOME/bin on all machines
./zkServer.sh start

You can execute the following command to check the status of Apache ZooKeeper -

$ZOOKEEPER_HOME/bin on all machines
./zkServer.sh status
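
When the server is running as part of the cluster, the status output is expected to include a line beginning with "Mode:" that tells you whether this node is currently the leader or a follower. The helper below is a sketch that pulls out just that value; zk_mode is a made-up name, and the "Mode:" line format is an assumption about what zkServer.sh prints in this version.

```shell
#!/usr/bin/env bash
# zk_mode
# Reads the output of `zkServer.sh status` on standard input and prints the
# value of the "Mode:" line (for example leader or follower).
# Returns 1 if no such line is found, which usually means the server is
# not running. The function name is hypothetical.
zk_mode() {
  awk -F': *' '/^Mode:/ { print $2; found = 1 } END { exit found ? 0 : 1 }'
}

# Example:
#   "$ZOOKEEPER_HOME/bin/zkServer.sh" status 2>&1 | zk_mode
```

In a healthy three-node cluster, one node should report leader and the other two follower.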


Stopping Multi-node Cluster

In order to stop Apache ZooKeeper, execute the following command on all the Servers -

$ZOOKEEPER_HOME/bin on all machines
./zkServer.sh stop

Saturday 10 October 2015

How ZooKeeper Works

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers, known as znodes. Every znode is identified by a path, with path elements separated by a slash (“/”). Aside from the root, every znode has a parent, and a znode cannot be deleted if it has children.

This is much like a normal file system, but ZooKeeper provides superior reliability through redundant services. A service is replicated over a set of machines, and each machine maintains an in-memory image of the data tree and the transaction logs. Each client connects to a single ZooKeeper server and maintains a TCP connection through which it sends requests and receives responses.

This architecture allows ZooKeeper to provide high throughput and availability with low latency, but the size of the database that ZooKeeper can manage is limited by memory.