Saturday, 10 October 2015

Introduction to Apache ZooKeeper


This article introduces Apache ZooKeeper by describing its technical architecture. It also discusses its benefits along with the use cases in which it can be utilized.

Abstract
ZooKeeper, at its core, provides an API that lets you manage your application state in a highly concurrent, distributed environment. It is optimized for, and performs well in, scenarios where read operations greatly outnumber write operations.

As Apache defines it, ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming.

It is implemented in Java and has bindings for both Java and C. It organizes its data as a hierarchical namespace of nodes, called znodes, much like a file system tree, and replicates this data across all the servers in the ensemble.
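
As a brief illustration of this hierarchical namespace of znodes, here is a minimal sketch using the official Java client API; the connection string localhost:2181, the /app/config path, and the data are hypothetical.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZNodeExample {
    public static void main(String[] args) throws Exception {
        // Connect to a ZooKeeper server (connection string is an assumption).
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> {});

        // Create persistent znodes; the namespace is a tree, much like file system paths.
        zk.create("/app", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/app/config", "db.host=10.0.0.5".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Read the data back; reads are served by whichever node the client is connected to.
        byte[] data = zk.getData("/app/config", false, null);
        System.out.println(new String(data));

        zk.close();
    }
}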
Technical Architecture
It is time to discuss the technical architecture of ZooKeeper. The following diagram depicts the architecture of ZooKeeper.






There are two types of nodes shown in the above diagram -
  1. Leader Node - The Leader Node is the only node responsible for processing write requests. All other nodes, called followers, simply delegate client write calls to the Leader Node. We don't mark any node as leader while setting up an Apache ZooKeeper cluster; instead, the leader is elected internally among all the nodes of the cluster. Apache ZooKeeper uses the concept of majority for this election, i.e. the node that gets the highest number of votes is elected as Leader.
    This serves as the basis of the recommendation to have an odd number of nodes in a cluster for the best failover and availability. For example, if we create a cluster of four nodes and two nodes go offline for some reason, Apache ZooKeeper will be down because half of the nodes are offline and a majority cannot be gained for the leader election. However, if we create a cluster of five nodes, even if two nodes go offline, Apache ZooKeeper will still be functional because a majority of nodes remains in service (the arithmetic is illustrated in the sketch after this list).
  2. Follower Nodes - All nodes other than the Leader are called Follower Nodes. A follower node can service read requests on its own; write requests are forwarded to the Leader Node. Followers also play an important role in electing a new leader if the existing leader node goes down.
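
To make the majority arithmetic above concrete, here is a small, self-contained sketch; the quorum formula is just the standard majority calculation, not code taken from ZooKeeper itself.

public class QuorumMath {

    // A majority quorum needs strictly more than half of the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Failures tolerated while a quorum can still be formed.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5}) {
            System.out.printf("ensemble=%d quorum=%d tolerated failures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
        // Output shows that a 4-node ensemble tolerates only 1 failure, just like
        // a 3-node ensemble, while a 5-node ensemble tolerates 2.
    }
}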

And here is a brief description of the node components shown in the architecture diagram. Please note that these are not the only components in a node.
  1. Request Processor - This component is active only in the Leader Node and is responsible for processing write requests originating from clients or follower nodes. Once the request processor processes a write request, it broadcasts the changes to the follower nodes so that they can update their state accordingly.
  2. Atomic Broadcast - This component is present in both the Leader Node and the Follower Nodes. It is responsible for broadcasting changes to other nodes (in the Leader Node) as well as receiving change notifications (in the Follower Nodes).
  3. In-memory Database (Replicated Database) - This in-memory, replicated database is responsible for storing the data in ZooKeeper. Every node contains its own copy of the database, which enables it to serve read requests locally. In addition, data is also written to the file system, providing recoverability in case of any problems with the cluster. For write requests, the in-memory database is updated only after the change has successfully been written to the file system (a client-side view of this write path is sketched after this list).
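
From a client's point of view, the write path and the change broadcast described above look roughly like the following sketch. It assumes the standard Java client; the host names, znode path, and data are hypothetical, and the /app/config znode is assumed to already exist.

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;

public class WriteAndWatchExample {
    public static void main(String[] args) throws Exception {
        // The client may connect to any node of the ensemble.
        ZooKeeper zk = new ZooKeeper("node1:2181,node2:2181,node3:2181", 3000, event -> {});

        // Register a one-time watch; it fires once the change has been broadcast
        // to the node this client happens to be connected to.
        zk.getData("/app/config",
                (WatchedEvent event) -> System.out.println("Change notification: " + event.getType()),
                null);

        // The write is forwarded to the Leader, logged to disk, applied to the
        // in-memory database, and then broadcast to the followers. -1 means "any version".
        zk.setData("/app/config", "db.host=10.0.0.6".getBytes(), -1);

        Thread.sleep(1000);  // give the watch a moment to fire in this toy example
        zk.close();
    }
}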
Benefits
Apache ZooKeeper can help you reap the following benefits if applications utilize it for the right use cases (please see the next section) -
  1. Simple Design
  2. Fast Processing
  3. Data Replication
  4. Atomic and Ordered Updates
  5. Reliability
Possible Use Cases
Apache ZooKeeper, being a coordination service, is suitable for, but not limited to, the following scenarios -
  • Synchronization primitives such as barriers and queues for distributed environments
  • Multi-machine cluster management
  • Coordination and failure recovery service
  • Automatic leader election (the standard recipe is sketched after this list)
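
As an illustration of the leader-election scenario, here is a minimal sketch of the well-known recipe based on ephemeral sequential znodes. The /election parent znode and connection string are assumptions, and production code would additionally watch the znode with the next lower sequence number so it can react when the current leader fails.

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> {});

        // Each candidate creates an ephemeral, sequential znode under /election.
        // The znode disappears automatically if the candidate's session dies.
        String myNode = zk.create("/election/candidate-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        // The candidate owning the znode with the lowest sequence number is the leader.
        List<String> children = zk.getChildren("/election", false);
        Collections.sort(children);
        boolean isLeader = myNode.endsWith(children.get(0));

        System.out.println(isLeader ? "I am the leader" : "I am a follower");
        zk.close();
    }
}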
Below are some instances where Apache ZooKeeper is being utilized -
  • Apache Storm, a real-time stateless processing/computing framework, manages its state in a ZooKeeper service
  • Apache Kafka uses it to elect the leader node for topic partitions
  • Apache YARN relies on it for the automatic failover of the ResourceManager (master node)
  • Yahoo! utilizes it as the coordination and failure-recovery service for Yahoo! Message Broker, a highly scalable publish-subscribe system managing thousands of topics for replication and data delivery. It is also used by the Fetching Service for the Yahoo! crawler, where it manages failure recovery.


Thank you for reading through the tutorial. In case of any questions or concerns, please share them through your comments and we will get back to you as soon as possible.














Apache Hadoop NextGen MapReduce (YARN)


MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have what we call MapReduce 2.0 (MRv2) or YARN.

The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.

The ResourceManager and per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system.

The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
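
For example, a client submits an application to the ResourceManager roughly as in the sketch below. It assumes the YarnClient API from the hadoop-yarn-client module (available in later Hadoop 2.x releases); the application name, ApplicationMaster launch command, and memory figure are placeholders.

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApplicationSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the ResourceManager for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
        context.setApplicationName("demo-app");

        // Describe the container that will run the per-application ApplicationMaster.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList(
                "java -Xmx256m com.example.MyApplicationMaster"));  // placeholder command
        context.setAMContainerSpec(amContainer);

        // Resource ask for the ApplicationMaster container (memory only, as noted below).
        Resource amResource = Records.newRecord(Resource.class);
        amResource.setMemory(256);
        context.setResource(amResource);

        ApplicationId appId = yarnClient.submitApplication(context);
        System.out.println("Submitted application " + appId);
    }
}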

MapReduce NextGen Architecture

The ResourceManager has two main components: 
  1. Scheduler 
  2. ApplicationsManager


The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees about restarting failed tasks, whether they fail due to application failure or hardware failures.

The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a resource Container which incorporates elements such as memory, cpu, disk, network etc. In the first version, only memory is supported.

The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications etc. The current Map-Reduce schedulers such as the CapacityScheduler and the FairScheduler would be some examples of the plug-in.

The CapacityScheduler supports hierarchical queues to allow for more predictable sharing of cluster resources.

The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.

The NodeManager is the per-machine framework agent that is responsible for containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.

The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.
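
A simplified sketch of that negotiation is shown below, assuming the AMRMClient helper from the hadoop-yarn-client module; the registration host, port, tracking URL, and the 512 MB ask are placeholders.

import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class ApplicationMasterSketch {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // Register this ApplicationMaster with the ResourceManager.
        rmClient.registerApplicationMaster("", 0, "");

        // Ask the Scheduler for one container with 512 MB of memory.
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(512);
        Priority priority = Records.newRecord(Priority.class);
        priority.setPriority(0);
        rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));

        // Heartbeat to the ResourceManager; allocated containers arrive asynchronously.
        AllocateResponse response = rmClient.allocate(0.1f);
        List<Container> allocated = response.getAllocatedContainers();
        System.out.println("Allocated containers: " + allocated.size());

        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}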

MRv2 maintains API compatibility with the previous stable release (hadoop-1.x). This means that all Map-Reduce jobs should still run unchanged on top of MRv2 with just a recompile.


