Monday 12 October 2015

What is Hadoop Rack Awareness and How to configure it in a cluster

Details about Hadoop Rack Awareness

Both the HDFS and Map/Reduce components of Hadoop are rack-aware.
The NameNode and the JobTracker obtain the rack id of the slaves in the cluster by invoking a resolve API in an administrator-configured module. The API resolves each slave's DNS name (or IP address) to a rack id. Which module to use is configured through the configuration item topology.node.switch.mapping.impl. The default implementation runs a script/command configured using topology.script.file.name.
If topology.script.file.name is not set, the rack id /default-rack is returned for any passed IP address. The additional configuration in the Map/Reduce part is mapred.cache.task.levels, which determines the number of levels (in the network topology) of caches.
For example, at the default value of 2, two levels of caches will be constructed: one for hosts (host -> task mapping) and another for racks (rack -> task mapping).

What is Rack Awareness in Hadoop

For small clusters in which all servers are connected by a single switch, there are only two levels of locality: “on-machine” and “off-machine.” When loading data from a DataNode’s local drive into HDFS, the NameNode will schedule one copy to go into the local DataNode, and will pick two other machines at random from the cluster.
For larger Hadoop installations which span multiple racks, it is important to ensure that replicas of data exist on multiple racks. This way, the loss of a switch does not render portions of the data unavailable due to all replicas being underneath it.
HDFS can be made rack-aware by the use of a script which allows the master node to map the network topology of the cluster. While alternate configuration strategies can be used, the default implementation allows you to provide an executable script which returns the “rack address” of each of a list of IP addresses.
The network topology script receives as arguments one or more IP addresses of nodes in the cluster. It returns on stdout a list of rack names, one for each input. The input and output order must be consistent.
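As an illustration of this contract, suppose the mapping script is named rack-topology.sh (a hypothetical name used only for this example) and implements the hierarchical IP scheme described later in this article. A call with two addresses would then look like this:

$ ./rack-topology.sh 10.1.1.5 10.1.2.8
/1/1 /1/2

Each rack name on stdout corresponds, in order, to one of the input addresses.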
To set the rack mapping script, specify the key topology.script.file.name in conf/hadoop-site.xml (core-site.xml in newer Hadoop versions). This provides a command to run to return a rack id; it must be an executable script or program. By default, Hadoop will attempt to send a set of IP addresses to the script as several separate command line arguments. You can control the maximum acceptable number of arguments with the topology.script.number.args key.
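For instance, the sample script below expects exactly one address per invocation, which you could enforce with a sketch like the following in the same configuration file (the property name is stock Hadoop; the value shown is an assumption for this example):

<property>
  <name>topology.script.number.args</name>
  <value>1</value>
</property>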
Rack ids in Hadoop are hierarchical and look like path names. By default, every node has a rack id of /default-rack. You can set rack ids for nodes to any arbitrary path, e.g., /foo/bar-rack. Path elements further to the left are higher up the tree. Thus a reasonable structure for a large installation may be /top-switch-name/rack-name.
Hadoop rack ids are not currently expressive enough to handle an unusual routing topology such as a 3-d torus; they assume that each node is connected to a single switch which in turn has a single upstream switch. This is not usually a problem, however. Actual packet routing will be directed using the topology discovered by or set in switches and routers. The Hadoop rack ids will be used to find “near” and “far” nodes for replica placement (and in 0.17, MapReduce task placement).
The following example script performs rack identification based on IP addresses given a hierarchical IP addressing scheme enforced by the network administrator. This may work directly for simple installations; more complex network configurations may require a file- or table-based lookup process. Care should be taken in that case to keep the table up-to-date as nodes are physically relocated, etc. This script requires that the maximum number of arguments be set to 1.
#!/bin/bash
# Set rack id based on IP address.
# Assumes network administrator has complete control
# over IP addresses assigned to nodes and they are
# in the 10.x.y.z address space. Assumes that
# IP addresses are distributed hierarchically, e.g.,
# 10.1.y.z is one data center segment and 10.2.y.z is another;
# 10.1.1.z is one rack, 10.1.2.z is another rack in
# the same segment, etc.
#
# This is invoked with an IP address as its only argument.

# get IP address from the input (note: $1, not $0,
# which would be the script's own name)
ipaddr=$1

# select "x.y" and convert it to "x/y"
segments=`echo $ipaddr | cut --delimiter=. --fields=2-3 --output-delimiter=/`
echo /${segments}
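A quick check of the script (the file name is illustrative):

$ ./topology-script.sh 10.1.2.15
/1/2

Here 10.1.2.15 resolves to rack 2 of data center segment 1.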



Configuring Rack Awareness in Hadoop

As we know, Hadoop divides data into multiple file blocks and stores them on different machines. If Rack Awareness is not configured, there is a possibility that Hadoop will place all the replicas of a block in the same rack, which results in loss of data when that rack fails.
Although rack failure is far less frequent than node failure, this can be avoided by explicitly configuring Rack Awareness in core-site.xml.
Rack Awareness is configured using the property topology.script.file.name in core-site.xml.
If topology.script.file.name is not configured, /default-rack is returned for any IP address, i.e., all nodes are placed on the same rack.
Configuring Rack Awareness in Hadoop involves two steps: first, configure topology.script.file.name in core-site.xml:
<property>
  <name>topology.node.switch.mapping.impl</name>
  <value>org.apache.hadoop.net.ScriptBasedMapping</value>
  <description>
    The default implementation of the DNSToSwitchMapping. It
    invokes a script specified in topology.script.file.name to resolve
    node names. If the value for topology.script.file.name is not set,
    the default value of DEFAULT_RACK is returned for all node names.
  </description>
</property>

<property>
  <name>topology.script.file.name</name>
  <value>core/rack-awareness.sh</value>
</property>
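The script must be executable by the user running Hadoop; assuming it lives at the path configured above (relative to the Hadoop configuration directory), that might look like:

chmod 755 core/rack-awareness.sh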

Second, implement the rack-awareness.sh script as desired. A sample rack-awareness script and its topology data file are shown below.
1. Topology script file, named rack-awareness.sh

A sample Bash shell script:

#!/bin/bash
# Resolve each host name or IP address given on the command line
# to a rack id by looking it up in the topology.data file.

HADOOP_CONF=$HADOOP_HOME/conf

while [ $# -gt 0 ] ; do
  nodeArg=$1
  # rescan the topology data file for each argument
  exec< ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  # fall back to the default rack for nodes not listed in the file
  if [ -z "$result" ] ; then
    echo -n "/default-rack "
  else
    echo -n "$result "
  fi
done

2. Topology data file, named topology.data

orienit.node1.com    /dc1/rack1
orienit.node2.com    /dc1/rack1
orienit.node3.com    /dc1/rack1
orienit.node11       /dc1/rack2
orienit.node12       /dc1/rack2
orienit.node13       /dc1/rack2
10.1.1.1             /dc1/rack3
10.1.1.2             /dc1/rack3
10.1.1.3             /dc1/rack3
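With this data file in place, a quick test of the script might look like this (the unknown-host entry is made up to show the default-rack fallback):

$ ./rack-awareness.sh orienit.node1.com 10.1.1.2 unknown-host
/dc1/rack1 /dc1/rack3 /default-rack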
