Wednesday 4 February 2015

Hadoop Monitoring Using Ganglia

This post covers monitoring Hadoop metrics such as HDFS, MapReduce, JVM, RPC and UGI metrics with the Ganglia monitoring tool.

I assume that readers of this blog already have some knowledge of Ganglia and Hadoop.

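To integrate Ganglia with Hadoop, you need to configure Hadoop's metrics properties file, which is located in Hadoop's configuration folder. In this file you set three things:

- the address of the Ganglia server that receives the metrics;
- the period (in seconds) for sending metrics data;
- the Ganglia context (or sink) class name.

In the examples below, this address is the gmetad host, whose gmond daemon listens on the default port 8649.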

The name and format of the Hadoop metrics properties file differ between Hadoop versions.
For Hadoop 0.20.x, 0.21.0 and 0.22.0, the file name is hadoop-metrics.properties.
For Hadoop 1.x.x and 2.x.x, the file name is hadoop-metrics2.properties.
The Ganglia class name also depends on the Ganglia version. The examples below use the classes for Ganglia 3.1 and later: GangliaSink31 for metrics2 and GangliaContext31 for the older metrics framework. For Ganglia 3.0, the corresponding classes are GangliaSink30 and GangliaContext. For details, see the documentation of the GangliaContext class.
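The version rules above can be summarized in a small lookup. This is a minimal Python sketch that follows the rules exactly as stated in this post. The function name ganglia_metrics_settings and the tuple-style version arguments are illustrative assumptions, not part of Hadoop or Ganglia.

```python
def ganglia_metrics_settings(hadoop_version, ganglia_version=(3, 1)):
    """Return (properties_file, class_name) for a Hadoop/Ganglia version pair.

    Hypothetical helper: versions are tuples such as (2, 6) or (0, 20).
    """
    # Ganglia 3.1 changed the wire format, hence the separate "31" classes.
    ganglia_31 = tuple(ganglia_version) >= (3, 1)
    if hadoop_version[0] >= 1:
        # Hadoop 1.x and 2.x: metrics2 sinks in hadoop-metrics2.properties.
        cls = "GangliaSink31" if ganglia_31 else "GangliaSink30"
        return ("hadoop-metrics2.properties",
                "org.apache.hadoop.metrics2.sink.ganglia." + cls)
    # Hadoop 0.20.x, 0.21.0 and 0.22.0: legacy metrics contexts.
    cls = "GangliaContext31" if ganglia_31 else "GangliaContext"
    return ("hadoop-metrics.properties",
            "org.apache.hadoop.metrics.ganglia." + cls)


print(ganglia_metrics_settings((2, 6)))
```

For example, a Hadoop 0.20 cluster reporting to a Ganglia 3.0 server would use ganglia_metrics_settings((0, 20), (3, 0)).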

Procedure for configuring the Hadoop metrics properties file
1. Configuration for 2.x.x versions: In these Hadoop versions, the metrics properties file is located in the $HADOOP_HOME/etc/hadoop/ folder. Configure the hadoop-metrics2.properties file as shown below:

namenode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
namenode.sink.ganglia.period=10
namenode.sink.ganglia.servers=gmetad_server_ip:8649

datanode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
datanode.sink.ganglia.period=10
datanode.sink.ganglia.servers=gmetad_server_ip:8649

resourcemanager.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
resourcemanager.sink.ganglia.period=10
resourcemanager.sink.ganglia.servers=gmetad_server_ip:8649

nodemanager.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
nodemanager.sink.ganglia.period=10
nodemanager.sink.ganglia.servers=gmetad_server_ip:8649
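The four blocks above differ only in the daemon prefix, so they can also be generated rather than typed by hand. Below is a minimal Python sketch. The function name metrics2_sink_lines is hypothetical, and gmetad_server_ip is still the placeholder you must replace with your own address.

```python
SINK_CLASS = "org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31"


def metrics2_sink_lines(daemons, servers, period=10, sink_class=SINK_CLASS):
    """Return class/period/servers lines for each daemon prefix.

    Daemon blocks are separated by a blank line, like the file above.
    """
    blocks = []
    for daemon in daemons:
        prefix = f"{daemon}.sink.ganglia"
        blocks.append("\n".join([
            f"{prefix}.class={sink_class}",
            f"{prefix}.period={period}",
            f"{prefix}.servers={servers}",
        ]))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hadoop 2.x daemons, as in the configuration above.
    print(metrics2_sink_lines(
        ["namenode", "datanode", "resourcemanager", "nodemanager"],
        "gmetad_server_ip:8649"))
```

For the 1.x.x layout, pass namenode, datanode, jobtracker, tasktracker, maptask and reducetask as the daemon list instead.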



2. Configuration for 1.x.x versions: In these Hadoop versions, the metrics properties file is located in the $HADOOP_HOME/conf/ folder. Configure the hadoop-metrics2.properties file as shown below:

namenode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
namenode.sink.ganglia.period=10
namenode.sink.ganglia.servers=gmetad_server_ip:8649

datanode.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
datanode.sink.ganglia.period=10
datanode.sink.ganglia.servers=gmetad_server_ip:8649

jobtracker.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
jobtracker.sink.ganglia.period=10
jobtracker.sink.ganglia.servers=gmetad_server_ip:8649

tasktracker.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
tasktracker.sink.ganglia.period=10
tasktracker.sink.ganglia.servers=gmetad_server_ip:8649

maptask.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
maptask.sink.ganglia.period=10
maptask.sink.ganglia.servers=gmetad_server_ip:8649

reducetask.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
reducetask.sink.ganglia.period=10
reducetask.sink.ganglia.servers=gmetad_server_ip:8649
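With six daemons it is easy to miss one of the three settings. The following minimal Python sketch checks that every daemon prefix defines class, period and servers, and that the period is a positive integer. The helper name check_ganglia_sinks is hypothetical. The sketch reads only simple key=value lines: it ignores ':' separators and line continuations, which Java properties files also allow. It also treats any prefix, including a '*' wildcard, as a daemon name.

```python
REQUIRED = ("class", "period", "servers")


def check_ganglia_sinks(text):
    """Return {daemon: [problems]} for the Ganglia sink settings in `text`."""
    found = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        parts = key.split(".")
        # Only keys shaped like <daemon>.sink.ganglia.<option> are checked.
        if len(parts) == 4 and parts[1:3] == ["sink", "ganglia"]:
            found.setdefault(parts[0], {})[parts[3]] = value
    problems = {}
    for daemon, options in found.items():
        issues = [f"missing {name}" for name in REQUIRED if name not in options]
        period = options.get("period")
        if period is not None and not (period.isdigit() and int(period) > 0):
            issues.append(f"bad period {period!r}")
        if issues:
            problems[daemon] = issues
    return problems
```

An empty result means every daemon block is complete. Otherwise, the returned dictionary names the daemon and what is wrong with it.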


3. Configuration for 0.20.x, 0.21.0 and 0.22.0 versions: In these Hadoop versions, the metrics properties file is located in the $HADOOP_HOME/conf/ folder. Configure the hadoop-metrics.properties file as shown below:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=gmetad_server_ip:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=gmetad_server_ip:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=gmetad_server_ip:8649

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=gmetad_server_ip:8649

ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
ugi.period=10
ugi.servers=gmetad_server_ip:8649
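If your existing hadoop-metrics.properties already assigns these contexts, you can update it in place instead of rewriting it. An existing assignment might, for example, point at org.apache.hadoop.metrics.spi.NullContext. Below is a minimal Python sketch of that approach. The function name enable_ganglia is hypothetical, and it assumes plain key=value lines. Existing assignments for the five contexts are overwritten where they stand, commented lines are left untouched, and missing keys are appended at the end.

```python
GANGLIA_CONTEXT = "org.apache.hadoop.metrics.ganglia.GangliaContext31"
CONTEXTS = ("dfs", "mapred", "jvm", "rpc", "ugi")


def enable_ganglia(text, servers, period=10, contexts=CONTEXTS):
    """Return `text` with class/period/servers set to Ganglia for each context."""
    wanted = {}
    for ctx in contexts:
        wanted[f"{ctx}.class"] = GANGLIA_CONTEXT
        wanted[f"{ctx}.period"] = str(period)
        wanted[f"{ctx}.servers"] = servers
    out, seen = [], set()
    for line in text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        # Replace active (uncommented) assignments of the keys we manage.
        if "=" in stripped and not stripped.startswith("#") and key in wanted:
            out.append(f"{key}={wanted[key]}")
            seen.add(key)
        else:
            out.append(line)
    # Append any managed keys the file did not contain yet.
    out.extend(f"{k}={v}" for k, v in wanted.items() if k not in seen)
    return "\n".join(out) + "\n"
```

Keeping the original line order makes the resulting file easy to compare against the old one before you restart the Hadoop services.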



The configuration above is for Ganglia in unicast mode. If you are running Ganglia in multicast mode, use the multicast address in place of gmetad_server_ip in the configuration file. After applying these changes, restart the Ganglia services: gmetad on the master node and gmond on every node. Also restart the Hadoop services if they are running. Once the services have restarted, the Ganglia UI displays the Hadoop graphs. Initially, the Ganglia UI does not show graphs for jobs; they appear only after you submit a job to Hadoop.
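Whether the servers value holds a unicast or a multicast address is what separates the two modes. The following minimal Python sketch uses the standard ipaddress module to build the value and report which mode it implies. The helper name servers_value is hypothetical. The address 239.2.11.71 is simply the multicast group that appears in the default gmond.conf, and it is used here only as an example.

```python
import ipaddress


def servers_value(address, port=8649):
    """Return (mode, "address:port") for a Hadoop Ganglia `servers` setting."""
    try:
        multicast = ipaddress.ip_address(address).is_multicast
    except ValueError:
        # Not an IP literal (e.g. a hostname): assume a unicast target.
        multicast = False
    mode = "multicast" if multicast else "unicast"
    return mode, f"{address}:{port}"


mode, value = servers_value("239.2.11.71")
print(mode, f"namenode.sink.ganglia.servers={value}")
```

In multicast mode, every gmond that has joined the group receives the metrics. In unicast mode, only the named host does.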

Introduction to Ganglia on Ubuntu 14.04

Introduction

Ganglia is a scalable distributed monitoring system. It scales well with very large numbers of servers and is useful for viewing performance metrics in near real-time.
On the back end, Ganglia is made up of the following components:
  • Gmond (Ganglia monitoring daemon): a small service that collects information about a node. This is installed on every server you want monitored.
  • Gmetad (Ganglia meta daemon): a daemon on the master node that collects data from all the Gmond daemons (and other Gmetad daemons, if applicable).
  • RRD (Round Robin Database) tool: a tool on the master node used to store data and visualizations for Ganglia in time series.
  • PHP web front-end: a web interface on the master node that displays graphs and metrics from data in the RRD tool.
Every node (server) that you want monitored has Gmond installed. Every node uses Gmond to send data to the single master node running Gmetad, which collects all the node data and sends it to the RRD tool to be stored. You can then view the data in your web browser with the help of the PHP scripts and Apache.
Here's a diagram of a functioning Ganglia grid, with the master node shown as the Ganglia Server running the Gmetad daemon, and the other nodes shown as connecting servers running the Gmond daemon:
[Figure: Ganglia Architecture]
When you use the web interface to view the monitored data, the data is organized on several levels. Ganglia organizes nodes, which are individual monitored machines, into clusters, which are groups of similar nodes. On a higher level, collections of clusters can also be organized into grids. You'll see this organization when you log into the web interface.
In this article, we will first set up a single cluster called "my cluster" with two nodes. Later, we will set up a single grid named London with two clusters, Servers and Databases. The examples show two nodes in each cluster.

Prerequisites

You will need:
  • One master node Droplet running Ubuntu 14.04. This is the node you will use to view all of the monitoring data.
  • At least one additional node that you want to monitor, running Ubuntu 14.04.
  • If you want to match the grid examples exactly, you should have two more nodes running Ubuntu 14.04. However, you can easily complete the tutorial with just one node on each cluster.
Create a sudo user on each Droplet. First, create the user with the adduser command, replacing username with the name you want to use.
adduser username
This will create the user and the appropriate home directory and group. You will be prompted to set a password for the new user and confirm the password. You will also be prompted to enter the user's information. Confirm the user information to create the user.
Next, grant the user sudo privileges with the visudo command.
visudo
This will open the /etc/sudoers file. In the User privilege specification section, add another line for the created user so it looks like this (with your chosen username instead of username):
# User privilege specification
root       ALL=(ALL:ALL) ALL
username   ALL=(ALL:ALL) ALL
Save the file and switch to the new user.
su - username
Update and upgrade the system packages.
sudo apt-get update && sudo apt-get -y upgrade

Installation

On the master node, install Ganglia monitor, RRDtool, Gmetad, and the Ganglia web front end.
sudo apt-get install -y ganglia-monitor rrdtool gmetad ganglia-webfrontend
During installation, you will be asked to restart Apache. Select yes. Depending on your system, you may be asked twice. Select yes again.
Set up the online graphical dashboard by copying the Ganglia web front end configuration file to the Apache sites-enabled folder.
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled/ganglia.conf
Optional: You may want to password-protect this site for increased security. Otherwise, it will be open to the Internet, and you may not wish to expose your server configurations and IP addresses.
Note: This section and the Client Installation section show a simpler setup involving a single cluster, named my cluster. If you want to set up the grid and both clusters right away, you may want to reference the settings in the Grids section as well.
Edit the Gmetad configuration file to set up your cluster. This file configures where and how the Gmetad daemon will collect data.
sudo vi /etc/ganglia/gmetad.conf
Find the line that begins with data_source, as shown below:
data_source "my cluster" localhost
Edit the data_source line to list the name of your cluster, the data collection frequency in seconds, and your server's connection information. In the example below, the data source is called my cluster, and it collects metrics once a minute from the localhost (itself). You can add more data_source lines to create as many clusters as you want.
data_source "my cluster" 60 localhost
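The data_source line has a fixed shape: the quoted cluster name, an optional polling interval in seconds, then one or more host or host:port entries. Gmond's port 8649 applies when no port is given. The following minimal Python sketch formats such a line. The function name data_source_line is hypothetical, and it leaves the interval out entirely when none is given, so that gmetad falls back to its own default.

```python
def data_source_line(name, hosts, interval=None):
    """Format a gmetad.conf data_source line.

    hosts: "host" or "host:port" strings; interval: seconds, or None to omit.
    """
    if not hosts:
        raise ValueError("a data_source needs at least one host")
    if '"' in name:
        raise ValueError("cluster name must not contain double quotes")
    parts = [f'data_source "{name}"']
    if interval is not None:
        parts.append(str(int(interval)))
    parts.extend(hosts)
    return " ".join(parts)


print(data_source_line("my cluster", ["localhost"], 60))
```

Calling it once per cluster produces the additional data_source lines mentioned above.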
Save your changes.
Next, edit the Gmond configuration file. Even though this is the master node, we are also setting it up for monitoring as the first node in the "my cluster" cluster. The gmond.conf file configures where the node sends its information.
sudo vi /etc/ganglia/gmond.conf
In the cluster section, make sure you set the name to the same one you set in the gmetad.conf file, which in this example is my cluster. The rest of the fields are optional and can be left as unspecified.
For reference, the owner value specifies the administrator of the cluster, which is useful for contact purposes. The latlong value sets the latitude and longitude coordinates for globally distributed clusters. The url value is for a link to provide more information about the cluster.
[...]
cluster {
  name = "my cluster" ## use the name from gmetad.conf
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
[...]
In the udp_send_channel section, insert a new host line with the value localhost, which is the server where you're sending the information. Comment out the mcast_join line.
For reference, the mcast_join value provides a multicast address, but we need to send the data to only one host, so this is unnecessary. (If you later decide you want to create a grid for this cluster, you will re-enable it.)
[...]
udp_send_channel   {
  #mcast_join = 239.2.11.71 ## comment out
  host = localhost
  port = 8649
  ttl = 1
}
[...]
In the udp_recv_channel section, comment out the mcast_join and bind lines. (Again, if you want to add this cluster to a grid, you will re-enable these lines.)
The bind value binds the receiving channel to a local address, here the multicast address; since this cluster uses unicast, it is unnecessary.
[...]
udp_recv_channel {
  #mcast_join = 239.2.11.71 ## comment out
  port = 8649
  #bind = 239.2.11.71 ## comment out
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}
[...]
Restart Ganglia-monitor, Gmetad and Apache.
sudo service ganglia-monitor restart && sudo service gmetad restart && sudo service apache2 restart
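The tcp_accept_channel configured above is how gmond shares its XML description of the cluster: a TCP client that connects receives an XML dump, and then gmond closes the connection. Gmetad polls the same channel. Below is a minimal Python sketch for checking which hosts a gmond currently reports. The function names are hypothetical, and the sketch assumes the port 8649 used in this example. It reads only the CLUSTER and HOST elements' NAME attributes.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump that gmond writes to a TCP client before closing."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection once the dump is sent
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_bytes):
    """Map each CLUSTER name in the dump to the HOST names listed under it."""
    root = ET.fromstring(xml_bytes)
    return {
        cluster.get("NAME"): [host.get("NAME") for host in cluster.iter("HOST")]
        for cluster in root.iter("CLUSTER")
    }


# Example (run on the master node after the restart above):
# print(hosts_by_cluster(fetch_gmond_xml("localhost", 8649)))
```

Once client nodes are added, their names should show up in this list as well as in the web interface.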

Web Interface

Ganglia should now be set up and accessible at http://ip-address/ganglia.
The main page shows the grid view, which is an overview of your monitored nodes. Right now there should be just one: localhost.
The main tab allows you to view the data over preset and custom time ranges. You can also manually refresh the data by clicking the Get Fresh Data button in the top right.
Below the time range selection, you can choose a specific node from the dropdown menu labeled --Choose a Node. Right now, localhost should be the only node you see.
Select localhost from the list to see information specific to the localhost node. Since localhost is the only node being monitored, the information on the localhost node page and the main tab will be the same.
From here, you can also click the Node View button in the upper right to view contextual information about the node.
The rest of the main page displays a summary of the node's clusters. Click on any graph to view detailed information by various time increments, from one hour to one year, as well as to export graph data in CSV or JSON formats.
As the number of nodes grows and viewing them all on the main page becomes difficult, you can use the search tab to find particular hosts or metrics using regular expressions. You can also compare hosts, create custom aggregate graphs, and more.

Client Installation

On the second node you want to monitor in the "my cluster" cluster, install the Ganglia monitor.
sudo apt-get install -y ganglia-monitor
Edit the Gmond configuration file for monitoring the node.
sudo vi /etc/ganglia/gmond.conf
Just like we did on the master node, update the cluster name (my cluster in this example) in the cluster section so it matches the name on the master node.
[...]
cluster {
  name = "my cluster"     ## Cluster name
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
[...]
Add a line to the udp_send_channel block for the host, which should be the IP address of your master Ganglia node (e.g. 1.1.1.1). Comment out the mcast_join line.
[...]
udp_send_channel {
  #mcast_join = 239.2.11.71   ## Comment
  host = 1.1.1.1   ## IP address of master node
  port = 8649
  ttl = 1
}
[...]
Comment out the whole udp_recv_channel section with the /* ... */ syntax, as this server won't be receiving anything.
[...]
/* You can specify as many udp_recv_channels as you like as well.
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
*/
[...]
Restart the monitoring service.
sudo service ganglia-monitor restart
Wait a few minutes and reload the web interface. The new node should appear in the cluster automatically.
Repeat these steps on any other nodes you want to monitor in this cluster.
You now have a cluster! You can view the overview of your cluster on the web interface, and drill down into specific nodes as well as particular metrics.

Grids

Grids allow you to organize several clusters together. For instance, if you have several clusters of MySQL databases serving different applications, you can organize all of those clusters in the same grid to view the performance of all your MySQL servers. Or if you have application servers all over the world, you can put them in a grid by location, such as London.
To create a grid, edit the /etc/ganglia/gmetad.conf file on the Ganglia master node.
Please note that you can create only one grid per Gmetad. If you want to create more than one grid you need to install Gmetad on another server. In this example, we will call our grid London.
sudo vi /etc/ganglia/gmetad.conf
Name your grid in the grid section by uncommenting the gridname line and replacing MyGrid with the grid name of your choice. In this example, we will name the grid London.
# The name of this Grid. All the data sources above will be wrapped in a GRID
# tag with this name.
# default: unspecified
# gridname "MyGrid"
For instance, if you are creating your grid for all of your London servers:
gridname "London"
Add or edit a new data_source line for every cluster you want in this grid.
Update the name for the cluster, and then add host and port information for each server you want to add to that cluster. Please note that clusters are identified by the port number, so each new data_source line, or cluster, should use a different port number.
For instance, in the example below, we are adding two clusters, called Servers and Databases, to the London grid. All of the nodes in Servers are using port 8556, and all of the nodes in Databases are using port 8557.
data_source "Servers" localhost:8556 1.1.1.2:8556
data_source "Databases" 1.2.1.1:8557 1.2.1.2:8557
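Because each cluster should use its own port, a quick check over the data_source lines can catch two mistakes. The first is a host left on the default port 8649 because no port was written. The second is two clusters sharing a port. Below is a minimal Python sketch. The function names are hypothetical, and the sketch assumes IPv4 addresses or hostnames, with an optional host:port suffix.

```python
import shlex

DEFAULT_GMOND_PORT = 8649


def cluster_ports(gmetad_conf_text):
    """Map each data_source cluster name to the set of ports its hosts use."""
    ports = {}
    for line in gmetad_conf_text.splitlines():
        tokens = shlex.split(line, comments=True)  # handles "quoted names"
        if not tokens or tokens[0] != "data_source":
            continue
        name, rest = tokens[1], tokens[2:]
        if rest and rest[0].isdigit():
            rest = rest[1:]  # drop the optional polling interval
        ports[name] = {
            int(h.rsplit(":", 1)[1]) if ":" in h else DEFAULT_GMOND_PORT
            for h in rest
        }
    return ports


def port_conflicts(ports):
    """Return (clusters with mixed ports, {port: clusters sharing it})."""
    mixed = sorted(name for name, p in ports.items() if len(p) > 1)
    owners = {}
    for name, p in ports.items():
        for port in p:
            owners.setdefault(port, []).append(name)
    shared = {port: names for port, names in owners.items() if len(names) > 1}
    return mixed, shared
```

Run it over the contents of /etc/ganglia/gmetad.conf. An empty list and an empty dictionary mean every cluster has its own, consistent port.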
On each server (or node) specified in the Gmetad configuration file (in this example, localhost, 1.1.1.2, 1.2.1.1, and 1.2.1.2), edit the Gmond configuration file.
sudo vi /etc/ganglia/gmond.conf
Update the name value in the cluster section to match the cluster name. Here, we'll set up a node to be part of the Databases cluster. (Note that if you set up two nodes using the earlier method, you will have to go back and edit the /etc/ganglia/gmond.conf file on each of them to match the new settings.)
/* If a cluster attribute is specified, then all gmond hosts are wrapped inside
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
 * NOT be wrapped inside of a <CLUSTER> tag. */

cluster {
  name = "Databases"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
Also, unlike in the previous sections, you should not comment out the mcast_join lines.
Your udp_send_channel block should look like this. Make sure to update the port number! In our example, since this is part of the Databases cluster, the port should be 8557. The other lines can stay the same.
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8557
  ttl = 1
}
Your udp_recv_channel block should look like this, using the appropriate port number. The other lines can stay the same.
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8557
  bind = 239.2.11.71
}
Finally, your tcp_accept_channel block should look like this, using the appropriate port number.
tcp_accept_channel {
  port = 8557
}
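Only the port changes between clusters in these three blocks, so they can be rendered from a single value. Below is a minimal Python sketch. The name gmond_channels is hypothetical, and it assumes the multicast group 239.2.11.71 and the ttl of 1 shown above.

```python
def gmond_channels(port, mcast_group="239.2.11.71", ttl=1):
    """Render multicast udp_send/udp_recv/tcp_accept blocks for one cluster."""
    return (
        "udp_send_channel {\n"
        f"  mcast_join = {mcast_group}\n"
        f"  port = {port}\n"
        f"  ttl = {ttl}\n"
        "}\n"
        "udp_recv_channel {\n"
        f"  mcast_join = {mcast_group}\n"
        f"  port = {port}\n"
        f"  bind = {mcast_group}\n"
        "}\n"
        "tcp_accept_channel {\n"
        f"  port = {port}\n"
        "}\n"
    )


# Databases cluster from the example: port 8557. Servers would use 8556.
print(gmond_channels(8557))
```

Generating all three blocks from one port value avoids the easy mistake of updating udp_send_channel but forgetting udp_recv_channel or tcp_accept_channel.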
Restart the monitoring services on each node.
sudo service ganglia-monitor restart
Restart Ganglia-monitor, Gmetad and Apache on the Ganglia host server or master node.
sudo service ganglia-monitor restart && sudo service gmetad restart && sudo service apache2 restart
In the web interface, you should now see the name of your grid, and the option to choose a cluster. From there you can select and drill down into a node.

Conclusion

Ganglia is easy to set up and scales from a single node to hundreds or thousands of nodes, so it can monitor as many servers as you need.