Monday, 4 January 2016

Running a sample Pi example

To run any application on top of YARN, use the following command syntax:
$ yarn jar <application_jar.jar> <arg0> <arg1>

To run the sample Pi example, which estimates the value of Pi with 16 maps and 10,000 samples per map, use the following command:
$ yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar pi 16 10000

Note that we are using hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar here. The JAR version may change depending on your installed Hadoop distribution.

Once you run the preceding command, you will see the logs generated by the application on the console, using the default logger configuration.

The default log level is INFO; you can change it by overriding the default logger settings, that is, by updating hadoop.root.logger=WARN,console in conf/log4j.properties.
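For reference, here is a minimal sketch of what that change looks like inside conf/log4j.properties (only the relevant line is shown; the rest of the file stays as shipped):

# Shipped default is hadoop.root.logger=INFO,console
# Lower the console verbosity to warnings only:
hadoop.root.logger=WARN,console

With the default INFO level, the Pi job produces console output similar to the following: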

Number of Maps  = 16
Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15

Starting Job
11/09/14 21:12:02 INFO mapreduce.Job: map 0% reduce 0% 
11/09/14 21:12:09 INFO mapreduce.Job: map 25% reduce 0% 
11/09/14 21:12:11 INFO mapreduce.Job: map 56% reduce 0% 
11/09/14 21:12:12 INFO mapreduce.Job: map 100% reduce 0% 
11/09/14 21:12:12 INFO mapreduce.Job: map 100% reduce 100% 
11/09/14 21:12:12 INFO mapreduce.Job: Job job_1381790835497_0003 completed successfully 
11/09/14 21:12:19 INFO mapreduce.Job: Counters: 44        

        File System Counters
                FILE: Number of bytes read=358
                FILE: Number of bytes written=1365080
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=4214
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=67
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters
                Launched map tasks=16
                Launched reduce tasks=1
                Data-local map tasks=14
                Rack-local map tasks=2
                Total time spent by all maps in occupied slots  (ms)=184421
                Total time spent by all reduces in occupied slots (ms)=8542
        Map-Reduce Framework
                Map input records=16
                Map output records=32
                Map output bytes=288
                Map output materialized bytes=448
                Input split bytes=2326
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=448
                Reduce input records=32
                Reduce output records=0
                Spilled Records=64
                Shuffled Maps =16
                Failed Shuffles=0
                Merged Map outputs=16
                GC time elapsed (ms)=195 
                CPU time spent (ms)=7740
                Physical memory (bytes) snapshot=6143396896
                Virtual memory (bytes) snapshot=23142254400
                Total committed heap usage (bytes)=43340769024
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0 
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1848
        File Output Format Counters
                Bytes Written=98
Job Finished in 23.144 seconds 

Estimated value of Pi is 3.14127500000000000000

You can compare this example running over Hadoop 1.x with the same example running over YARN. The logs look almost identical, but you can clearly see the difference in performance. YARN is backward compatible with MapReduce 1.x, so existing MapReduce applications run on it without any code changes.

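Once the job finishes, you can also inspect it from the YARN side with the yarn command-line tool. This is just a quick sketch: the application ID below is derived from the job ID in the preceding log (job_1381790835497_0003), and the logs command assumes that log aggregation is enabled on your cluster:

$ yarn application -list -appStates FINISHED
$ yarn logs -applicationId application_1381790835497_0003
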
Running sample examples on YARN

Running the available sample MapReduce programs is a simple task with YARN. The Hadoop distribution ships with some basic MapReduce examples. 

You can find them inside $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-<HADOOP_VERSION>.jar.

The location of the file may differ depending on your Hadoop installation folder structure.

Let’s include this in the YARN_EXAMPLES path:
$ export YARN_EXAMPLES=$HADOOP_HOME/share/hadoop/mapreduce
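As a quick sanity check, you can confirm that the variable points at the examples JAR by listing it (the filename pattern matches the JAR mentioned earlier):

$ ls $YARN_EXAMPLES/hadoop-mapreduce-examples-*.jar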

Now, the YARN_EXAMPLES environment variable points to the location of all the sample examples, and you can reference them through it. To list all the available examples, type the following command on the console:

$ yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar

An example program must be given as the first argument.

The valid program names are as follows:

  • aggregatewordcount : This is an aggregate-based map/reduce program that counts the words in the input files
  • aggregatewordhist : This is an aggregate-based map/reduce program that computes the histogram of the words in the input files
  • bbp : This is a map/reduce program that uses Bailey-Borwein-Plouffe to compute the exact digits of Pi
  • dbcount : This is an example job that counts the page view counts from a database
  • distbbp : This is a map/reduce program that uses a BBP-type formula to compute the exact bits of Pi
  • grep : This is a map/reduce program that counts the matches of a regex in the input
  • join : This is a job that effects a join over sorted, equally partitioned datasets
  • multifilewc : This is a job that counts words from several files
  • pentomino : This is a map/reduce tile-laying program that finds solutions to pentomino problems
  • pi : This is a map/reduce program that estimates Pi using a quasi-Monte Carlo method
  • randomtextwriter : This is a map/reduce program that writes 10 GB of random textual data per node
  • randomwriter : This is a map/reduce program that writes 10 GB of random data per node
  • secondarysort : This is an example that defines a secondary sort to the reduce
  • sort : This is a map/reduce program that sorts the data written by the random writer
  • sudoku : This is a sudoku solver
  • teragen : This generates data for the terasort
  • terasort : This runs the terasort
  • teravalidate : This checks the results of terasort
  • wordcount : This is a map/reduce program that counts the words in the input files
  • wordmean : This is a map/reduce program that counts the average length of the words in the input files
  • wordmedian : This is a map/reduce program that counts the median length of the words in the input files
  • wordstandarddeviation : This is a map/reduce program that counts the standard deviation of the length of the words in the input files


These are the sample examples that ship with the Hadoop distribution by default. 
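
For instance, to try the wordcount program from the preceding list, you can run it against a small input directory in HDFS. The following is a rough sketch: the paths /user/<username>/input and /user/<username>/output are placeholders for illustration, the Hadoop configuration XML files are used only as convenient sample input, and the output directory must not exist before the job runs:

$ hdfs dfs -mkdir -p /user/<username>/input
$ hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/<username>/input
$ yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar wordcount /user/<username>/input /user/<username>/output
$ hdfs dfs -cat /user/<username>/output/part-r-00000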

