To run any application on top of YARN, use the following command syntax:
$ yarn jar <application_jar.jar> <arg0> <arg1>
To run the bundled sample that estimates the value of pi with 16 maps and 10,000 samples per map, use the following command (the program name pi is case-sensitive and must be lowercase):
$ yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar pi 16 10000
Note that we are using hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar here.
The JAR version may change depending on your installed Hadoop distribution.
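Because the JAR name embeds the distribution version, a script can resolve it with a glob instead of hardcoding it. The following is a minimal sketch, assuming that $YARN_EXAMPLES points at the directory that holds the examples JAR, as in the command above. The helper name find_examples_jar is hypothetical and is not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first hadoop-mapreduce-examples JAR in a directory.
find_examples_jar() {
  local dir="$1" jar
  for jar in "$dir"/hadoop-mapreduce-examples-*.jar; do
    # Skip companion artifacts that some distributions ship next to the main JAR.
    case "$jar" in
      *-sources.jar|*-tests.jar|*-javadoc.jar) continue ;;
    esac
    # If nothing matches, the glob stays unexpanded, so check that the file exists.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no hadoop-mapreduce-examples JAR found in $dir" >&2
  return 1
}

# Usage (requires a working Hadoop installation):
#   yarn jar "$(find_examples_jar "$YARN_EXAMPLES")" pi 16 10000
```

If several versions are installed side by side, the helper returns the first match in glob order, which may not be the newest one.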
Once you run the preceding command, the logs generated by the application appear on the console, as shown in the following output. The default logger configuration writes to the console.
The default log level is INFO; you can change it by overriding the default logger setting, for example, by setting hadoop.root.logger=WARN,console in conf/log4j.properties:
Number of Maps = 16
Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Starting Job
11/09/14 21:12:02 INFO mapreduce.Job: map 0% reduce 0%
11/09/14 21:12:09 INFO mapreduce.Job: map 25% reduce 0%
11/09/14 21:12:11 INFO mapreduce.Job: map 56% reduce 0%
11/09/14 21:12:12 INFO mapreduce.Job: map 100% reduce 0%
11/09/14 21:12:12 INFO mapreduce.Job: map 100% reduce 100%
11/09/14 21:12:12 INFO mapreduce.Job: Job job_1381790835497_0003 completed successfully
11/09/14 21:12:19 INFO mapreduce.Job: Counters: 44
File System Counters
FILE: Number of bytes read=358
FILE: Number of bytes written=1365080
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=4214
HDFS: Number of bytes written=215
HDFS: Number of read operations=67
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=16
Launched reduce tasks=1
Data-local map tasks=14
Rack-local map tasks=2
Total time spent by all maps in occupied slots (ms)=184421
Total time spent by all reduces in occupied slots (ms)=8542
Map-Reduce Framework
Map input records=16
Map output records=32
Map output bytes=288
Map output materialized bytes=448
Input split bytes=2326
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=448
Reduce input records=32
Reduce output records=0
Spilled Records=64
Shuffled Maps =16
Failed Shuffles=0
Merged Map outputs=16
GC time elapsed (ms)=195
CPU time spent (ms)=7740
Physical memory (bytes) snapshot=6143396896
Virtual memory (bytes) snapshot=23142254400
Total committed heap usage (bytes)=43340769024
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1848
File Output Format Counters
Bytes Written=98
Job Finished in 23.144 seconds
Estimated value of Pi is 3.14127500000000000000
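The final figure can be checked by hand. The pi example counts how many of its sample points fall inside a circle inscribed in a square and estimates pi as 4 x inside / total. The sketch below only reproduces that arithmetic; it is not the Hadoop implementation. The function name pi_from_counts is hypothetical, and the inside count of 125651 is back-calculated from the printed estimate, because the log does not report it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print 4 * INSIDE / TOTAL to six decimal places.
pi_from_counts() {
  awk -v inside="$1" -v total="$2" 'BEGIN { printf "%.6f\n", 4 * inside / total }'
}

maps=16
samples_per_map=10000
total=$(( maps * samples_per_map ))   # 160000 sample points across all maps

# 125651 is inferred from the printed estimate, not taken from the job output.
pi_from_counts 125651 "$total"        # prints 3.141275
```

With only 160,000 sample points, the estimate matches pi to about three decimal places; raising the number of maps or samples per map should tighten it at the cost of a longer run.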
You can compare this run with the same example running on Hadoop 1.x. The logs look almost identical, but the difference in performance is clearly noticeable. YARN is backward compatible with MapReduce 1.x, so existing applications run without any code change.