Monday 4 January 2016

Source compatibility of org.apache.hadoop.mapreduce APIs

Source incompatibility means that some code changes are required for compilation.

Source incompatibility is orthogonal to binary compatibility.

If an application is binary compatible but not source compatible, its existing binaries will continue to run fine on the new framework. However, code changes are required before those binaries can be regenerated from source.
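As a purely hypothetical Java illustration (none of these types are from Hadoop), the classic case of a change that is binary compatible but source incompatible is adding a method to an interface:

// --- Version 1 of a hypothetical library file, Store.java ---
public interface Store {
    void put(String key, String value);
}

// --- Version 2 of the same file adds a method ---
public interface Store {
    void put(String key, String value);
    String get(String key);
}

// --- A hypothetical application class compiled against version 1 ---
// Its existing .class file keeps running against version 2, because the
// JVM only resolves the methods that are actually invoked (binary
// compatible). Recompiling it against version 2, however, fails with
// "MemoryStore is not abstract and does not override abstract method
// get(String)" until the new method is implemented (source incompatible).
public class MemoryStore implements Store {
    public void put(String key, String value) { /* store the pair */ }
}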

Apache Hadoop 2.x does not ensure complete binary compatibility with the applications that use org.apache.hadoop.mapreduce APIs, as these APIs have evolved a lot since MRv1. However, it ensures source compatibility for the org.apache.hadoop.mapreduce APIs that break binary compatibility. In other words, applications that use these MapReduce APIs must be recompiled against the MRv2 JARs.

Existing applications that use the MapReduce APIs are source compatible and can run on YARN either with no changes, with recompilation, or with minor source updates.

If an MRv1 MapReduce-based application fails to run on YARN, investigate its source code and check whether the MapReduce APIs are referenced. If they are, recompile the application against the MRv2 JARs shipped with Hadoop 2; the imports sketched below are the typical markers to look for.
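As a rough illustration (the application code around them is hypothetical, but the package names are the real Hadoop ones):

import org.apache.hadoop.mapreduce.Job;       // new (MRv2-style) API: source
import org.apache.hadoop.mapreduce.Mapper;    // compatible, so the application
import org.apache.hadoop.mapreduce.Reducer;   // must be recompiled

import org.apache.hadoop.mapred.JobClient;    // old API: binary compatible,
import org.apache.hadoop.mapred.JobConf;      // so existing JARs run unchanged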

Old and new MapReduce APIs

The new API (also known as Context Objects) was designed primarily to make the API easier to evolve in the future, and it is type-incompatible with the old one; a short sketch of a context object follows the table below.

The new API first appeared in the 0.20 release series, but it was only partially supported there and through the 1.x series. So, the old API is recommended for the 1.x series:


Feature \ Release          1.x        0.23
Old MapReduce API          Yes        Deprecated
New MapReduce API          Partial    Yes
MRv1 runtime (Classic)     Yes        No
MRv2 runtime (YARN)        No         Yes
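To make the Context Objects nickname concrete, here is a minimal sketch of a new-API mapper (the word-count mapper itself is illustrative, not from this post; the Hadoop types are real). The single Context parameter replaces the old API's OutputCollector and Reporter:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // All interaction with the framework (emitting output, reporting
        // status, reading configuration) goes through the one Context object.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}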


The old and new APIs can be compared as follows (the code sketches after the list make several of these rows concrete):


The old API is in the org.apache.hadoop.mapred package and is still present. The new API is in the org.apache.hadoop.mapreduce package.

The old API uses interfaces for Mapper and Reducer. The new API uses abstract classes for Mapper and Reducer.

The old API uses the JobConf, OutputCollector, and Reporter objects to communicate with the MapReduce system. The new API uses the context object to communicate with the MapReduce system.

In the old API, job control is done through the JobClient. In the new API, job control is performed through the Job class.

In the old API, job configuration is done with a JobConf object. In the new API, job configuration is done through the Configuration class, via some of the helper methods on Job.

In the old API, both the map and reduce outputs are named part-nnnnn. In the new API, the map outputs are named part-m-nnnnn and the reduce outputs are named part-r-nnnnn.

In the old API, the reduce() method passes values as a java.util.Iterator. In the new API, the reduce() method passes values as a java.lang.Iterable.

The old API controls mappers by writing a MapRunnable, but no equivalent exists for reducers. The new API allows both mappers and reducers to control the execution flow by overriding the run() method.
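As a sketch of several of these rows (the class names are illustrative), here is the same summing reducer written against both APIs. The old version implements the Reducer interface and receives a java.util.Iterator plus an OutputCollector and Reporter; the new version extends the Reducer abstract class and receives a java.lang.Iterable plus a single Context:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Old API: implement the org.apache.hadoop.mapred.Reducer interface
// (MapReduceBase supplies empty configure() and close() methods).
public class OldSumReducer extends org.apache.hadoop.mapred.MapReduceBase
        implements org.apache.hadoop.mapred.Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
            org.apache.hadoop.mapred.OutputCollector<Text, IntWritable> output,
            org.apache.hadoop.mapred.Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

// New API: extend the org.apache.hadoop.mapreduce.Reducer class;
// values arrive as an Iterable and output goes through the Context.
class NewSumReducer
        extends org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}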


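And a minimal driver sketch for the job-configuration and job-control rows, assuming the TokenMapper and NewSumReducer classes sketched above. The old-API equivalent would build a JobConf and submit it with JobClient.runJob(); in the new API, a Configuration is wrapped by the Job class, which also controls the job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        // Job configuration: a Configuration object plus helper
        // methods on Job (the new-API replacement for JobConf).
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenMapper.class);     // mapper sketched earlier
        job.setReducerClass(NewSumReducer.class);  // reducer sketched earlier
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Job control through the Job class itself (the new-API
        // replacement for JobClient.runJob()).
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}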


