Monday 4 January 2016

Old and new MapReduce APIs

The new API (also known as Context Objects) was designed primarily to make the API easier to evolve in the future, and it is not type-compatible with the old one.

The new API was introduced in the 1.x release series, but it was only partially supported there, so the old API is recommended for the 1.x series:


Feature \ Release          1.x        0.23
Old MapReduce API          Yes        Deprecated
New MapReduce API          Partial    Yes
MRv1 runtime (Classic)     Yes        No
MRv2 runtime (YARN)        No         Yes


The old and new APIs can be compared as follows:


Old API vs. New API

The old API lives in the org.apache.hadoop.mapred package and is still present; the new API lives in the org.apache.hadoop.mapreduce package.

The old API uses interfaces for Mapper and Reducer; the new API uses abstract classes for Mapper and Reducer.

The old API uses JobConf, OutputCollector, and Reporter objects to communicate with the MapReduce system; the new API uses context objects.

In the old API, job control is done through JobClient; in the new API, it is done through the Job class.

In the old API, job configuration is done with a JobConf object; in the new API, it is done through the Configuration class, via helper methods on Job.

In the old API, both map and reduce outputs are named part-nnnnn; in the new API, map outputs are named part-m-nnnnn and reduce outputs are named part-r-nnnnn.

In the old API, the reduce() method receives values as a java.lang.Iterator; in the new API, it receives them as a java.lang.Iterable.

The old API controls mappers by writing a MapRunnable, but there is no equivalent for reducers; the new API allows both mappers and reducers to control the execution flow by overriding the run() method.
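The differences above can be seen in a minimal word-count sketch written against the new API. The Hadoop classes and signatures (Mapper, Reducer, Job, Context) are the standard new-API ones; the WordCount, TokenMapper, and SumReducer names are illustrative. Note the abstract-class Mapper/Reducer, context.write() in place of OutputCollector.collect(), the Iterable passed to reduce(), and job control through the Job class rather than JobClient:

```java
// Word count using the new (org.apache.hadoop.mapreduce) API.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // New API: Mapper is an abstract class, and output is emitted through the
  // Context object instead of an OutputCollector/Reporter pair.
  public static class TokenMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE); // old API: output.collect(word, ONE)
      }
    }
  }

  // New API: reduce() receives an Iterable, not a java.lang.Iterator.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    // New API: job control through the Job class (the old API used
    // JobClient with a JobConf object).
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Compiling and running this requires the Hadoop jars on the classpath; its map-output files would be named part-m-nnnnn and its reduce-output files part-r-nnnnn, as described above.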




