Summary -
In this topic, we describe the main classes and interfaces involved in MapReduce programming. The following parts of the MapReduce API are covered -
- JobContext interface
- Job class
- Mapper class
- Reducer class
JobContext interface -
The Job class is the main class that implements the JobContext interface. JobContext is the super-interface of the context interfaces used in MapReduce, and it defines the job-level information that those contexts expose.
It gives tasks a read-only view of the job while they are running. Its sub-interfaces include -
MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
- The context that is given to the Mapper.
ReduceContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
- The context that is passed to the Reducer.
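To make the idea of a read-only job view concrete, here is a minimal plain-Java sketch. It does not use the Hadoop API: `JobView`, `MapView`, and `SimpleMapView` are hypothetical names that loosely mirror how MapContext builds on JobContext. A task can read the job name and settings but has no setters, and only the map-level view adds a way to emit output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these types are Hadoop classes.
final class ReadOnlyContextSketch {

    // Hypothetical read-only job view, loosely modeled on JobContext.
    interface JobView {
        String getJobName();

        // Reads a configuration value; there are deliberately no setters.
        String get(String key);
    }

    // Hypothetical task-level view, loosely modeled on MapContext: it adds a
    // way to emit output but still cannot change the job's settings.
    interface MapView<KEYOUT, VALUEOUT> extends JobView {
        void write(KEYOUT key, VALUEOUT value);
    }

    static final class SimpleMapView implements MapView<String, Integer> {
        private final String jobName;
        private final Map<String, String> conf;
        private final List<String> output = new ArrayList<>();

        SimpleMapView(String jobName, Map<String, String> conf) {
            this.jobName = jobName;
            // Unmodifiable copy, so a task cannot alter the settings it was given.
            this.conf = Collections.unmodifiableMap(new HashMap<>(conf));
        }

        @Override
        public String getJobName() { return jobName; }

        @Override
        public String get(String key) { return conf.get(key); }

        @Override
        public void write(String key, Integer value) { output.add(key + "=" + value); }

        List<String> output() { return output; }
    }

    public static void main(String[] args) {
        SimpleMapView ctx = new SimpleMapView("myjob", Map.of("separator", ","));
        ctx.write("hadoop", 1);
        System.out.println(ctx.getJobName() + " " + ctx.get("separator") + " " + ctx.output());
    }
}
```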
Job class -
The Job class represents the job submitter's view of the job. It is the most important class of the MapReduce API. The Job class allows the user to configure the job, submit it, control its execution, and query its state.
The set methods work only until the job is submitted; after that, they throw an IllegalStateException. Normally, the user creates the application, describes the various facets of the job via Job, and then submits the job and monitors its progress.
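The define-then-submit rule can be sketched in plain Java as a small state check in front of every setter. This is a hypothetical, simplified model (the class `JobStateSketch` and its `State` enum are assumptions for illustration), not the Hadoop Job class itself.

```java
// Hypothetical, simplified model of the "configure, then submit" rule.
// It is not the Hadoop Job class.
final class JobStateSketch {
    enum State { DEFINE, RUNNING }

    private State state = State.DEFINE;
    private String jobName = "";

    // Every setter calls this first, so setters fail once the job is running.
    private void ensureState(State expected) {
        if (state != expected) {
            throw new IllegalStateException("Job in state " + state + " instead of " + expected);
        }
    }

    void setJobName(String name) {
        ensureState(State.DEFINE);
        this.jobName = name;
    }

    String getJobName() { return jobName; }

    void submit() {
        ensureState(State.DEFINE);
        state = State.RUNNING; // from here on, setters are rejected
    }

    public static void main(String[] args) {
        JobStateSketch job = new JobStateSketch();
        job.setJobName("myjob");
        job.submit();
        try {
            job.setJobName("renamed");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Getters keep working after submission; only the configuration methods are guarded.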
The following example shows how to submit a job -
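import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Create a new Job
Job job = Job.getInstance();
job.setJarByClass(MyJob.class);
// Specify various job-specific parameters
job.setJobName("myjob");
// Input and output paths are set through the input/output format classes
FileInputFormat.addInputPath(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);
// Submit the job, then poll for progress until the job is complete
job.waitForCompletion(true);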
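Constructors of the Job class -
Job()
Job(Configuration conf)
Job(Configuration conf, String jobName)
These constructors are deprecated in newer Hadoop releases; the Job.getInstance() factory methods are preferred, as in the example above.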
Some methods of the Job class -
- cleanupProgress(): Gets the progress of the job's cleanup tasks
- getCounters(): Gets the counters for this job
- getFinishTime(): Gets the finish time of the job
- getJobFile(): Gets the path of the submitted job configuration
- getJobName(): Returns the job name specified by the user
- getJobState(): Returns the current state of the job
- getPriority(): Returns the scheduling priority of the job
- getStartTime(): Gets the start time of the job
- isComplete(): Checks whether the job is finished
- isSuccessful(): Checks whether the job completed successfully
- setInputFormatClass(Class): Sets the InputFormat for the job
- setJobName(String name): Sets the user-specified job name
- setOutputFormatClass(Class): Sets the OutputFormat for the job
- setMapperClass(Class): Sets the Mapper for the job
- setReducerClass(Class): Sets the Reducer for the job
- setPartitionerClass(Class): Sets the Partitioner for the job
- setCombinerClass(Class): Sets the combiner class for the job
Mapper class (Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>) -
The Mapper maps input key/value pairs to a set of intermediate key/value pairs. Maps are the individual tasks that transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records.
A given input pair may map to zero or many output pairs.
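As a minimal sketch of the map step, assuming a word-count style job, the plain-Java function below turns one input record (a byte offset and a line of text) into zero or more intermediate (word, 1) pairs. `WordCountMapSketch` is a hypothetical name and the code does not use the Hadoop Mapper API; it only shows the input/output shape described above, including an output type that differs from the input type.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of a map step (word count); not the Hadoop Mapper API.
final class WordCountMapSketch {

    // One input record (long offset, String line) becomes zero or many
    // intermediate (String word, Integer 1) pairs.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) { // a blank line yields no pairs at all
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map(0L, "hello hadoop hello")); // three pairs
        System.out.println(map(19L, "   "));               // zero pairs
    }
}
```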
Constructors of the Mapper class -
Mapper()
Methods of the Mapper class -
- setup(org.apache.hadoop.mapreduce.Mapper.Context context) - Called once at the beginning of the task.
- map(KEYIN key, VALUEIN value, org.apache.hadoop.mapreduce.Mapper.Context context) - Called once for each key/value pair in the InputSplit.
- run(org.apache.hadoop.mapreduce.Mapper.Context context) - Expert users can override this method for more control over the execution of the Mapper.
- cleanup(org.apache.hadoop.mapreduce.Mapper.Context context) - Called once at the end of the task.
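The call order of these four methods can be imitated in plain Java. In this sketch (hypothetical classes `MiniMapper` and `TracingMapper`, not Hadoop code), `run()` follows the same overall shape as the default Mapper.run(): setup() once, map() once per record, and cleanup() once at the end, with an iterator standing in for the framework's record reader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of the Mapper lifecycle; not the Hadoop API.
final class MapperLifecycleSketch {

    // Hypothetical mini-mapper with the same hook methods as Mapper.
    abstract static class MiniMapper<KIN, VIN> {
        final List<String> calls = new ArrayList<>(); // records the order of calls

        protected void setup() { calls.add("setup"); }

        protected abstract void map(KIN key, VIN value);

        protected void cleanup() { calls.add("cleanup"); }

        // setup once, map once per record, cleanup once (even if map throws).
        public void run(Iterator<Map.Entry<KIN, VIN>> records) {
            setup();
            try {
                while (records.hasNext()) {
                    Map.Entry<KIN, VIN> record = records.next();
                    map(record.getKey(), record.getValue());
                }
            } finally {
                cleanup();
            }
        }
    }

    // Typical subclass: overrides only map(), like most real Mappers.
    static final class TracingMapper extends MiniMapper<Long, String> {
        @Override
        protected void map(Long offset, String line) {
            calls.add("map(" + offset + ")");
        }
    }

    public static void main(String[] args) {
        TracingMapper mapper = new TracingMapper();
        mapper.run(List.of(Map.entry(0L, "hello"), Map.entry(6L, "world")).iterator());
        System.out.println(mapper.calls); // [setup, map(0), map(6), cleanup]
    }
}
```

Overriding run() itself, as the list above notes for expert users, would replace this whole loop.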
Reducer class (Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>) -
The Reducer reduces a set of intermediate values that share a key to a smaller set of values. Reducer implementations can access the Configuration for the job via the JobContext.getConfiguration() method.
A Reducer has three primary phases -
- Shuffle - The Reducer copies the sorted output from each Mapper using HTTP across the network.
- Sort - The framework merge-sorts the Reducer inputs by key, since different Mappers may have output the same key. The shuffle and sort phases occur simultaneously: while map outputs are being fetched, they are merged.
- Reduce - The reduce(Object, Iterable, Context) method is called once for each <key, (collection of values)> in the sorted inputs.
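The three phases can be simulated in plain Java for a word-count job. This is a hypothetical, single-process sketch (`ShuffleSortReduceSketch` is not Hadoop code): a TreeMap groups the pairs from several "mappers" by key and keeps the keys sorted, standing in for the shuffle and sort phases, and `reduce()` is then called once per key with all of that key's values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of shuffle, sort, and reduce; not Hadoop code.
final class ShuffleSortReduceSketch {

    // Shuffle + sort: collect pairs from every "mapper" and group values by key.
    // TreeMap keeps keys ordered, standing in for the framework's merge sort.
    static TreeMap<String, List<Integer>> shuffleAndSort(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> oneMapper : mapOutputs) {
            for (Map.Entry<String, Integer> pair : oneMapper) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce: called once per key with every value that shares that key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        Map<String, Integer> result = new TreeMap<>();
        shuffleAndSort(mapOutputs).forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapper1 = List.of(Map.entry("hadoop", 1), Map.entry("hello", 1));
        List<Map.Entry<String, Integer>> mapper2 = List.of(Map.entry("hello", 1));
        System.out.println(run(List.of(mapper1, mapper2))); // {hadoop=1, hello=2}
    }
}
```

In a real cluster the grouping happens across machines and on disk; the sketch only shows the data flow.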
Constructors of the Reducer class -
Reducer()
Methods of the Reducer class -
- setup(org.apache.hadoop.mapreduce.Reducer.Context context) - Called once at the start of the task.
- reduce(KEYIN key, Iterable<VALUEIN> values, org.apache.hadoop.mapreduce.Reducer.Context context) - This method is called once for each key.
- run(org.apache.hadoop.mapreduce.Reducer.Context context) - Advanced application writers can override this method to control how the reduce task works.
- cleanup(org.apache.hadoop.mapreduce.Reducer.Context context) - Called once at the end of the task.
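setup() and cleanup() are natural places for per-task work such as reading a setting once or writing a summary at the end. The plain-Java sketch below assumes a hypothetical configuration key `min.count` and a hypothetical class `ThresholdReducerSketch`; it is not the Hadoop Reducer API. setup() reads the threshold once, reduce() sums each key's values and keeps only keys that reach the threshold, and cleanup() appends a summary line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-Java sketch of a reducer's lifecycle; not the Hadoop Reducer API.
// "min.count" is a hypothetical configuration key used only for illustration.
final class ThresholdReducerSketch {
    private int minCount;     // read once in setup()
    private int keysEmitted;  // reported once in cleanup()
    private final List<String> output = new ArrayList<>();

    void setup(Map<String, String> conf) {
        minCount = Integer.parseInt(conf.getOrDefault("min.count", "1"));
        keysEmitted = 0;
    }

    // Called once per key with all of that key's values.
    void reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        if (sum >= minCount) {
            output.add(key + "\t" + sum);
            keysEmitted++;
        }
    }

    void cleanup() {
        output.add("# keys emitted: " + keysEmitted);
    }

    // Same overall shape as the default run(): setup, reduce per sorted key, cleanup.
    List<String> run(Map<String, String> conf, SortedMap<String, List<Integer>> grouped) {
        setup(conf);
        try {
            grouped.forEach(this::reduce);
        } finally {
            cleanup();
        }
        return output;
    }

    public static void main(String[] args) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>(Map.of(
                "hadoop", List.of(1),
                "hello", List.of(1, 1)));
        System.out.println(new ThresholdReducerSketch().run(Map.of("min.count", "2"), grouped));
    }
}
```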