- Goal: calculate the average of occurrences for each word, then sort the averages using multiple reducers
- Contains 2 jobs
- Job 1: Calculate Average
- customized Partitioner, data type
- Job 2: Sort the averages output by Job 1
- customized Partitioner, data type, key class
- Job 1: Calculate Average
- The input contains many word-value pairs
- Speedup by Combiner
- Words start only with [a-z]
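
A Combiner cannot simply average its inputs, because an average of partial averages is generally not the overall average. A common approach is to pass (sum, count) pairs through the Combiner and divide only in the Reducer. The sketch below models that idea in plain Java; the class name `SumCount` and the use of integer values are assumptions for illustration, not the lab's actual data type.

```java
/** Partial aggregate for one word: running sum and number of values seen. */
final class SumCount {
    final long sum;
    final long count;

    SumCount(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    /** Combiner step: merge two partial aggregates without losing the counts. */
    SumCount merge(SumCount other) {
        return new SumCount(sum + other.sum, count + other.count);
    }

    /** Reducer step: the division happens only once, at the very end. */
    double average() {
        return (double) sum / count;
    }

    public static void main(String[] args) {
        // Hypothetical input: values 1, 2, 3 seen by one map task, 10 by another.
        SumCount first = new SumCount(1, 1)
                .merge(new SumCount(2, 1))
                .merge(new SumCount(3, 1));
        SumCount second = new SumCount(10, 1);

        // Merging the partial aggregates gives sum 16 over 4 values.
        System.out.println(first.merge(second).average()); // prints 4.0
    }
}
```

Averaging the two partial averages instead (2.0 and 10.0) would give 6.0, which is why the count has to travel with the sum.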
- TextInputFormat
- Use KeyValueTextInputFormat
- Extended version of TextInputFormat
- Splits each record into a key and a value at a fixed delimiter (default: "\t")
- <Text, Text> pair
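
The splitting rule can be illustrated without Hadoop. The plain-Java sketch below mimics how a line is divided at the first delimiter into a key and a value; the class and method names are hypothetical, and the fallback for a line with no delimiter (whole line as key, empty value) reflects my understanding of the record reader's behavior rather than anything stated in this outline.

```java
/** Plain-Java model of splitting one input line into a key and a value. */
final class KeyValueSplit {

    /** Splits at the FIRST separator only; returns {key, value}. */
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator: treat the whole line as the key, with an empty value.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // Hypothetical record: a word and a numeric value separated by a tab.
        String[] kv = split("apple\t3", '\t');
        System.out.println("key=" + kv[0] + " value=" + kv[1]);
    }
}
```

In the real job both parts arrive in the Mapper as `Text` objects, so the value still has to be parsed into a number before it can be averaged.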
- Components
- Sort
- Sort by customized key class
- Implement WritableComparable
- Override the compareTo function
- Mapper output : <SortPair, NullWritable>
- Total order sort
- Assign ranges for each reducer in Partitioner
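
For a total order, every key sent to reducer i must sort before every key sent to reducer i+1, so concatenating the reducer outputs in order yields one sorted result. The plain-Java sketch below assigns contiguous first-letter ranges to reducers. It assumes the range is chosen from the word's first letter in [a-z]; if the lab orders primarily by average, the ranges would have to be defined on the average instead. The class name is hypothetical, and the method only mirrors the shape of Hadoop's `getPartition(key, value, numPartitions)`.

```java
/** Plain-Java model of a range partitioner over first letters a-z. */
final class FirstLetterRangePartitioner {
    private static final int ALPHABET_SIZE = 26;

    /**
     * Maps a word to a reducer index in [0, numPartitions).
     * Precondition (assumed from the input description): the word is
     * non-empty and starts with a lowercase letter a-z.
     */
    static int getPartition(String word, int numPartitions) {
        int offset = word.charAt(0) - 'a';              // 0 for 'a' ... 25 for 'z'
        return offset * numPartitions / ALPHABET_SIZE;  // non-decreasing in offset
    }

    public static void main(String[] args) {
        String[] words = { "apple", "ice", "jam", "rice", "sun", "zebra" };
        for (String w : words) {
            System.out.println(w + " -> reducer " + getPartition(w, 3));
        }
    }
}
```

Because the mapping never decreases as the first letter increases, no key in a later reducer can sort before a key in an earlier one. Equal-width letter ranges can still be unbalanced when some letters are much more common than others.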
- Components
- Mapper
- Partitioner
- Reducer
- Customized Key Class
- SortPair <word, average>
- implements WritableComparable
- function compareTo: return a negative value when this key should come before the other key (ascending order)
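
A minimal sketch of such a key class follows. To stay runnable without Hadoop on the classpath it implements `java.lang.Comparable` and provides `write`/`readFields` methods with the same signatures that Hadoop's `Writable` expects; in the actual job the class would declare `implements WritableComparable<SortPair>`. The ordering shown (ascending by average, ties broken by word) is an assumption, since the outline does not say which field is compared first.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Composite key <word, average>; the sort order is defined by compareTo. */
final class SortPair implements Comparable<SortPair> {
    private String word;
    private double average;

    SortPair() { this("", 0.0); }   // no-arg constructor, needed for deserialization

    SortPair(String word, double average) {
        this.word = word;
        this.average = average;
    }

    String getWord() { return word; }
    double getAverage() { return average; }

    /** Serializes both fields, in the same order readFields expects. */
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeDouble(average);
    }

    /** Restores both fields from the stream. */
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        average = in.readDouble();
    }

    /** Negative result means this key sorts before the other (ascending). */
    @Override
    public int compareTo(SortPair other) {
        int byAverage = Double.compare(average, other.average);
        return byAverage != 0 ? byAverage : word.compareTo(other.word);
    }

    public static void main(String[] args) {
        List<SortPair> pairs = new ArrayList<>();
        pairs.add(new SortPair("pear", 4.0));
        pairs.add(new SortPair("fig", 2.5));
        pairs.add(new SortPair("apple", 2.5));
        Collections.sort(pairs);   // uses compareTo
        for (SortPair p : pairs) {
            System.out.println(p.getWord() + "\t" + p.getAverage());
        }
    }
}
```

Swapping the operands in `Double.compare` would give a descending order instead. `Double.compare` is used rather than subtracting the averages because casting a fractional difference to `int` can truncate to zero.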
- Execution (in Average_Sort/)
make clean
make
sh execute.sh {iter_num, default=3}