Writing a MapReduce Program
MapReduce is a parallel computing framework for processing large datasets. Writing a MapReduce program involves the following steps:
- Define the Mapper class: the Mapper converts input records into key-value pairs and emits them as intermediate output, which is shuffled to the Reducers. It implements the map method, which receives an input key-value pair, processes it, and emits output key-value pairs.
- Define the Reducer class: the Reducer aggregates all values that share the same key and writes the results to the output files. It implements the reduce method, which receives a key together with the iterable of its values, processes them, and emits output key-value pairs.
- Define the Driver: the Driver is the program's entry point. It configures the MapReduce job (Mapper class, Reducer class, input path, output path, and so on) and submits it to the cluster for execution; typically this is done in a main method.
- Run the MapReduce program: package the program as a JAR, upload it to the cluster, and start it with the hadoop jar command. The job reads the input files, runs the Mapper and Reducer, and writes the results to the specified output path.
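The data flow in the steps above can be sketched without a cluster. Here is a minimal plain-Java simulation of the map, shuffle, and reduce phases for word count; the class and method names are illustrative only and are not part of the Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MiniMapReduce {
    // Map phase: emit (word, 1) for every token in every input line
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                if (!token.isEmpty()) {
                    pairs.add(Map.entry(token, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle phase: group values by key (a TreeMap also sorts keys, as Hadoop does)
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped values for each key
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello world", "hello hadoop");
        Map<String, Integer> counts = reduce(shuffle(map(input)));
        System.out.println(counts); // prints {hadoop=1, hello=2, world=1}
    }
}
```

In a real job, the shuffle is performed by the framework between the map and reduce phases; only the map and reduce logic is written by you.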
Example code:
Mapper class

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into tokens and emit (word, 1) for each token
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
```
Reducer class

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the counts emitted for this word
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```
Driver

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        // The reducer doubles as a combiner, which works here because summing
        // is associative and commutative
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
Running the MapReduce program

```shell
hadoop jar wordcount.jar WordCount /input /output
```
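Once the job completes, the results can be inspected with the HDFS shell. This is a sketch assuming the default output layout, where each reducer writes a part-r-NNNNN file under the output directory:

```shell
# List the job's output files (_SUCCESS marker plus one part file per reducer)
hadoop fs -ls /output

# Print the word counts produced by the first reducer
hadoop fs -cat /output/part-r-00000
```

Note that the output directory must not exist before the job runs; Hadoop refuses to overwrite an existing output path.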