MapReduce Programming Guide: An Introductory Tutorial with Sample Code

MapReduce is a parallel computing framework for processing large volumes of data. Writing a MapReduce program involves the following steps:

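Define the Mapper: the Mapper turns each input record into intermediate key-value pairs. It implements a map method that receives one input key-value pair and emits zero or more output pairs. The framework then groups these pairs by key (the shuffle) before passing them to the Reducer.
Define the Reducer: the Reducer aggregates all values that share a key and writes the result to the job output. It implements a reduce method that receives a key together with an iterable of that key's values, and emits output key-value pairs.
Define the Driver: the Driver is the main entry point of the program. It configures the job (Mapper class, Reducer class, input path, output path, and so on) and submits it to the cluster. This is commonly done in a main method, as in the sample code below; a run method is only required when the Driver implements Hadoop's Tool interface.
Run the program: package the program as a jar, upload it to the cluster, and start it with the hadoop jar command. The job reads the input files, runs the Mapper and the Reducer, and writes the results to the specified output directory, which must not already exist.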
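
The data flow behind the steps above can be sketched without Hadoop at all. The following is a minimal in-memory illustration, not how Hadoop itself is implemented: the class name LocalWordCountSketch and its map, shuffle, reduce, and run methods are hypothetical names chosen for this example, and everything runs in a single JVM on plain Java collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the map -> shuffle -> reduce flow.
class LocalWordCountSketch {

  // Map phase: emit a (word, 1) pair for every whitespace-separated token.
  static List<Map.Entry<String, Integer>> map(String line) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      pairs.add(Map.entry(tokenizer.nextToken(), 1));
    }
    return pairs;
  }

  // Shuffle phase: group all values by key; a TreeMap keeps the keys sorted.
  static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
    Map<String, List<Integer>> grouped = new TreeMap<>();
    for (Map.Entry<String, Integer> pair : pairs) {
      grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
    }
    return grouped;
  }

  // Reduce phase: sum all values that belong to one key.
  static int reduce(String key, Iterable<Integer> values) {
    int sum = 0;
    for (int value : values) {
      sum += value;
    }
    return sum;
  }

  // Chains the three phases over a list of input lines.
  static Map<String, Integer> run(List<String> lines) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    for (String line : lines) {
      pairs.addAll(map(line));
    }
    Map<String, Integer> result = new TreeMap<>();
    for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
      result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    // prints {hadoop=1, hello=2, world=1}
    System.out.println(run(List.of("hello world", "hello hadoop")));
  }
}
```

On a real cluster the map and reduce calls run in separate tasks on different machines and the shuffle moves data over the network; the sketch only mirrors the order in which the data is transformed.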

Sample Code

Mapper

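import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  // Reused across calls to avoid allocating new objects for every token.
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // key: byte offset of the line in the input file; value: the line itself.
  @Override
  public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      // Emit (word, 1) for every whitespace-separated token.
      word.set(tokenizer.nextToken());
      context.write(word, one);
    }
  }
}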
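
The Mapper above splits each line with StringTokenizer, which by default breaks only on whitespace, so punctuation stays attached to words and case is preserved. The sketch below shows that behavior next to one possible normalization. The class TokenizerSketch and the method normalizedTokens are hypothetical and are not part of the sample program; whether to normalize at all depends on what should count as the same word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

// Hypothetical helper class that compares two ways of splitting a line into words.
class TokenizerSketch {

  // Same splitting as the Mapper: default StringTokenizer delimiters (whitespace only).
  static List<String> whitespaceTokens(String line) {
    List<String> tokens = new ArrayList<>();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      tokens.add(tokenizer.nextToken());
    }
    return tokens;
  }

  // Assumed alternative: lowercase the line, then split on runs of
  // characters that are neither letters nor digits.
  static List<String> normalizedTokens(String line) {
    List<String> tokens = new ArrayList<>();
    for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!token.isEmpty()) {
        tokens.add(token);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    String line = "Hello, hello world.";
    System.out.println(whitespaceTokens(line));  // prints [Hello,, hello, world.]
    System.out.println(normalizedTokens(line));  // prints [hello, hello, world]
  }
}
```

With the whitespace-only split, "Hello," and "hello" would be counted as two different words; with the normalized split they collapse into one.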

Reducer

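import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  // Called once per distinct key with all values emitted for that key.
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    // Emit (word, total count).
    context.write(key, new IntWritable(sum));
  }
}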

Driver

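import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Usage: hadoop jar wordcount.jar WordCount <input path> <output path>
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(WordCountMapper.class);
    // The reducer doubles as a combiner because summing is associative and commutative.
    job.setCombinerClass(WordCountReducer.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // The output directory must not exist yet, otherwise the job fails at submission.
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Block until the job finishes; exit with 0 on success and 1 on failure.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}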
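
The Driver registers WordCountReducer as the combiner as well as the reducer. That is safe only because summation gives the same result whether or not partial sums are computed first; Hadoop may run a combiner zero, one, or several times per map task, so the final result must not depend on it. The sketch below illustrates this in plain Java, with an average as a counter-example. The class CombinerSketch and its methods are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of why a sum can be pre-aggregated but a mean cannot.
class CombinerSketch {

  static int sum(List<Integer> values) {
    int total = 0;
    for (int value : values) {
      total += value;
    }
    return total;
  }

  // No combiner: the reducer sees every value emitted by every map task.
  static int sumWithoutCombiner(List<List<Integer>> perMapTask) {
    List<Integer> all = new ArrayList<>();
    for (List<Integer> values : perMapTask) {
      all.addAll(values);
    }
    return sum(all);
  }

  // With combiner: each map task pre-sums its values, the reducer sums the partial sums.
  static int sumWithCombiner(List<List<Integer>> perMapTask) {
    List<Integer> partialSums = new ArrayList<>();
    for (List<Integer> values : perMapTask) {
      partialSums.add(sum(values));
    }
    return sum(partialSums);
  }

  static double mean(List<Integer> values) {
    return (double) sum(values) / values.size();
  }

  // Counter-example: averaging per-task averages is NOT the overall average.
  static double meanOfMeans(List<List<Integer>> perMapTask) {
    double total = 0;
    for (List<Integer> values : perMapTask) {
      total += mean(values);
    }
    return total / perMapTask.size();
  }

  public static void main(String[] args) {
    List<List<Integer>> counts = List.of(List.of(1, 1, 1), List.of(1, 1));
    System.out.println(sumWithoutCombiner(counts)); // prints 5
    System.out.println(sumWithCombiner(counts));    // prints 5

    List<List<Integer>> samples = List.of(List.of(1, 2, 3), List.of(10));
    System.out.println(meanOfMeans(samples));       // prints 6.0, but the true mean is 4.0
  }
}
```

For a job whose reduce step is not associative and commutative, such as a plain average, the reducer class should not be reused as the combiner.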

Running the MapReduce Program

hadoop jar wordcount.jar WordCount /input /output

With the steps and sample code above, you can get started with MapReduce development and use the framework to process very large data sets in parallel.