The Map Class in K-means: Assigning Elements to Centers

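This code is the Map class of a K-means implementation on Hadoop MapReduce. It assigns each element to its nearest center and emits that center's index as the output key.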

Code structure:

package com.huiluczP.cluster;

import com.huiluczP.util.CalUtil;
import com.huiluczP.util.DataUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import java.util.ArrayList;

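// Map class of the map-reduce pipeline.
// Main idea: read the center file, assign each element to a center, and emit the
// center index as the output key so that the new centers are easy to compute later.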
public class KmeansMapper extends Mapper<Object, Text, Text, Text> {

    private ArrayList<ArrayList<Double>> centers = null;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Load the centers.
        // The file path is taken from the job configuration.
        Configuration configuration = context.getConfiguration();
        String centerPath = configuration.get("cluster.center_path");
        centers = DataUtil.readCenter(centerPath);
    }

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        ArrayList<Double> element = DataUtil.splitStringIntoArray(value.toString());
        // Pick the nearest center and use its index as the key.
        int index = CalUtil.selectNearestCenter(element, centers);
        context.write(new Text(Integer.toString(index)), value);
    }
}

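What the code does:

**setup() method:** runs once per map task before any input is processed. It reads the center file path from the configuration key cluster.center_path, loads the centers with DataUtil.readCenter(), and stores them in the centers field.
**map() method:**
- Parses the input line into an ArrayList<Double> with DataUtil.splitStringIntoArray().
- Calls CalUtil.selectNearestCenter() to find the nearest center and uses its index as the output key.
- Emits the original, unmodified input line as the output value.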
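
The source of DataUtil.splitStringIntoArray() is not shown here, so the parsing step can only be illustrated with a guess. The sketch below is a hypothetical stand-in, assuming one element per line with comma-separated numeric coordinates; the class name ElementParserSketch and the comma delimiter are assumptions, not part of the original project.

```java
import java.util.ArrayList;

// Hypothetical stand-in for DataUtil.splitStringIntoArray (the real one is not shown).
final class ElementParserSketch {

    // Assumption: coordinates are separated by commas, e.g. "1.0,2.5,3".
    static ArrayList<Double> splitStringIntoArray(String line) {
        ArrayList<Double> values = new ArrayList<>();
        for (String token : line.trim().split(",")) {
            // Double.parseDouble throws NumberFormatException on empty or malformed tokens.
            values.add(Double.parseDouble(token.trim()));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(splitStringIntoArray("1.0, 2.5,3")); // prints [1.0, 2.5, 3.0]
    }
}
```

An empty or malformed line makes this version throw, so a real job would probably want to skip or count such records rather than fail the task.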

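Output:

Each map call emits one (center index, element) pair. MapReduce groups these pairs by key, so the Reducer class receives all elements assigned to the same center together and can compute the new center from them.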
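
The Reducer itself is not part of this listing. In standard K-means the new center of a cluster is the coordinate-wise mean of its members, so the core of such a Reducer could look roughly like the sketch below. It is written as plain Java without Hadoop types so it can run on its own; the class name CentroidSketch and the method mean() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the centroid update a K-means Reducer would perform.
final class CentroidSketch {

    // Returns the coordinate-wise mean of all points assigned to one center.
    static ArrayList<Double> mean(List<? extends List<Double>> points) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("cluster has no points");
        }
        int dim = points.get(0).size();
        double[] sums = new double[dim];
        for (List<Double> p : points) {
            for (int i = 0; i < dim; i++) {
                sums[i] += p.get(i);
            }
        }
        ArrayList<Double> center = new ArrayList<>();
        for (double s : sums) {
            center.add(s / points.size());
        }
        return center;
    }
}
```

In a real Reducer the points would come from the Text values grouped under one center index, parsed the same way as in the mapper.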

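Key points:

CalUtil.selectNearestCenter(element, centers) computes the distance between the element and each center and returns the index of the nearest one.
The output key is the center index, which lets the Reducer class aggregate all elements of one cluster in a single call.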
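
The body of CalUtil.selectNearestCenter() is not shown, and neither is the distance metric it uses. As a minimal sketch, assuming Euclidean distance and zero-based center indices (both assumptions), the selection could be written as follows. The class name NearestCenterSketch is hypothetical.

```java
import java.util.List;

// Hypothetical stand-in for CalUtil.selectNearestCenter (the real one is not shown).
final class NearestCenterSketch {

    // Squared Euclidean distance; the square root is skipped because it does not change the ordering.
    static double squaredDistance(List<Double> a, List<Double> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.size(); i++) {
            double d = a.get(i) - b.get(i);
            sum += d * d;
        }
        return sum;
    }

    // Returns the zero-based index of the closest center, or -1 if there are no centers.
    static int selectNearestCenter(List<Double> element, List<? extends List<Double>> centers) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double dist = squaredDistance(element, centers.get(i));
            if (dist < bestDist) { // strict comparison: ties go to the lower index
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }
}
```

Whatever index convention the real method uses, it has to match the one the Reducer and the center file use, since the index is the only link between the map output and the centers.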

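Summary:

KmeansMapper loads the current centers once in setup(), then labels every input element with the index of its nearest center in map(). The resulting (center index, element) pairs are the input from which the Reducer class computes the new centers.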