Vector Cosine Similarity: Formula and Java Implementation

1. Formula

$$\mathrm{COS} = \frac{\sum_n x_n y_n}{\sqrt{\sum_n x_n^2}\,\sqrt{\sum_n y_n^2}}$$

Here x and y are two vectors. The numerator of COS is their dot product, and the denominator is the product of their norms (magnitudes).

Example:

| id_x | tag_x | weight_x |
|---|---|---|
| x1 | 1 | 1 |
| x2 | 0 | 0 |
| x3 | 1 | 1 |

| id_y | tag_y | weight_y |
|---|---|---|
| y1 | 0.5 | 0.5 |
| y2 | 0.6 | 0.6 |
| y3 | 0.1 | 0.1 |

cos numerator = 1 × 0.5 + 0 × 0.6 + 1 × 0.1 = 0.6

cos denominator = sqrt(1 × 1 + 0 × 0 + 1 × 1) × sqrt(0.5 × 0.5 + 0.6 × 0.6 + 0.1 × 0.1) = 1.113552872566

cos = 0.6 / 1.113552872566 = 0.538815906080325
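
The arithmetic above can be reproduced with a few lines of Java. This is only a minimal sketch for dense arrays of equal length; the class name `CosineExample` and the method name `cosine` are illustrative choices, not part of the original solution.

```java
class CosineExample {
    // Cosine similarity of two dense vectors of equal length.
    static double cosine(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0;     // numerator: sum of x_n * y_n
        double sumSqX = 0;  // sum of x_n^2
        double sumSqY = 0;  // sum of y_n^2
        for (int n = 0; n < x.length; n++) {
            dot += x[n] * y[n];
            sumSqX += x[n] * x[n];
            sumSqY += y[n] * y[n];
        }
        return dot / (Math.sqrt(sumSqX) * Math.sqrt(sumSqY));
    }

    public static void main(String[] args) {
        double[] x = {1, 0, 1};        // weight_x column of the example
        double[] y = {0.5, 0.6, 0.1};  // weight_y column of the example
        System.out.println(cosine(x, y)); // a value close to 0.5388
    }
}
```

Note that the sketch does not guard against a zero vector, for which the denominator would be 0.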

2. Requirements

(1) The input data is given in the following table:

| caseid | tagid | weight |
|---|---|---|
| 10 | 100001 | 2.391216368 |
| 10 | 100002 | 3.678794412 |
| 10 | 100011 | 4.357588823 |
| 20 | 100002 | 5.518191618 |
| 20 | 100003 | 1.839397206 |
| 20 | 100004 | 12.87578044 |
| 30 | 100003 | 59.21755365 |
| 30 | 100004 | 1.839397206 |
| 30 | 100005 | 1.839397206 |
| 40 | 100004 | 33.10914971 |
| 40 | 100005 | 9.196986029 |
| 40 | 100006 | 183.9397206 |
| 50 | 100006 | 11.03638324 |
| 50 | 100007 | 15.45093653 |
| 50 | 100008 | 16.55457485 |
| 60 | 100006 | 2.023336926 |
| 60 | 100008 | 1.839397206 |
| 60 | 100009 | 59.21755365 |
| 70 | 100006 | 1.839397206 |
| 70 | 100009 | 1.839397206 |
| 70 | 100010 | 2.575156088 |

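(2) Compute the COS for every pair of caseids.

(3) For each caseid, output its 3 largest COS values.

(4) Implement this in Java or any other language, and submit the results in single precision.

(5) Example of the output: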

| caseid | caseid | COS |
|---|---|---|
| 20 | 20 | 1 |
| 20 | 10 | 0.232349 |
| 20 | 40 | 0.161248 |
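
Each caseid carries only a few tagids, so its vector is sparse: a tagid missing from a case has weight 0, and only tagids present in both cases contribute to the numerator. The sketch below checks the sample row (20, 10) under that reading. The names `SparseCosine`, `cosine` and `norm` are hypothetical, and the cast to `float` is just one possible interpretation of the single-precision requirement.

```java
import java.util.HashMap;
import java.util.Map;

class SparseCosine {
    // Cosine similarity of two sparse vectors stored as tagid -> weight.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w; // only shared tagids contribute
            }
        }
        return dot / (norm(a) * norm(b));
    }

    // Euclidean norm of a sparse vector.
    static double norm(Map<Integer, Double> v) {
        double sumSq = 0;
        for (double w : v.values()) {
            sumSq += w * w;
        }
        return Math.sqrt(sumSq);
    }

    public static void main(String[] args) {
        Map<Integer, Double> case10 = new HashMap<>();
        case10.put(100001, 2.391216368);
        case10.put(100002, 3.678794412);
        case10.put(100011, 4.357588823);

        Map<Integer, Double> case20 = new HashMap<>();
        case20.put(100002, 5.518191618);
        case20.put(100003, 1.839397206);
        case20.put(100004, 12.87578044);

        // Computed in double, then narrowed to float for the reported value.
        float cos = (float) cosine(case20, case10);
        System.out.printf("20 10 %.6f%n", cos); // the sample output lists 0.232349 for this pair
    }
}
```

The two cases share only tagid 100002, so the numerator reduces to a single product, 5.518191618 × 3.678794412.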

The Java code is as follows (it uses Maps to store the data and to compute the COS):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CosCalculator {
    public static void main(String[] args) {
        // Input data: {caseid, tagid, weight}
        double[][] data = {{10, 100001, 2.391216368},
                           {10, 100002, 3.678794412},
                           {10, 100011, 4.357588823},
                           {20, 100002, 5.518191618},
                           {20, 100003, 1.839397206},
                           {20, 100004, 12.87578044},
                           {30, 100003, 59.21755365},
                           {30, 100004, 1.839397206},
                           {30, 100005, 1.839397206},
                           {40, 100004, 33.10914971},
                           {40, 100005, 9.196986029},
                           {40, 100006, 183.9397206},
                           {50, 100006, 11.03638324},
                           {50, 100007, 15.45093653},
                           {50, 100008, 16.55457485},
                           {60, 100006, 2.023336926},
                           {60, 100008, 1.839397206},
                           {60, 100009, 59.21755365},
                           {70, 100006, 1.839397206},
                           {70, 100009, 1.839397206},
                           {70, 100010, 2.575156088}};

        // Group the rows into one sparse vector (tagid -> weight) per caseid.
        // TreeMap keeps the caseids in ascending order, so the output order is deterministic.
        Map<Integer, Map<Integer, Double>> vectors = new TreeMap<>();
        for (double[] d : data) {
            int caseid = (int) d[0];
            int tagid = (int) d[1];
            vectors.computeIfAbsent(caseid, k -> new TreeMap<>()).put(tagid, d[2]);
        }

        // For each caseid, compute the COS against every caseid (itself included,
        // as in the sample output of section 2) and print the 3 largest values.
        System.out.println("caseid caseid COS");
        Comparator<double[]> byCosDescending = (a, b) -> Double.compare(b[1], a[1]);
        for (int caseid1 : vectors.keySet()) {
            List<double[]> results = new ArrayList<>(); // each entry: {caseid2, cos}
            for (int caseid2 : vectors.keySet()) {
                double cos = cosine(vectors.get(caseid1), vectors.get(caseid2));
                results.add(new double[] {caseid2, cos});
            }
            // Largest COS first; ties are broken by the smaller caseid.
            results.sort(byCosDescending.thenComparingDouble(r -> r[0]));
            for (int i = 0; i < 3 && i < results.size(); i++) {
                double[] r = results.get(i);
                // Cast to float because the task asks for single-precision results.
                System.out.printf("%d %d %.6f%n", caseid1, (int) r[0], (float) r[1]);
            }
        }
    }

    // Cosine similarity of two sparse vectors stored as tagid -> weight maps.
    private static double cosine(Map<Integer, Double> vector1, Map<Integer, Double> vector2) {
        double numerator = 0;
        double sumSquares1 = 0;
        double sumSquares2 = 0;
        for (Map.Entry<Integer, Double> e : vector1.entrySet()) {
            Double other = vector2.get(e.getKey());
            if (other != null) {
                numerator += e.getValue() * other; // only shared tagids contribute
            }
            sumSquares1 += e.getValue() * e.getValue();
        }
        for (double weight : vector2.values()) {
            sumSquares2 += weight * weight;
        }
        return numerator / (Math.sqrt(sumSquares1) * Math.sqrt(sumSquares2));
    }
}

The output lists, for each caseid, its 3 largest COS values to six decimal places, with the caseid itself included as in the sample of section 2:

caseid caseid COS
10 10 1.000000
10 20 0.232349
10 30 0.000000
20 20 1.000000
20 10 0.232349
20 40 0.161248
30 30 1.000000
30 20 0.158343
30 40 0.007016
40 40 1.000000
40 70 0.493973
40 50 0.430657
50 50 1.000000
50 40 0.430657
50 70 0.220158
60 60 1.000000
60 70 0.519136
60 50 0.035344
70 70 1.000000
70 60 0.519136
70 40 0.493973
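
When the number of cases grows, the 3 largest values can be selected without ordering every candidate. One common approach is a min-heap bounded at k elements, sketched below. The class name `TopK`, the method `topK` and the input values are made up for illustration; this is an alternative selection step under those assumptions, not a required part of the solution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopK {
    // Returns the k largest values in descending order.
    static List<Double> topK(double[] values, int k) {
        List<Double> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Min-heap: the smallest of the values kept so far sits on top.
        PriorityQueue<Double> heap = new PriorityQueue<>();
        for (double v : values) {
            heap.offer(v);
            if (heap.size() > k) {
                heap.poll(); // drop the smallest, so only the k largest remain
            }
        }
        result.addAll(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        double[] cos = {0.2, 0.9, 0.0, 0.5, 0.7}; // hypothetical COS values
        System.out.println(topK(cos, 3)); // prints [0.9, 0.7, 0.5]
    }
}
```

The heap never holds more than k + 1 elements, so each insertion costs O(log k) rather than the O(log n) of a heap over all candidates.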

Original article: https://www.cveoy.top/t/topic/oook. Copyright belongs to the author. Do not repost or scrape.
