Counting value frequencies in a text file with Python Spark
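from pyspark.sql import SparkSession

# Create (or reuse) a Spark session
spark = SparkSession.builder.getOrCreate()

# Read the file as an RDD of Row objects, one Row per line of text
lines = spark.read.text('data01.txt').rdd

# Use the first comma-separated field of each line as the key, paired with a count of 1
res = lines.map(lambda x: x.value.split(',')).map(lambda x: (x[0], 1))

# Sum the counts for each key
result = res.reduceByKey(lambda x, y: x + y)

# Bring the (key, count) pairs back to the driver as a list
result.collect()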
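As a rough local cross-check of what the Spark job above computes, the same first-field count can be sketched in plain Python with `collections.Counter`. The helper name `count_first_field` and the sample lines are assumptions made for illustration; they are not part of the original snippet or of the real `data01.txt`.

```python
from collections import Counter


def count_first_field(lines, sep=","):
    """Count how often each first field occurs (mirrors the map + reduceByKey steps)."""
    return Counter(line.split(sep)[0] for line in lines)


# Hypothetical sample standing in for the contents of data01.txt
sample_lines = ["a,1", "b,2", "a,3", "c,4", "a,5"]

counts = count_first_field(sample_lines)
print(sorted(counts.items()))  # [('a', 3), ('b', 1), ('c', 1)]
```

On a small sample of the input, the sorted output of `result.collect()` should match this list; note that Spark does not guarantee the order of the collected pairs.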
Original source: https://www.cveoy.top/t/topic/kKr. Copyright belongs to the author. Do not repost or scrape.