Python Spark: Computing the Average of Data in a Text File
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session
spark = SparkSession.builder.appName('AverageScore').getOrCreate()

# Read the file as an RDD of strings, one per line
lines = spark.read.text('data01.txt').rdd.map(lambda x: x[0])
# The third whitespace-separated field on each line is the score
score = lines.map(lambda x: int(x.split()[2]))

num = score.count()
total_score = score.reduce(lambda x, y: x + y)
avg = total_score / num
print(avg)
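The script above runs two separate actions over the RDD, `count()` and `reduce()`. One possible alternative is to carry a (sum, count) pair through a single reduction. The sketch below illustrates that idea in plain Python with `functools.reduce`, so it can be run without a Spark installation. The sample lines and the helper names `to_pair` and `combine` are assumptions made for illustration; the original only implies that the third whitespace-separated field is an integer score.

```python
from functools import reduce

# Hypothetical sample data: the real contents of data01.txt are not shown
# in the original, only that field index 2 holds an integer score.
sample_lines = [
    "1 Alice 90",
    "2 Bob 80",
    "3 Carol 70",
]

def to_pair(line):
    """Map one line to a (score, 1) pair."""
    return (int(line.split()[2]), 1)

def combine(a, b):
    """Add two (sum, count) pairs element-wise."""
    return (a[0] + b[0], a[1] + b[1])

# One pass over the data yields both the total and the number of records.
total, count = reduce(combine, map(to_pair, sample_lines))
average = total / count
print(average)
```

In PySpark the same two functions could presumably be applied as `lines.map(to_pair).reduce(combine)`. For a numeric RDD such as `score`, the built-in `score.mean()` is a shorter option.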
Original source: https://www.cveoy.top/t/topic/kH4 Copyright belongs to the author. Please do not repost or scrape.