Here are a few Spark RDD programming exercises, with answers.
- Count the occurrences of each word in an English article, ignoring case.

```python
from pyspark import SparkContext

sc = SparkContext("local", "Word Count")
text_file = sc.textFile("path/to/article.txt")

# Lowercase each line and split it into words, pair each word with 1,
# then sum the counts per word; filter out the empty strings that
# consecutive spaces would otherwise produce
counts = text_file.flatMap(lambda line: line.lower().split(" ")) \
                  .filter(lambda word: word != "") \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("path/to/output")
```
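To see what the transformation chain computes, here is a plain-Python sketch of the same logic on a small in-memory sample (the two lines of text are hypothetical stand-ins for the file's contents); `Counter` performs locally what `reduceByKey` does per key across partitions:

```python
from collections import Counter

# Hypothetical sample lines standing in for article.txt
lines = ["The quick brown fox", "the lazy dog"]

# flatMap step: lowercase, split, and flatten into one word list,
# dropping empty strings
words = [w for line in lines for w in line.lower().split(" ") if w]

# map + reduceByKey steps: sum a count of 1 per occurrence of each word
counts = Counter(words)

print(counts["the"])  # "The" and "the" collapse to one key → 2
```

Note that because the split lowercases first, the two capitalised variants are counted under a single key.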
- Find the maximum and minimum of a set of numbers.

```python
from pyspark import SparkContext

sc = SparkContext("local", "Max and Min")
nums = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# reduce applies the binary function pairwise across all elements
max_num = nums.reduce(lambda a, b: a if a > b else b)
min_num = nums.reduce(lambda a, b: a if a < b else b)
print("Max: %d, Min: %d" % (max_num, min_num))
```

(RDDs also provide the built-in `max()` and `min()` actions, which do the same thing more directly.)
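The pairwise combiner passed to Spark's `reduce` behaves like Python's own `functools.reduce`; this local sketch shows the same lambdas folding the same list without a cluster:

```python
from functools import reduce

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# The same binary combiners Spark applies pairwise across partitions
max_num = reduce(lambda a, b: a if a > b else b, nums)
min_num = reduce(lambda a, b: a if a < b else b, nums)

print(max_num, min_num)  # 10 1
```

One caveat that carries over to Spark: reducing an empty collection raises an error, since there is no first element to start the fold from.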
- Sort a collection of strings.

```python
from pyspark import SparkContext

sc = SparkContext("local", "Sort Strings")
strings = sc.parallelize(["apple", "banana", "cherry", "date", "elderberry"])

# sortBy orders elements by the given key function, ascending by default
sorted_strings = strings.sortBy(lambda x: x)
sorted_strings.saveAsTextFile("path/to/output")
```
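`sortBy(keyfunc)` corresponds to Python's `sorted(..., key=keyfunc)` applied to the collected data; this local sketch (on the same sample strings, deliberately shuffled) also shows the descending order that Spark exposes via `sortBy(..., ascending=False)`:

```python
# Shuffled copy of the sample data from the exercise
strings = ["banana", "elderberry", "apple", "date", "cherry"]

# Ascending sort, matching sortBy's default behaviour
ascending = sorted(strings, key=lambda x: x)

# Descending sort, matching sortBy(..., ascending=False)
descending = sorted(strings, key=lambda x: x, reverse=True)

print(ascending[0], descending[0])  # apple elderberry
```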