1. Count the number of occurrences of each word in an English article, ignoring case.
from pyspark import SparkContext

sc = SparkContext("local", "Word Count")

# Read the article; each element of the RDD is one line of text.
text_file = sc.textFile("path/to/article.txt")

# Lower-case each line and split on whitespace (split() with no argument
# avoids counting empty strings produced by consecutive spaces),
# then emit (word, 1) pairs and sum the counts per word.
counts = text_file.flatMap(lambda line: line.lower().split()) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)

counts.saveAsTextFile("path/to/output")
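For intuition, the flatMap → map → reduceByKey pipeline above can be sketched in plain Python. This is only a local sketch: since the file path in the answer is a placeholder, a small in-memory sample is assumed here.

```python
# Plain-Python sketch of the word-count pipeline (sample input is assumed).
lines = ["Spark makes word count easy", "spark RDDs make COUNT easy"]

# flatMap step: lower-case every line and split it into words.
words = [w for line in lines for w in line.lower().split()]

# map + reduceByKey steps: tally occurrences per word.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

print(counts)  # e.g. "spark", "count", and "easy" each appear twice
```

The dict plays the role of the shuffled key space; in Spark the same summation happens per key across partitions.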
2. Find the maximum and minimum of a set of numbers.
from pyspark import SparkContext

sc = SparkContext("local", "Max and Min")

# Distribute the numbers across the cluster as an RDD.
nums = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# reduce applies an associative binary function pairwise across partitions.
max_num = nums.reduce(lambda a, b: a if a > b else b)
min_num = nums.reduce(lambda a, b: a if a < b else b)

print("Max: %d, Min: %d" % (max_num, min_num))
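The two reduce calls rely on the lambdas being associative, so partial results from different partitions can be combined in any order. The same logic can be sanity-checked locally with functools.reduce (a sketch, not part of the original answer):

```python
from functools import reduce

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Same associative lambdas as in the Spark version, applied sequentially.
max_num = reduce(lambda a, b: a if a > b else b, data)
min_num = reduce(lambda a, b: a if a < b else b, data)

print("Max: %d, Min: %d" % (max_num, min_num))  # Max: 10, Min: 1
```

Note that Spark RDDs also provide built-in max() and min() actions, which are usually preferred in practice; reduce is shown here to illustrate the mechanism.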
3. Sort a list of strings.
from pyspark import SparkContext

sc = SparkContext("local", "Sort Strings")

strings = sc.parallelize(["apple", "banana", "cherry", "date", "elderberry"])

# sortBy orders the RDD by the given key function, ascending by default;
# the identity key gives plain lexicographic order.
sorted_strings = strings.sortBy(lambda x: x)

sorted_strings.saveAsTextFile("path/to/output")
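The key function passed to sortBy works like the key argument of Python's built-in sorted. A local sketch of the same idea, including an alternative key (sorting by length) as a hypothetical variant:

```python
strings = ["apple", "banana", "cherry", "date", "elderberry"]

# Identity key, as in sortBy(lambda x: x): lexicographic order.
by_value = sorted(strings, key=lambda x: x)

# Any other key works the same way in RDD.sortBy, e.g. string length.
by_length = sorted(strings, key=len)

print(by_value)
print(by_length)
```

Because Python's sort is stable, equal-length strings ("banana", "cherry") keep their relative order in the second result.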
Topic: write a few programming exercises about Spark RDDs and give the answers.

Original article: https://www.cveoy.top/t/topic/fQTZ. Copyright belongs to the author; do not repost or scrape.
