1. Use Spark Streaming to read data from Kafka and count the occurrences of each word.

Python answer:

# Requires the spark-streaming-kafka-0-8 package (Spark 2.x; this API was removed in Spark 3.0+)
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = SparkConf().setAppName("KafkaWordCount")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 5)  # StreamingContext takes a SparkContext; 5-second batch interval

kafkaParams = {"metadata.broker.list": "localhost:9092"}
topics = ["test_topic"]

# Each record arrives as a (key, value) tuple; the message body is the value
kafkaStream = KafkaUtils.createDirectStream(ssc, topics, kafkaParams)

words = kafkaStream.flatMap(lambda x: x[1].split(" "))  # split each message value into words
wordCounts = words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)  # sum counts per word, per batch

wordCounts.pprint()

ssc.start()
ssc.awaitTermination()
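The flatMap → map → reduceByKey pipeline above runs once per 5-second micro-batch. Its effect on a single batch can be sketched in plain Python with the standard library (no Spark required); the sample messages below are made up for illustration:

```python
from collections import Counter

# Hypothetical micro-batch of Kafka records as (key, value) tuples,
# mirroring what createDirectStream delivers to the DStream.
batch = [(None, "hello spark streaming"),
         (None, "hello kafka")]

# flatMap: split every message value into individual words
words = [w for _, value in batch for w in value.split(" ")]

# map + reduceByKey: pair each word with 1, then sum the counts per word
word_counts = Counter(words)

print(sorted(word_counts.items()))
# → [('hello', 2), ('kafka', 1), ('spark', 1), ('streaming', 1)]
```

The key difference in real Spark is that the reduction runs in parallel across partitions; the per-batch result is the same.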
2. Use Spark Streaming to read files from HDFS and count the occurrences of each word.

Python answer:

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = SparkConf().setAppName("HDFSWordCount")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 5)  # StreamingContext takes a SparkContext; 5-second batch interval

# Monitors the directory and processes only files that appear after the stream starts
lines = ssc.textFileStream("hdfs://localhost:9000/input")

words = lines.flatMap(lambda line: line.split(" "))  # split each line into words
wordCounts = words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)  # sum counts per word, per batch

wordCounts.pprint()

ssc.start()
ssc.awaitTermination()
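Note that textFileStream does not read files already present when the stream starts: on each batch interval Spark lists the directory and processes only files that appeared since the last scan. A rough stdlib sketch of that new-file discovery idea (Spark's real implementation also checks modification times, so this is a simplification):

```python
import os
import tempfile

def find_new_files(directory, seen):
    """Return files not observed in previous scans, updating the seen set."""
    current = set(os.listdir(directory))
    new_files = current - seen
    seen |= new_files
    return sorted(new_files)

# Simulate two batch intervals against a temporary directory
with tempfile.TemporaryDirectory() as d:
    seen = set()
    open(os.path.join(d, "part-0000.txt"), "w").close()
    print(find_new_files(d, seen))   # first scan picks up part-0000.txt

    open(os.path.join(d, "part-0001.txt"), "w").close()
    print(find_new_files(d, seen))   # second scan reports only the new file
```

This is also why files should be moved atomically into the monitored directory: a file still being written when the scan happens may be picked up half-finished.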
Topic: two Spark Streaming program exercises with Python answers.

Original source: https://www.cveoy.top/t/topic/fIPb. All rights reserved by the author. Do not reproduce or scrape.
