Python Spark: Union, Distinct, Sort, and Save to File
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('UnionAndDistinct').getOrCreate()
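# Read both files as RDDs of plain strings; .rdd alone yields Row objects,
# which would be saved as "Row(value='...')" instead of the original text.
line1 = spark.read.text('A.txt').rdd.map(lambda row: row.value)
line2 = spark.read.text('B.txt').rdd.map(lambda row: row.value)
# Merge the two inputs, drop duplicate lines, and sort ascending.
lines = line1.union(line2)
distinct_lines = lines.distinct()
result = distinct_lines.sortBy(lambda x: x)
# Collapse to one partition so the output directory holds a single part file.
result = result.coalesce(1)
# Writes a directory named 'result'; the call fails if it already exists.
result.saveAsTextFile('result')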
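To sanity-check the Spark output on small inputs, the same union, distinct, and sort steps can be sketched in plain Python. This is only an illustrative cross-check, not part of the Spark job: the helper names `union_distinct_sorted` and `read_lines` are hypothetical, and the sketch assumes both files fit in memory and that plain string ordering is the ordering you want.

```python
from pathlib import Path
from typing import Iterable, List


def union_distinct_sorted(*line_groups: Iterable[str]) -> List[str]:
    """Merge several groups of lines, drop duplicates, and sort ascending."""
    unique = set()
    for group in line_groups:
        unique.update(group)
    return sorted(unique)


def read_lines(path: str) -> List[str]:
    """Read a text file into a list of lines without trailing newlines."""
    return Path(path).read_text(encoding="utf-8").splitlines()


if __name__ == "__main__":
    # A.txt and B.txt are the same input files the Spark job reads.
    for line in union_distinct_sorted(read_lines("A.txt"), read_lines("B.txt")):
        print(line)
```

Comparing this output with the part file inside the `result` directory is a quick way to confirm the job wrote the lines you expected.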