from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('UnionAndDistinct').getOrCreate()

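# Read each file as an RDD of plain strings; spark.read.text yields Row objects
# with a single `value` column, so unwrap it before saving as text.
line1 = spark.read.text('A.txt').rdd.map(lambda row: row.value)
line2 = spark.read.text('B.txt').rdd.map(lambda row: row.value)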

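# Combine both RDDs, drop duplicate lines, and sort the remainder
lines = line1.union(line2)
distinct_lines = lines.distinct()
result = distinct_lines.sortBy(lambda x: x)

# Collapse to one partition so the output directory holds a single part file
result = result.coalesce(1)
result.saveAsTextFile('result')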
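
To sanity-check what the job should produce on small inputs without starting Spark, the same union, distinct, and sort logic can be sketched in plain Python. This is only a local stand-in, not part of the Spark job: the helper name `union_distinct_sorted` and the sample lines are hypothetical, and it assumes both files fit in memory.

```python
def union_distinct_sorted(lines_a, lines_b):
    """Local stand-in for union -> distinct -> sortBy on small inputs."""
    # The set union drops duplicates within and across both inputs;
    # sorted() then orders the remaining strings lexicographically.
    return sorted(set(lines_a) | set(lines_b))


# Hypothetical contents of A.txt and B.txt, one list entry per line
a_lines = ["banana", "apple", "cherry"]
b_lines = ["apple", "date", "banana"]

merged = union_distinct_sorted(a_lines, b_lines)
print(merged)  # ['apple', 'banana', 'cherry', 'date']
```

The Spark version should write the same lines, in the same order, to the single part file under `result`.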

Python Spark: Union, Distinct, Sort, and Save to File

Original source: https://www.cveoy.top/t/topic/kPw. Copyright belongs to the author. Please do not repost or scrape.
