Suppose we have the following data, where each tuple is (name, course, score):

students = [("Tom", "Math", 80), ("Tom", "DataBase", 90), ("Tom", "Java", 70),
            ("Jerry", "Math", 90), ("Jerry", "Python", 85), ("Jerry", "Java", 75),
            ("Lucy", "Math", 95), ("Lucy", "Java", 85), ("Lucy", "DataBase", 80),
            ("John", "Math", 70), ("John", "Python", 75), ("John", "DataBase", 85)]

We can compute the required statistics with the following code:

from pyspark.sql import SparkSession
from pyspark.sql.functions import avg

spark = SparkSession.builder.appName("example").getOrCreate()

# Create the DataFrame with explicit column names
df = spark.createDataFrame(students, ["name", "course", "score"])

# (1) How many students does the department have in total?
num_students = df.select("name").distinct().count()
print("The department has {} students.".format(num_students))

# (2) How many courses does the department offer?
num_courses = df.select("course").distinct().count()
print("The department has {} courses.".format(num_courses))

# (3) What is Tom's average score across all his courses?
tom_avg_score = df.filter(df.name == "Tom").agg(avg("score")).collect()[0][0]
print("Tom's average score is {}.".format(tom_avg_score))

# (4) How many courses has each student taken?
num_courses_per_student = df.groupBy("name").count().orderBy("name")
num_courses_per_student.show()

# (5) How many students take the DataBase course?
num_students_taking_database = df.filter(df.course == "DataBase").select("name").distinct().count()
print("There are {} students taking the DataBase course.".format(num_students_taking_database))

# (6) What is the average score of each course?
avg_score_per_course = df.groupBy("course").agg(avg("score")).orderBy("course")
avg_score_per_course.show()

The output is as follows:

The department has 4 students.
The department has 4 courses.
Tom's average score is 80.0.
+-----+-----+
| name|count|
+-----+-----+
|Jerry|    3|
| John|    3|
| Lucy|    3|
|  Tom|    3|
+-----+-----+

There are 3 students taking the DataBase course.
+--------+-----------------+
|  course|       avg(score)|
+--------+-----------------+
|DataBase|             85.0|
|    Java|76.66666666666667|
|    Math|            83.75|
|  Python|             80.0|
+--------+-----------------+
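
As a sanity check, the same six quantities can be recomputed without Spark. The following is only a minimal sketch in plain Python, assuming the same `students` list as in the example; the variable names (`names`, `courses`, `tom_avg`, `courses_per_student`, `database_students`, `avg_per_course`) are illustrative and not part of the original answer.

```python
from collections import defaultdict

# Same (name, course, score) tuples as in the example data.
students = [("Tom", "Math", 80), ("Tom", "DataBase", 90), ("Tom", "Java", 70),
            ("Jerry", "Math", 90), ("Jerry", "Python", 85), ("Jerry", "Java", 75),
            ("Lucy", "Math", 95), ("Lucy", "Java", 85), ("Lucy", "DataBase", 80),
            ("John", "Math", 70), ("John", "Python", 75), ("John", "DataBase", 85)]

# (1) and (2): distinct students and distinct courses
names = {name for name, _, _ in students}
courses = {course for _, course, _ in students}

# (3): Tom's average score
tom_scores = [score for name, _, score in students if name == "Tom"]
tom_avg = sum(tom_scores) / len(tom_scores)

# (4) and (6): group rows by student and by course in a single pass
courses_per_student = defaultdict(int)
scores_per_course = defaultdict(list)
for name, course, score in students:
    courses_per_student[name] += 1
    scores_per_course[course].append(score)

# (5): distinct students enrolled in DataBase
database_students = {name for name, course, _ in students if course == "DataBase"}

# (6): average score per course
avg_per_course = {c: sum(s) / len(s) for c, s in scores_per_course.items()}

print(len(names), len(courses), tom_avg)
print(dict(sorted(courses_per_student.items())))
print(len(database_students))
print(dict(sorted(avg_per_course.items())))
```

On this data the sketch counts 3 students taking DataBase (Tom, Lucy, and John) and gives a Python average of 80.0, since (85 + 75) / 2 = 80.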
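In PySpark, write a program to compute the following: (1) how many students the department has in total; (2) how many courses the department offers; (3) Tom's average score across all his courses; (4) the number of courses each student has taken; (5) how many students take the DataBase course; (6) the average score of each course.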

Original source: https://www.cveoy.top/t/topic/eeys. Copyright belongs to the author. Please do not repost or scrape.
