Pandas Data Cleaning Basics in Practice: Series and DataFrame Operations

I. Lab Objectives

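(1) Learn how to create a Series and a DataFrame; (2) become familiar with common pandas operations for data cleaning and data analysis; (3) learn the basics of plotting with the matplotlib library.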

II. Lab Environment

(1) Operating system: Windows; (2) Python version: 3.8.7

III. Lab Procedure

1. Basic Exercises

(1) Create a Series named language from the list ['Python','C','Scala','Java','GO','Scala','SQL','PHP','Python'];

import pandas as pd

language = pd.Series(['Python', 'C', 'Scala', 'Java', 'GO', 'Scala', 'SQL', 'PHP', 'Python'])
print(language)

Output:

0    Python
1         C
2     Scala
3      Java
4        GO
5     Scala
6       SQL
7       PHP
8    Python
dtype: object

(2) Create a Series of random integers named score, with the same length as language;

import random

# random.randint(0, 100) includes both endpoints; the loop variable is unused, so name it _
score = pd.Series([random.randint(0, 100) for _ in range(len(language))])
print(score)

Output (the values are random, so they differ from run to run):

0    89
1    93
2    30
3    63
4    77
5    71
6    16
7     9
8    96
dtype: int64
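
Because the scores are random, rerunning the step above gives different numbers each time. If repeatable output is wanted, one option is to seed the generator first. The following is a minimal sketch; the seed value 42 and the name score_again are arbitrary choices for illustration, not part of the original exercise.

```python
import random

import pandas as pd

language = pd.Series(['Python', 'C', 'Scala', 'Java', 'GO', 'Scala', 'SQL', 'PHP', 'Python'])

# Seeding makes the pseudo-random sequence repeatable (42 is an arbitrary seed).
random.seed(42)
score = pd.Series([random.randint(0, 100) for _ in range(len(language))])

# Re-seeding with the same value reproduces exactly the same scores.
random.seed(42)
score_again = pd.Series([random.randint(0, 100) for _ in range(len(language))])

print(score.equals(score_again))  # True
```

Without the seed, the values shown in this report cannot be reproduced exactly, only their shape (nine integers between 0 and 100).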

(3) Create a DataFrame from language and score;

df = pd.DataFrame({'language': language, 'score': score})
print(df)

Output:

  language  score
0   Python     89
1        C     93
2    Scala     30
3     Java     63
4       GO     77
5    Scala     71
6      SQL     16
7      PHP      9
8   Python     96
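
When a DataFrame is built from a dict of Series, pandas matches the rows by index label rather than by position. In this exercise both Series share the default index 0 to 8, so the rows line up one to one. The sketch below uses a shortened, hypothetical pair of Series whose indexes do not match, to show what happens otherwise.

```python
import pandas as pd

# Hypothetical, shortened data: score has no entry for index label 1.
language = pd.Series(['Python', 'C', 'Scala'])
score = pd.Series([89, 93], index=[0, 2])

df = pd.DataFrame({'language': language, 'score': score})
print(df)

# The missing label is filled with NaN, which forces the column to float64.
print(df['score'].dtype)         # float64
print(df['score'].isna().sum())  # 1
```

Checking that both Series have the same length and index before combining them avoids unexpected NaN values.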

(4) Print the first 4 rows of the DataFrame;

print(df.head(4))

Output:

  language  score
0   Python     89
1        C     93
2    Scala     30
3     Java     63

(5) Print the rows of the DataFrame whose language field is Python;

print(df[df['language'] == 'Python'])

Output:

  language  score
0   Python     89
8   Python     96
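
The filter in step (5) works in two stages: the comparison produces a boolean Series, and indexing the DataFrame with that Series keeps only the rows marked True. The sketch below spells this out. The scores are copied from the sample output in this report so the result is fixed; the names mask, python_rows, and same_rows are introduced here for illustration.

```python
import pandas as pd

# Scores copied from the sample output above so the result is deterministic.
df = pd.DataFrame({
    'language': ['Python', 'C', 'Scala', 'Java', 'GO', 'Scala', 'SQL', 'PHP', 'Python'],
    'score': [89, 93, 30, 63, 77, 71, 16, 9, 96],
})

# Stage 1: the comparison yields one True/False value per row.
mask = df['language'] == 'Python'
print(mask.tolist())
# [True, False, False, False, False, False, False, False, True]

# Stage 2: indexing with the mask keeps the rows where it is True.
python_rows = df[mask]

# isin() is an equivalent form that extends naturally to several values.
same_rows = df.loc[df['language'].isin(['Python'])]
print(python_rows.equals(same_rows))  # True
```

Note that the selected rows keep their original index labels (0 and 8), which is why the output above is not renumbered.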

(6) Sort the DataFrame by the score field in ascending order;

print(df.sort_values(by='score'))

Output:

  language  score
7      PHP      9
6      SQL     16
2    Scala     30
3     Java     63
5    Scala     71
4       GO     77
0   Python     89
1        C     93
8   Python     96
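
sort_values returns a new, sorted DataFrame and leaves df itself unchanged, which is why step (6) prints the result directly. The sketch below illustrates this, along with a descending sort and an optional index reset. It reuses the scores from the sample output in this report; the names sorted_df and top are introduced here for illustration.

```python
import pandas as pd

# Scores copied from the sample output above so the result is deterministic.
df = pd.DataFrame({
    'language': ['Python', 'C', 'Scala', 'Java', 'GO', 'Scala', 'SQL', 'PHP', 'Python'],
    'score': [89, 93, 30, 63, 77, 71, 16, 9, 96],
})

# Ascending sort: the result is a new object, and rows keep their old labels.
sorted_df = df.sort_values(by='score')
print(sorted_df.index.tolist())  # [7, 6, 2, 3, 5, 4, 0, 1, 8]
print(df.index.tolist())         # [0, 1, 2, 3, 4, 5, 6, 7, 8] (df is unchanged)

# Descending sort, with the index renumbered from 0.
top = df.sort_values(by='score', ascending=False).reset_index(drop=True)
print(top['score'].tolist())     # [96, 93, 89, 77, 71, 63, 30, 16, 9]
```

To keep the sorted order, assign the result back to a variable, as done with sorted_df here.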

(7) Count how many times each programming language appears in the language field.

print(df['language'].value_counts())

Output:

Scala     2
Python    2
C         1
SQL       1
GO        1
PHP       1
Java      1
Name: language, dtype: int64
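
The printed form of value_counts depends on the pandas version: from pandas 2.0 onward the result is named count and the index carries the column name, and the order of tied entries may also differ from the listing above. The counts themselves do not change. The sketch below reads individual counts by label, which does not depend on print layout, and cross-checks them with a groupby; the names counts and by_group are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'language': ['Python', 'C', 'Scala', 'Java', 'GO', 'Scala', 'SQL', 'PHP', 'Python'],
})

# value_counts returns a Series indexed by the distinct values.
counts = df['language'].value_counts()
print(counts['Python'])  # 2
print(counts['Java'])    # 1

# Grouping by the column and taking the group sizes gives the same numbers,
# ordered by language name instead of by frequency.
by_group = df.groupby('language').size()
print(counts.to_dict() == by_group.to_dict())  # True
```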