Python Pandas Data Analysis in Practice: Merging, Analyzing, and Visualizing Sales Data

This tutorial shows how to use Python's Pandas library to merge, analyze, and visualize sales data.

Step 1: Load the required files

import pandas as pd

# Load the data files
attachment_1 = pd.read_excel(r'C:\Users\86136\Desktop\C题(1)\附件1.xlsx')
attachment_2 = pd.read_excel(r'C:\Users\86136\Desktop\C题(1)\附件2.xlsx')
attachment_3 = pd.read_excel(r'C:\Users\86136\Desktop\C题(1)\附件3.xlsx')
attachment_4 = pd.read_excel(r'C:\Users\86136\Desktop\C题(1)\附件4.xlsx')

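# Show the first few rows of each dataset for an initial check
# (a bare expression like this only displays output in a notebook or REPL)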
attachment_1.head(), attachment_2.head(), attachment_3.head(), attachment_4.head()

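Code explanation:

Import the pandas library for data processing and analysis.
Use pd.read_excel() to load the four Excel files into Pandas DataFrames.
Use .head() to display the first few rows (five by default) of each DataFrame, to get an initial sense of the data's structure and content.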
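
Beyond .head(), a quick structural summary can help catch missing values before merging. The following is a minimal sketch, not part of the original workflow: the summarize() helper and the sample DataFrame are hypothetical, and the sample rows are made up rather than taken from the real files.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return basic structural facts about a DataFrame (hypothetical helper)."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        # Number of missing values per column
        "missing": df.isna().sum().to_dict(),
    }


# Made-up stand-in for one of the loaded files; not the real data
sample = pd.DataFrame({
    "单品编码": [101, 102, 103],
    "单品名称": ["A", None, "C"],
})

summary = summarize(sample)
print(summary)
```

Calling summarize(attachment_1) in the same way would report the row count, column names, and missing-value counts for the real file.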

Step 2: Merge the datasets

# Merge the datasets on '单品编码' (item code)
merged_data = pd.merge(attachment_2, attachment_1, on='单品编码', how='left')

# Save the merged dataset to the desktop
merged_data.to_excel(r'C:\Users\86136\Desktop\merged_dataset.xlsx', index=False)

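Code explanation:

Use pd.merge() to join attachment_2 (left) with attachment_1 (right) on the key '单品编码' (item code). The left join keeps every row of attachment_2, even when its item code has no match in attachment_1.
Use .to_excel() to save the merged DataFrame to an Excel file named 'merged_dataset.xlsx'; index=False prevents the index column from being written.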
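
Because a left join silently fills unmatched rows with NaN, it can be worth checking the merge result. The sketch below uses the validate and indicator parameters of pd.merge() on made-up data; the sales and items DataFrames are hypothetical stand-ins for attachment_2 and attachment_1, and it assumes each item code should appear only once in the lookup table.

```python
import pandas as pd

# Made-up stand-ins for attachment_2 (sales rows) and attachment_1 (item lookup)
sales = pd.DataFrame({
    "单品编码": [101, 101, 102, 999],
    "销量(千克)": [1.5, 2.0, 0.5, 3.0],
})
items = pd.DataFrame({
    "单品编码": [101, 102],
    "单品名称": ["A", "B"],
    "分类名称": ["X", "Y"],
})

checked = pd.merge(
    sales,
    items,
    on="单品编码",
    how="left",
    validate="many_to_one",  # raises MergeError if item codes repeat in `items`
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# Sales rows whose item code was not found in the lookup table
unmatched = checked[checked["_merge"] == "left_only"]
print(len(checked), len(unmatched))
```

Rows flagged as left_only have no category or item name, so they would be dropped by the groupby() calls in the next step.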

Step 3: Analyze sales volume

# Total sales volume by category and by product
category_sales = merged_data.groupby('分类名称')['销量(千克)'].sum().sort_values(ascending=False)
product_sales = merged_data.groupby('单品名称')['销量(千克)'].sum().sort_values(ascending=False)

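Code explanation:

Use groupby() to group the merged_data DataFrame, once by '分类名称' (category name) and once by '单品名称' (item name).
Use ['销量(千克)'] to select the '销量(千克)' (sales volume in kg) column for the calculation.
Use sum() to compute the total sales volume of each group.
Use sort_values(ascending=False) to sort the results from highest to lowest sales volume.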
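
To make the groupby, sum, and sort chain concrete, here is a small sketch on made-up data. The demo DataFrame and the share calculation are illustrative additions, not part of the original analysis.

```python
import pandas as pd

# Made-up rows using the same column names as merged_data
demo = pd.DataFrame({
    "分类名称": ["X", "X", "Y", "Z"],
    "销量(千克)": [2.0, 3.0, 4.0, 1.0],
})

# Same pattern as category_sales: group, sum, sort descending
totals = demo.groupby("分类名称")["销量(千克)"].sum().sort_values(ascending=False)

# Each category's share of the overall sales volume
share = totals / totals.sum()
print(share)
```

Dividing by totals.sum() turns the absolute volumes into proportions, which can be easier to compare across categories.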

Step 4: Visualize the sales volume distribution

import matplotlib.pyplot as plt

# Plot the sales volume distributions
fig, ax = plt.subplots(2, 1, figsize=(12, 12))

# Sales volume by category
category_sales.plot(kind='bar', ax=ax[0], color='teal')
ax[0].set_title('Total Sales Volume by Category')
ax[0].set_ylabel('Sales Volume (kg)')
ax[0].set_xlabel('Category Name')

# Sales volume by product (top 10)
product_sales.head(10).plot(kind='bar', ax=ax[1], color='coral')
ax[1].set_title('Total Sales Volume by Product (Top 10)')
ax[1].set_ylabel('Sales Volume (kg)')
ax[1].set_xlabel('Product Name')

plt.tight_layout()
plt.show()

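Code explanation:

Import matplotlib.pyplot for data visualization.
Use plt.subplots() to create a figure with two subplots stacked vertically (2 rows, 1 column).
Use .plot(kind='bar') to draw bar charts of sales volume by category and by product.
Set the chart titles and axis labels.
Use plt.tight_layout() to adjust the subplot layout so that labels do not overlap.
Use plt.show() to display the figure.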

Step 5: Analyze sales trends

# Group by sales date and category, then sum the sales volume
category_date_sales = merged_data.groupby(['销售日期', '分类名称'])['销量(千克)'].sum().reset_index()

# Plot the sales volume trend over time for each category
plt.figure(figsize=(16, 8))
for category in category_date_sales['分类名称'].unique():
    subset = category_date_sales[category_date_sales['分类名称'] == category]
    plt.plot(subset['销售日期'], subset['销量(千克)'], label=category)

plt.title('Sales Volume Trend by Category')
plt.xlabel('Date')
plt.ylabel('Sales Volume (kg)')
plt.legend()
plt.grid(True)
plt.show()

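Code explanation:

Use groupby() to group the merged_data DataFrame by '销售日期' (sales date) and '分类名称' (category name).
Use ['销量(千克)'] to select the '销量(千克)' (sales volume in kg) column for the calculation.
Use sum() to compute the total sales volume of each group.
Use reset_index() to turn the group index back into ordinary columns.
Loop over each category and use plt.plot() to draw a line chart of that category's sales trend.
Set the chart title, axis labels, legend, and grid lines.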
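
As an alternative to filtering inside a loop, the same per-category series can be laid out as one column per category with pivot_table(). This is a sketch on made-up data, not the original method: the demo DataFrame is hypothetical, and it assumes the sales dates have been parsed as datetimes.

```python
import pandas as pd

# Made-up rows using the same column names as merged_data
demo = pd.DataFrame({
    "销售日期": pd.to_datetime(
        ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"]
    ),
    "分类名称": ["X", "Y", "X", "X"],
    "销量(千克)": [1.0, 2.0, 3.0, 4.0],
})

# One row per date, one column per category, summed sales volume in each cell;
# dates on which a category has no sales are filled with 0
trend = demo.pivot_table(
    index="销售日期",
    columns="分类名称",
    values="销量(千克)",
    aggfunc="sum",
    fill_value=0,
)
print(trend)
```

With matplotlib installed, calling trend.plot() on such a table draws one line per category, which would replace the explicit loop.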

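Summary

This tutorial showed how to use Python's Pandas library to merge, analyze, and visualize sales data. You can adapt the code to your own data and requirements to carry out deeper analysis.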