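Load the dataset and preprocess it: convert the order date column order_date to a datetime type, then check each date against the holiday dates for 2015, 2016, 2017, and 2018. Dates that fall on a holiday are marked 1; all other dates are marked 0.
The code is as follows: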
import pandas as pd
import numpy as np
from datetime import datetime
# Load the dataset
data = pd.read_csv('order_data.csv')
# Convert the order date column to datetime
data['order_date'] = pd.to_datetime(data['order_date'], format='%Y-%m-%d')
# Create a new column that stores the holiday flag (0 by default)
data['is_holiday'] = np.zeros(len(data))
# Define the holiday lists, one per year
holidays_2015 = ['2015-01-01', '2015-02-18', '2015-02-19', '2015-02-20', '2015-04-04', '2015-04-05',
                 '2015-04-06', '2015-05-01', '2015-06-20', '2015-06-21', '2015-06-22', '2015-09-03',
                 '2015-09-04', '2015-09-05', '2015-10-01', '2015-10-02', '2015-10-03', '2015-10-04',
                 '2015-10-05', '2015-10-06', '2015-10-07']
holidays_2016 = ['2016-01-01', '2016-02-07', '2016-02-08', '2016-02-09', '2016-04-02', '2016-04-03',
                 '2016-04-04', '2016-05-01', '2016-06-09', '2016-06-10', '2016-06-11', '2016-09-15',
                 '2016-09-16', '2016-09-17', '2016-10-01', '2016-10-02', '2016-10-03', '2016-10-04',
                 '2016-10-05', '2016-10-06', '2016-10-07']
holidays_2017 = ['2017-01-01', '2017-01-27', '2017-01-28', '2017-01-29', '2017-01-30', '2017-01-31',
                 '2017-02-01', '2017-04-02', '2017-04-03', '2017-04-04', '2017-05-01', '2017-05-28',
                 '2017-05-29', '2017-05-30', '2017-10-01', '2017-10-02', '2017-10-03', '2017-10-04',
                 '2017-10-05', '2017-10-06', '2017-10-07']
holidays_2018 = ['2018-01-01', '2018-02-15', '2018-02-16', '2018-02-17', '2018-02-18', '2018-02-19',
                 '2018-02-20', '2018-04-05', '2018-04-06', '2018-04-07', '2018-04-29', '2018-04-30',
                 '2018-05-01', '2018-06-16', '2018-06-17', '2018-06-18', '2018-09-22', '2018-09-23',
                 '2018-09-24', '2018-10-01', '2018-10-02', '2018-10-03', '2018-10-04', '2018-10-05',
                 '2018-10-06', '2018-10-07']
# Mark every date that appears in one of the holiday lists with 1
for holiday in holidays_2015 + holidays_2016 + holidays_2017 + holidays_2018:
    data.loc[data['order_date'] == datetime.strptime(holiday, '%Y-%m-%d'), 'is_holiday'] = 1
# Print the first 10 rows
print(data.head(10))
Output:
   Unnamed: 0  order_id  customer_id order_date  order_amount  is_holiday
0           0         1            1 2018-08-17        153.75         0.0
1           1         2            2 2018-08-17         79.20         0.0
2           2         3            3 2018-08-17        259.20         0.0
3           3         4            4 2018-08-17         99.00         0.0
4           4         5            5 2018-08-17         12.00         0.0
5           5         6            6 2018-08-17        120.00         0.0
6           6         7            7 2018-08-17        240.00         0.0
7           7         8            8 2018-08-17         62.50         0.0
8           8         9            9 2018-08-17        112.50         0.0
9           9        10           10 2018-08-17        120.00         0.0
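The flagging step can also be written without an explicit loop. The following is a minimal sketch, not the original author's method: it uses pandas' Series.isin to test every order date against the full set of holiday dates in one pass. The small sample DataFrame and the shortened holiday list are hypothetical stand-ins for order_data.csv and the four yearly lists.

```python
import pandas as pd

# Hypothetical sample standing in for order_data.csv (values are illustrative only)
sample = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'order_date': ['2018-08-17', '2018-10-01', '2017-05-01', '2016-03-15'],
})
sample['order_date'] = pd.to_datetime(sample['order_date'], format='%Y-%m-%d')

# A few dates taken from the holiday lists; in practice, concatenate all four lists
holiday_strings = ['2017-05-01', '2018-10-01', '2018-10-02']
holiday_dates = pd.to_datetime(holiday_strings, format='%Y-%m-%d')

# isin checks each row against the whole set at once; astype(int) yields 0/1 flags
sample['is_holiday'] = sample['order_date'].isin(holiday_dates).astype(int)

print(sample)
```

Because the flag is cast with astype(int), the column holds integers 0 and 1 rather than the floats 0.0 and 1.0 produced by np.zeros. If order_date carried a time component, it would likely need to be truncated first (for example with .dt.normalize()) before comparing against whole-day holiday dates.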