import tkinter as tk
from tkinter import filedialog, messagebox
import pandas as pd
import numpy as np
class DataPreprocessing:
def __init__(self):
self.columns = None
self.df = None
# Create the main window
self.master = tk.Tk
One level of indentation is missing: everything under the class definition needs to be indented. The complete code is as follows:
import tkinter as tk
from tkinter import filedialog, messagebox

import pandas as pd
import numpy as np


class DataPreprocessing:
    def __init__(self):
        self.columns = None
        self.df = None
        # Create the main window
        self.master = tk.Tk()
        self.master.title("Data Preprocessing")
        self.master.geometry("800x400")
        # Title label, spanning all five button columns
        label_title = tk.Label(self.master, text="Data Preprocessing", font=("Arial", 16), pady=30)
        label_title.grid(row=0, column=0, columnspan=5)
        # "Open file" button
        button_open = tk.Button(self.master, text="Open file", command=self.open_file)
        button_open.grid(row=1, column=0, padx=50)
        # "Fill missing values" button
        button_fillna = tk.Button(self.master, text="Fill missing values", state="disabled", command=self.fill_na)
        button_fillna.grid(row=1, column=1, padx=50)
        # "Drop duplicates" button
        button_drop_duplicates = tk.Button(self.master, text="Drop duplicates", state="disabled", command=self.drop_duplicates)
        button_drop_duplicates.grid(row=1, column=2, padx=50)
        # "Handle outliers" button
        button_handle_outliers = tk.Button(self.master, text="Handle outliers", state="disabled", command=self.handle_outliers)
        button_handle_outliers.grid(row=1, column=3, padx=50)
        # Button that reports the number of missing values
        button_missing_value = tk.Button(self.master, text="Missing value count", state="disabled", command=self.missing_value_count)
        button_missing_value.grid(row=1, column=4, padx=50)
        # Listbox that displays column information (the attribute keeps the name treeview)
        self.tree = tk.StringVar(value="")
        treeview = tk.Listbox(self.master, listvariable=self.tree, height=10)
        treeview.grid(row=2, column=0, columnspan=5, padx=10, pady=10)
        self.treeview = treeview
        # "Save file" button
        button_save = tk.Button(self.master, text="Save file", state="disabled", command=self.save_file)
        button_save.grid(row=3, column=0, padx=50)
        # "Exit" button
        button_exit = tk.Button(self.master, text="Exit", command=self.master.destroy)
        button_exit.grid(row=3, column=1, padx=50)

    # Open a CSV file and list its columns
    def open_file(self):
        file_path = filedialog.askopenfilename(defaultextension=".csv",
                                               filetypes=(("CSV files", "*.csv"), ("All Files", "*.*")))
        if file_path:
            try:
                self.df = pd.read_csv(file_path)
                self.columns = list(self.df.columns)
                self.tree.set(self.columns)
                messagebox.showinfo("Info", "Data imported successfully!", parent=self.master)
                self.enable_buttons()
            except Exception as e:
                messagebox.showerror("Error", "Failed to open file! {}".format(e), parent=self.master)

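    # Fill missing values
    def fill_na(self):
        # Forward-fill; fillna(method="ffill") is deprecated in recent pandas releases
        self.df.ffill(inplace=True)
        self.treeview.delete(0, tk.END)
        # Show the remaining null count of each column
        self.tree.set(list(self.df.isnull().sum()))
        messagebox.showinfo("Info", "Missing values filled successfully!", parent=self.master)
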
    # Drop duplicate rows
    def drop_duplicates(self):
        self.df.drop_duplicates(inplace=True)
        self.treeview.delete(0, tk.END)
        self.tree.set(list(self.df.columns))
        messagebox.showinfo("Info", "Duplicate rows removed successfully!", parent=self.master)

    # Handle outliers (as written, this only performs unit conversions)
    def handle_outliers(self):
        # Each unit below is hardcoded to its default value,
        # so none of the conversion branches runs unless the unit is changed.
        if "Mileage" in self.df.columns:
            mileage_unit = "km/l"
            if mileage_unit == "mile/l":
                self.df["Mileage"] = self.df["Mileage"].apply(lambda x: 1 / x)
            elif mileage_unit == "km/kg":
                self.df["Mileage"] = self.df["Mileage"].apply(lambda x: x * 0.425)
        if "Engine" in self.df.columns:
            engine_unit = "CC"
            if engine_unit == "L":
                self.df["Engine"] = self.df["Engine"] / 1000
        if "Power" in self.df.columns:
            power_unit = "bhp"
            if power_unit == "kW":
                self.df["Power"] = self.df["Power"] * 0.7457
        self.treeview.delete(0, tk.END)
        self.tree.set(list(self.df.columns))
        messagebox.showinfo("Info", "Outliers handled successfully!", parent=self.master)

    # Count missing values across the whole DataFrame
    def missing_value_count(self):
        missing_values = self.df.isnull().sum().sum()
        messagebox.showinfo("Info", "Number of missing values: {}".format(missing_values), parent=self.master)

    # Save the DataFrame to a CSV file
    def save_file(self):
        file_path = filedialog.asksaveasfilename(defaultextension=".csv",
                                                 filetypes=(("CSV files", "*.csv"), ("All Files", "*.*")))
        if file_path:
            try:
                self.df.to_csv(file_path, index=False)
                messagebox.showinfo("Info", "Data saved successfully!", parent=self.master)
            except Exception as e:
                messagebox.showerror("Error", "Failed to save file! {}".format(e), parent=self.master)

    # Enable the fill, dedupe, outlier, missing-count and save buttons
    def enable_buttons(self):
        # Enabling every Button avoids relying on auto-generated names such as
        # "!button2"; the save button is "!button6", which the name list missed.
        for widget in self.master.winfo_children():
            if isinstance(widget, tk.Button):
                widget.config(state="normal")

    # Start the event loop
    def run(self):
        self.master.mainloop()


if __name__ == "__main__":
    app = DataPreprocessing()
    app.run()
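
The pandas steps behind the buttons can also be tried without the GUI. The following is a minimal sketch, not part of the original answer: the helper names `preprocess` and `clip_outliers_iqr` and the sample data are made up for illustration. Because `handle_outliers` above only converts units, the IQR clipping shown here is a swapped-in, commonly used technique and not the answer's own method. Unlike the in-place calls in the class, the sketch returns new objects and resets the index after dropping duplicates.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill missing values, then drop duplicate rows (hypothetical helper)."""
    cleaned = df.ffill()                 # same fill strategy as fill_na
    cleaned = cleaned.drop_duplicates()  # same call as the drop_duplicates method
    return cleaned.reset_index(drop=True)


def clip_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Made-up sample data using column names that appear in the code above
sample = pd.DataFrame({
    "Mileage": [18.0, None, 18.0, 21.5],
    "Engine": [1200, 1200, 1200, 1500],
})

cleaned = preprocess(sample)
print(cleaned)
print("Missing values left:", int(cleaned.isnull().sum().sum()))
print(clip_outliers_iqr(pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])).tolist())
```

Keeping these steps in plain functions makes them easy to check on a small DataFrame before wiring them to button callbacks.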
Original URL: https://www.cveoy.top/t/topic/gkl6 Copyright belongs to the author. Please do not repost or scrape!