Pandas Groupby: Efficient Data Grouping and Analysis in Python
Pandas groupby is a DataFrame method that splits the rows of a DataFrame into groups based on the values in one or more columns. It is similar to the SQL GROUP BY clause: you split the data into groups, apply an operation to each group, and combine the results.
Syntax and Parameters
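The syntax for using groupby in older pandas releases (before 2.0) is:
df.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)
Newer releases differ: squeeze was removed in pandas 2.0, and axis is deprecated as of pandas 2.1. Check the documentation for the version you have installed.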
Here are some commonly used parameters:
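- by: Specifies what to group by. Can be a column name, a list of column names, a function, or a dictionary or Series that maps index labels to group names.
- axis: Groups along rows (axis=0) or columns (axis=1). Deprecated in recent pandas releases.
- level: Used for hierarchical (MultiIndex) indexes, specifying the level(s) to group by.
- as_index: For aggregated output, determines whether the group keys become the index of the result (True) or stay as regular columns (False).
- sort: Sorts the resulting groups by group key. Setting sort=False keeps groups in order of first appearance and can be faster.
- group_keys: When calling apply, adds the group keys to the index of the result to identify each piece.
- squeeze: Reduced the dimensionality of the return type where possible (for example, returning a Series instead of a DataFrame). Removed in pandas 2.0.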
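To make the effect of as_index and sort concrete, here is a minimal sketch. The DataFrame and its 'Team' and 'Points' columns are hypothetical sample data, not part of the example later in this article.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Team": ["B", "A", "B", "A"],
    "Points": [10, 20, 30, 40],
})

# Defaults (as_index=True, sort=True): the group keys become the index,
# and the groups are sorted by key.
default_result = df.groupby("Team")["Points"].sum()

# as_index=False: 'Team' stays a regular column and the result is a DataFrame.
flat_result = df.groupby("Team", as_index=False)["Points"].sum()

# sort=False: groups appear in order of first appearance ('B' before 'A').
unsorted_result = df.groupby("Team", sort=False)["Points"].sum()

print(default_result)
print(flat_result)
print(unsorted_result)
```

as_index=False is often convenient when the result feeds into a merge or an export, since the keys are already ordinary columns.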
Performing Operations on Grouped Data
Once you've grouped your data, you can perform various operations, including:
- Aggregation: Calculate summary statistics (e.g., sum, mean, count) for each group.
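- Transformation: Apply a function to each group and return a result with the same shape as the original data (e.g., subtracting each group's mean from its rows).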
- Filtering: Select specific groups based on criteria.
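The three kinds of operation listed above can be sketched as follows. This is an illustrative example rather than a complete tour: the data mirrors the 'Name' and 'Salary' columns used later in this article, while the 'SalaryVsMean' column and the "more than two rows" rule are assumptions made up for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "John", "Jane", "John"],
    "Salary": [50000, 60000, 70000, 80000, 90000],
})
by_name = df.groupby("Name")["Salary"]

# Aggregation: one row per group, one column per summary statistic.
summary = by_name.agg(["sum", "mean", "count"])

# Transformation: the result has one value per original row, so it can be
# combined with the original column. Here: salary minus the group's mean.
df["SalaryVsMean"] = df["Salary"] - by_name.transform("mean")

# Filtering: keep only the rows that belong to groups with more than two rows.
frequent = df.groupby("Name").filter(lambda g: len(g) > 2)

print(summary)
print(df)
print(frequent)
```

Aggregation shrinks the data to one row per group, transformation keeps the original row count, and filtering returns a subset of the original rows.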
Example: Grouping by 'Name' and Calculating Mean Salary
import pandas as pd
# Create a DataFrame
data = {'Name': ['John', 'Jane', 'John', 'Jane', 'John'],
        'Age': [25, 30, 35, 40, 45],
        'Salary': [50000, 60000, 70000, 80000, 90000]}
df = pd.DataFrame(data)
# Group by 'Name' and calculate mean salary
grouped = df.groupby('Name')['Salary'].mean()
print(grouped)
Output:
Name
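Jane    70000.0
John    70000.0
Name: Salary, dtype: float64
This example groups the rows by the 'Name' column and calculates the mean salary for each unique name. Both means happen to equal 70000, and the result is a Series of dtype float64 because mean returns floating-point values even when the input column holds integers.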
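The same grouping can summarize several columns at once. The sketch below uses named aggregation on the data from the example; the output column names mean_salary and max_age are hypothetical labels chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "John", "Jane", "John"],
    "Age": [25, 30, 35, 40, 45],
    "Salary": [50000, 60000, 70000, 80000, 90000],
})

# Named aggregation: each keyword is an output column name, and each value
# is a (source column, aggregation function) pair.
stats = df.groupby("Name").agg(
    mean_salary=("Salary", "mean"),
    max_age=("Age", "max"),
)
print(stats)
```

Named aggregation keeps the output column names explicit, which avoids the nested column labels produced when a list of functions is passed to agg.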