Afghanistan Data: Transforming Wide Data to Long Format with Python
This guide explains how to convert wide data into long format using the Python pandas library, working with a dataset of agricultural information for Afghanistan. The goal is to reshape the data and keep only three elements: 'Production', 'Import Quantity', and 'Use per capita'.
Understanding the Data
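The original data is organized in a wide format: each row holds one metric (the 'Element' column) and each year is a separate column (Y1961, Y1962, and so on). In the sample below, 'Production' and 'Export Quantity' have no values, and the seven elements appear twice with different values. The code in this guide uses the first six rows only. Here's a sample of the dataset: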
Area         Element                                   Y1961  Y1962  Y1963  Y1964  Y1965  Y1966
Afghanistan  Production
Afghanistan  Import Quantity                            1000   1000   1000   1000   1000   1000
Afghanistan  Export Quantity
Afghanistan  Agricultural Use                           1000   1000   1000   1000   1000   1000
Afghanistan  Use per area of cropland                   0.13   0.13   0.13   0.13   0.13   0.13
Afghanistan  Use per capita                             0.11   0.11   0.11   0.11    0.1    0.1
Afghanistan  Use per value of agricultural production   0.39   0.38   0.38   0.36   0.34   0.33
Afghanistan  Production
Afghanistan  Import Quantity                             100    100    100    100    100    500
Afghanistan  Export Quantity
Afghanistan  Agricultural Use                            100    100    100    100    100    500
Afghanistan  Use per area of cropland                   0.01   0.01   0.01   0.01   0.01   0.06
Afghanistan  Use per capita                             0.01   0.01   0.01   0.01   0.01   0.05
Afghanistan  Use per value of agricultural production   0.04   0.04   0.04   0.04   0.03   0.17
Python Code to Transform the Data
import pandas as pd

# Create a DataFrame from the data (None marks a missing value)
data = {
    'Area': ['Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan'],
    'Element': ['Production', 'Import Quantity', 'Export Quantity', 'Agricultural Use', 'Use per area of cropland', 'Use per capita'],
    'Y1961': [None, 1000, None, 1000, 0.13, 0.11],
    'Y1962': [None, 1000, None, 1000, 0.13, 0.11],
    'Y1963': [None, 1000, None, 1000, 0.13, 0.11],
    'Y1964': [None, 1000, None, 1000, 0.13, 0.11],
    'Y1965': [None, 1000, None, 1000, 0.13, 0.1],
    'Y1966': [None, 1000, None, 1000, 0.13, 0.1],
}
df = pd.DataFrame(data)

# Convert to long format using melt
df_long = df.melt(id_vars=['Area', 'Element'], var_name='Year', value_name='Value')

# Filter for desired elements
df_long = df_long[df_long['Element'].isin(['Production', 'Import Quantity', 'Use per capita'])]

# Print the transformed DataFrame
print(df_long)
Output:
           Area          Element   Year    Value
0   Afghanistan       Production  Y1961      NaN
1   Afghanistan  Import Quantity  Y1961  1000.00
5   Afghanistan   Use per capita  Y1961     0.11
6   Afghanistan       Production  Y1962      NaN
7   Afghanistan  Import Quantity  Y1962  1000.00
11  Afghanistan   Use per capita  Y1962     0.11
...
This long format makes it easier to analyze the data, as each row now represents a specific year, element, and value for Afghanistan. You can easily filter, group, and aggregate data based on these new columns.
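If the three elements are wanted as separate columns, one row per year, the long data needs one more reshaping step. The following is a minimal sketch of one way to do that with pivot(), not necessarily the approach the original author had in mind. It assumes each (Area, Year, Element) combination occurs only once, as in the six-row subset used above; the full sample repeats every element, so it would need an extra identifying column first, otherwise pivot() raises an error. The name df_wide is illustrative.

```python
import pandas as pd

# Small subset of the sample: three elements, three years.
df = pd.DataFrame({
    "Area": ["Afghanistan"] * 3,
    "Element": ["Production", "Import Quantity", "Use per capita"],
    "Y1961": [None, 1000, 0.11],
    "Y1962": [None, 1000, 0.11],
    "Y1963": [None, 1000, 0.11],
})

# Wide -> long, as in the guide.
df_long = df.melt(id_vars=["Area", "Element"], var_name="Year", value_name="Value")

# Long -> one column per element. pivot() keeps the all-missing
# 'Production' column; it requires unique (Area, Year, Element) rows.
df_wide = (
    df_long.pivot(index=["Area", "Year"], columns="Element", values="Value")
    .reset_index()
)
df_wide.columns.name = None  # drop the leftover 'Element' axis label

# pivot() sorts the new columns alphabetically, so restore the desired order.
df_wide = df_wide[["Area", "Year", "Production", "Import Quantity", "Use per capita"]]
print(df_wide)
```

pivot() is used here rather than pivot_table() because pivot_table() drops columns whose values are all missing by default, which would remove 'Production'.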
Key Points:
- Pandas melt() function: This function is the core of converting wide data to long format. It keeps the columns listed in id_vars ('Area' and 'Element') as identifiers and stacks all remaining year columns into rows, creating new columns for the variable names ('Year') and values ('Value').
- Filtering with isin(): We use this method to select only the rows containing the desired 'Element' values.
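The two points above can be checked on a tiny frame. This sketch uses a two-row, two-year subset of the sample and hypothetical variable names (wide, long_df, per_capita). It also converts the 'Year' labels such as 'Y1961' to integers, an optional extra step that the guide itself does not perform but that makes sorting and plotting by year simpler.

```python
import pandas as pd

wide = pd.DataFrame({
    "Area": ["Afghanistan", "Afghanistan"],
    "Element": ["Import Quantity", "Use per capita"],
    "Y1961": [1000, 0.11],
    "Y1962": [1000, 0.11],
})

# 'Area' and 'Element' are kept as identifiers; Y1961 and Y1962 are stacked,
# so 2 rows x 2 year columns become 4 rows.
long_df = wide.melt(id_vars=["Area", "Element"], var_name="Year", value_name="Value")

# Optional: turn labels like 'Y1961' into the integer 1961.
long_df["Year"] = long_df["Year"].str.lstrip("Y").astype(int)

# isin() keeps only rows whose 'Element' is in the given list.
per_capita = long_df[long_df["Element"].isin(["Use per capita"])]
print(per_capita)
```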
Further Steps:
You can further analyze the transformed data by:
- Calculating averages over time: Use groupby() to group by 'Element' and then apply aggregation functions like mean().
- Visualizing the data: Create line graphs or bar charts to visualize trends in agricultural production, imports, and per capita use over the years.
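As a rough illustration of the first suggestion, the sketch below averages each element over two years of the sample (1964 and 1965). The frame is rebuilt by hand so the example runs on its own, and mean_by_element is an illustrative name.

```python
import pandas as pd

# Long-format rows for two years, taken from the sample values.
df_long = pd.DataFrame({
    "Area": ["Afghanistan"] * 6,
    "Element": ["Production", "Import Quantity", "Use per capita"] * 2,
    "Year": ["Y1964", "Y1964", "Y1964", "Y1965", "Y1965", "Y1965"],
    "Value": [None, 1000, 0.11, None, 1000, 0.1],
})

# Average value per element across the years present.
# mean() skips missing values; a group with no values at all gives NaN.
mean_by_element = df_long.groupby("Element")["Value"].mean()
print(mean_by_element)
```

Because 'Production' has no values in the sample, its mean is NaN rather than zero, which is worth keeping in mind before plotting or comparing elements.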
This approach provides a foundation for conducting further analysis and gaining insights from the provided agricultural data for Afghanistan.
Original source: https://www.cveoy.top/t/topic/fuu7 Copyright belongs to the author. Please do not repost or scrape.