This guide explains how to convert wide data into long format with the Python pandas library, using a dataset of agricultural figures for Afghanistan. The goal is to reshape the data around three metrics: 'Production', 'Import Quantity', and 'Use per capita'.

Understanding the Data

The original data is in wide format: each year is a separate column (Y1961, Y1962, and so on), and each row holds one metric (the 'Element' column) for one area. Here's a sample of the dataset:

Area	Element	Y1961	Y1962	Y1963	Y1964	Y1965	Y1966
Afghanistan	Production					
Afghanistan	Import Quantity	1000	1000	1000	1000	1000	1000
Afghanistan	Export Quantity					
Afghanistan	Agricultural Use	1000	1000	1000	1000	1000	1000
Afghanistan	Use per area of cropland	0.13	0.13	0.13	0.13	0.13	0.13
Afghanistan	Use per capita	0.11	0.11	0.11	0.11	0.1	0.1
Afghanistan	Use per value of agricultural production	0.39	0.38	0.38	0.36	0.34	0.33
Afghanistan	Production					
Afghanistan	Import Quantity	100	100	100	100	100	500
Afghanistan	Export Quantity					
Afghanistan	Agricultural Use	100	100	100	100	100	500
Afghanistan	Use per area of cropland	0.01	0.01	0.01	0.01	0.01	0.06
Afghanistan	Use per capita	0.01	0.01	0.01	0.01	0.01	0.05
Afghanistan	Use per value of agricultural production	0.04	0.04	0.04	0.04	0.03	0.17
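If the table is available as tab-separated text, it can be loaded directly rather than typed in by hand. The following is a minimal sketch under that assumption; the raw string holds only three rows and two year columns from the sample above, and in practice a file path would take the place of the io.StringIO object.

```python
import io

import pandas as pd

# A few rows of the sample above as tab-separated text.
# Blank cells stand for missing values.
raw = (
    "Area\tElement\tY1961\tY1962\n"
    "Afghanistan\tProduction\t\t\n"
    "Afghanistan\tImport Quantity\t1000\t1000\n"
    "Afghanistan\tUse per capita\t0.11\t0.11\n"
)

# sep="\t" tells read_csv that the columns are tab-delimited;
# blank cells are read as NaN
df = pd.read_csv(io.StringIO(raw), sep="\t")

print(df)
```

Blank cells such as the 'Production' row come back as NaN, so no manual None placeholders are needed.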

Python Code to Transform the Data

import pandas as pd

# Create a DataFrame from the data
data = {'Area': ['Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afghanistan'],
        'Element': ['Production', 'Import Quantity', 'Export Quantity', 'Agricultural Use', 'Use per area of cropland', 'Use per capita'],
        'Y1961': [None, 1000, None, 1000, 0.13, 0.11],
        'Y1962': [None, 1000, None, 1000, 0.13, 0.11],
        'Y1963': [None, 1000, None, 1000, 0.13, 0.11],
        'Y1964': [None, 1000, None, 1000, 0.13, 0.11],
        'Y1965': [None, 1000, None, 1000, 0.13, 0.1],
        'Y1966': [None, 1000, None, 1000, 0.13, 0.1]}
df = pd.DataFrame(data)

# Convert to long format using melt
df_long = df.melt(id_vars=['Area', 'Element'], var_name='Year', value_name='Value')

# Filter for desired elements
df_long = df_long[df_long['Element'].isin(['Production', 'Import Quantity', 'Use per capita'])]

# Print the transformed DataFrame
print(df_long)

Output:

           Area          Element   Year    Value
0   Afghanistan       Production  Y1961      NaN
1   Afghanistan  Import Quantity  Y1961  1000.00
5   Afghanistan   Use per capita  Y1961     0.11
6   Afghanistan       Production  Y1962      NaN
7   Afghanistan  Import Quantity  Y1962  1000.00
11  Afghanistan   Use per capita  Y1962     0.11
...

This long format makes it easier to analyze the data, as each row now represents a specific year, element, and value for Afghanistan. You can easily filter, group, and aggregate data based on these new columns.
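If one column per metric is preferred, the filtered long table can be pivoted so that 'Production', 'Import Quantity', and 'Use per capita' each become a column, with one row per area and year. The sketch below rebuilds a small stand-in for df_long from the sample values and assumes that each combination of area, element, and year occurs only once; pivot() raises an error on duplicates.

```python
import pandas as pd

# Small stand-in for the filtered long-format data (values from the sample)
df_long = pd.DataFrame({
    'Area': ['Afghanistan'] * 6,
    'Element': ['Production', 'Import Quantity', 'Use per capita'] * 2,
    'Year': ['Y1961'] * 3 + ['Y1962'] * 3,
    'Value': [None, 1000, 0.11, None, 1000, 0.11],
})

# Spread the 'Element' values out into one column per metric
df_wide = (
    df_long.pivot(index=['Area', 'Year'], columns='Element', values='Value')
    .reset_index()
)
df_wide.columns.name = None  # drop the leftover 'Element' axis label

# Put the metric columns in the desired order
df_wide = df_wide[['Area', 'Year', 'Production', 'Import Quantity', 'Use per capita']]

print(df_wide)
```

Passing a list as the index argument of pivot() requires pandas 1.1 or later.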

Key Points:

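  • Pandas melt() function: This function is the core of converting wide data to long format. It keeps the columns listed in id_vars ('Area' and 'Element') as identifiers and unpivots every remaining column (here, the year columns) into two new columns: one for the former column names ('Year') and one for their values ('Value').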
  • Filtering with isin(): We use this method to select only the rows containing the desired 'Element' values.
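The two points above can be combined with a small cleanup step: after melting, the 'Year' column still holds labels such as 'Y1961', which are awkward to sort or plot. The sketch below is one possible way to handle this, assuming every year column starts with the letter 'Y'; it names the year columns explicitly through value_vars and converts the labels to integers.

```python
import pandas as pd

df = pd.DataFrame({
    'Area': ['Afghanistan', 'Afghanistan'],
    'Element': ['Import Quantity', 'Use per capita'],
    'Y1961': [1000, 0.11],
    'Y1962': [1000, 0.11],
})

# Unpivot only the year columns; 'Area' and 'Element' stay as identifiers
year_cols = [col for col in df.columns if col.startswith('Y')]
df_long = df.melt(id_vars=['Area', 'Element'], value_vars=year_cols,
                  var_name='Year', value_name='Value')

# Turn labels such as 'Y1961' into the integer 1961
df_long['Year'] = df_long['Year'].str[1:].astype(int)

print(df_long)
```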

Further Steps:

You can further analyze the transformed data by:

  • Calculating averages over time: Use groupby() to group by 'Element' and then apply aggregation functions like mean().
  • Visualizing the data: Create line graphs or bar charts to visualize trends in agricultural production, imports, and per capita use over the years.
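As a rough illustration of the first step, the sketch below averages each element over the years. The small df_long here is a stand-in built from the 1964 to 1966 sample values, with the years already converted to integers.

```python
import pandas as pd

# Stand-in long-format data taken from the sample values for 1964-1966
df_long = pd.DataFrame({
    'Element': ['Import Quantity', 'Use per capita'] * 3,
    'Year': [1964, 1964, 1965, 1965, 1966, 1966],
    'Value': [1000, 0.11, 1000, 0.10, 1000, 0.10],
})

# Average each element across all years
mean_by_element = df_long.groupby('Element')['Value'].mean()

print(mean_by_element)
```

For a quick line chart, the same data could be pivoted to one column per element and passed to DataFrame.plot(), which needs matplotlib to be installed.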

This approach provides a foundation for conducting further analysis and gaining insights from the provided agricultural data for Afghanistan.

Afghanistan Data: Transforming Wide Data to Long Format with Python

Original source: https://www.cveoy.top/t/topic/fuu7 Copyright belongs to the author. Please do not repost or scrape.
