Reading multiple tables from a PDF with pdfplumber and pandas in Python and analyzing specific row and column values

Open the PDF document with pdfplumber and extract the table data

import pdfplumber

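with pdfplumber.open('example.pdf') as pdf:
    all_tables = []
    for page in pdf.pages:
        # extract_tables() returns a list of tables; each table is a list of rows
        all_tables.extend(page.extract_tables())

# Take the first table in the document (raises IndexError if no table was found)
table = all_tables[0]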

Convert the table data to a DataFrame object with pandas

import pandas as pd

df = pd.DataFrame(table[1:], columns=table[0])
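
Header cells extracted from a PDF can be None or contain line breaks, which makes awkward column names. The following is a minimal sketch of one way to normalize them before building the DataFrame. The helper name table_to_dataframe, the col_N fallback names, and the sample data are assumptions made for illustration and are not part of pdfplumber or pandas.

```python
import pandas as pd

def table_to_dataframe(table):
    """Build a DataFrame from one pdfplumber-style table (a list of rows)."""
    header, *rows = table
    # Header cells may be None or contain line breaks; normalize them to plain strings
    columns = [
        (cell or f"col_{i}").replace("\n", " ").strip()
        for i, cell in enumerate(header)
    ]
    return pd.DataFrame(rows, columns=columns)

# Hypothetical sample shaped like one table returned by page.extract_tables()
sample_table = [
    ["Item", "Unit\nprice", None],
    ["Apple", "1.50", "10"],
    ["Pear", "2.25", None],
]
df = table_to_dataframe(sample_table)
print(df.columns.tolist())
```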

Analyze the values of specific rows and columns

# Get the data in the 2nd column (iloc indices start at 0)
col_data = df.iloc[:, 1]

# Get the value in the 3rd row, 2nd column (a string or None, as extracted from the PDF)
value = df.iloc[2, 1]
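
Because extracted cells are text, numeric analysis usually needs a conversion step first. The sketch below shows one possible approach using pd.to_numeric. The sample data, the column names, and the assumption that commas are thousands separators are all hypothetical.

```python
import pandas as pd

# Hypothetical data: every cell is a string, as it would be after table extraction
df = pd.DataFrame(
    [["Apple", "1,200"], ["Pear", "n/a"], ["Plum", "300"]],
    columns=["Item", "Amount"],
)

# Strip thousands separators, then convert; cells that cannot be parsed become NaN
amounts = pd.to_numeric(
    df.iloc[:, 1].str.replace(",", "", regex=False),
    errors="coerce",
)

total = amounts.sum()    # NaN values are skipped by default
value = amounts.iloc[2]  # 3rd row of the 2nd column, now a number
print(total, value)
```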

The complete code is as follows:

import pdfplumber
import pandas as pd

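with pdfplumber.open('example.pdf') as pdf:
    for page in pdf.pages:
        # Process every table on the page; a page without tables yields an empty list
        for table in page.extract_tables():
            # Convert the table to a DataFrame, using the first row as the header
            df = pd.DataFrame(table[1:], columns=table[0])

            # Analyze specific values (assumes at least 3 data rows and 2 columns)
            col_data = df.iloc[:, 1]  # 2nd column
            value = df.iloc[2, 1]     # 3rd row, 2nd column
            print(col_data)
            print(value)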
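
When a document contains several tables, it can help to keep each one addressable by page and position instead of printing inside the loop. The following is a rough sketch of that idea. The function name collect_tables and the (page_number, table_index) key scheme are assumptions chosen for illustration; the function only relies on each page object offering extract_tables(), as pdfplumber pages do.

```python
import pandas as pd

def collect_tables(pages):
    """Return {(page_number, table_index): DataFrame} for every table on every page."""
    frames = {}
    for page_number, page in enumerate(pages, start=1):
        for table_index, table in enumerate(page.extract_tables()):
            if len(table) < 2:
                continue  # skip tables that have no data rows below the header
            frames[(page_number, table_index)] = pd.DataFrame(
                table[1:], columns=table[0]
            )
    return frames

# Usage with a real file (requires pdfplumber and an existing example.pdf):
# import pdfplumber
# with pdfplumber.open('example.pdf') as pdf:
#     frames = collect_tables(pdf.pages)
#     value = frames[(1, 0)].iloc[2, 1]  # 1st table on page 1: 3rd row, 2nd column
```

Keying by page number and table index keeps the lookup explicit, so a specific cell can be read from a specific table without relying on loop order.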
