Examples of extracting content from HTML and XML files with XPath, string methods, and regular expressions

XPath example: suppose we have the following XML file, saved as bookstore.xml:

<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2003</year>
    <price>29.99</price>
  </book>
</bookstore>

Extracting content with XPath expressions:

from lxml import etree

# Parse the XML file
tree = etree.parse("bookstore.xml")

# Extract the text of each element with XPath expressions
titles = tree.xpath("//book/title/text()")
authors = tree.xpath("//book/author/text()")
years = tree.xpath("//book/year/text()")
prices = tree.xpath("//book/price/text()")

# Print the extracted values, one book at a time
for title, author, year, price in zip(titles, authors, years, prices):
    print("Title:", title)
    print("Author:", author)
    print("Year:", year)
    print("Price:", price)
    print("--------------------")
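
Collecting titles, authors, years, and prices into four separate lists only stays aligned when every book has all four child elements. A minimal sketch of a per-book alternative follows. Note that it swaps in the standard library's xml.etree.ElementTree (which supports only a subset of XPath) in place of lxml, and embeds the sample XML as a string instead of reading bookstore.xml; the names XML, books, and cooking_titles are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Same sample data as above, embedded so the sketch is self-contained.
XML = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2003</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# Walk each <book> and read its children relative to that book,
# so a missing child yields None instead of shifting the other values.
books = []
for book in root.findall("book"):
    books.append({
        "category": book.get("category"),   # attribute value
        "title": book.findtext("title"),    # text of the first <title> child
        "author": book.findtext("author"),
        "year": book.findtext("year"),
        "price": book.findtext("price"),
    })

# Attribute predicate: titles of books whose category is COOKING.
cooking_titles = [t.text for t in root.findall("./book[@category='COOKING']/title")]

for b in books:
    print(b)
print("Cooking titles:", cooking_titles)
```

Reading fields relative to each book keeps every record together, at the cost of one extra loop.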

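String extraction example (this one uses a regular expression to pull an email address out of a plain string):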

import re

# Define the input string
text = "Hello, my name is John. My email is john@example.com."

# Extract email addresses with a regular expression
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
emails = re.findall(pattern, text)

# Print the extracted email addresses
for email in emails:
    print("Email:", email)
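
The snippet above relies on the re module. For comparison, here is a minimal sketch of extraction with built-in string methods only, applied to a small HTML fragment. The helper extract_between and the sample html string are hypothetical, introduced just for illustration; the approach assumes the start and end markers appear literally in the text, so it suits simple, predictable markup and not arbitrary HTML.

```python
def extract_between(text, start, end):
    """Return the text between the first `start` marker and the next `end` marker, or None."""
    i = text.find(start)
    if i == -1:
        return None          # start marker not present
    i += len(start)          # move past the start marker
    j = text.find(end, i)    # look for the end marker after it
    if j == -1:
        return None          # end marker not present
    return text[i:j]

# Hypothetical sample page used only for this sketch.
html = "<html><head><title>Book Store</title></head><body><p>Hello</p></body></html>"

page_title = extract_between(html, "<title>", "</title>")
first_paragraph = extract_between(html, "<p>", "</p>")

print("Title:", page_title)
print("Paragraph:", first_paragraph)
```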

Regular expression matching example (capturing a group):

import re

# Define the input string
text = "Hello, my name is John. My email is john@example.com."

# Capture the word that follows "name is"
pattern = r"name is (\w+)"
matches = re.findall(pattern, text)

# Print the captured text
for match in matches:
    print("Match:", match)
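
The same capture-group idea can be applied directly to markup text. The sketch below runs regular expressions over a fragment of the bookstore XML from the first example. It assumes the markup is as regular as the sample (fixed attribute order, no nested tags inside the elements); for anything less predictable, a real parser with XPath is usually the safer choice. The names xml_text, title_pattern, titles, and prices are illustrative.

```python
import re

# A fragment of the sample XML, embedded so the sketch is self-contained.
xml_text = """
<book category="COOKING">
  <title lang="en">Everyday Italian</title>
  <price>30.00</price>
</book>
<book category="CHILDREN">
  <title lang="en">Harry Potter</title>
  <price>29.99</price>
</book>
"""

# Named groups make the captured parts self-describing.
title_pattern = re.compile(r'<title lang="(?P<lang>[^"]+)">(?P<title>[^<]+)</title>')

titles = [(m.group("lang"), m.group("title")) for m in title_pattern.finditer(xml_text)]

# With a single capture group, findall returns just the captured strings.
prices = [float(p) for p in re.findall(r"<price>([0-9.]+)</price>", xml_text)]

for (lang, title), price in zip(titles, prices):
    print(f"{title} [{lang}]: {price}")
```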

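The examples above show how to extract content from an XML file with XPath, how to extract an email address from a string with a regular expression, and how to capture part of a match with a regular expression group.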