Extracting content from XML with regular expressions

Suppose we have the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications 
      with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
  </book>
  <book id="bk103">
    <author>Corets, Eva</author>
    <title>Maeve Ascendant</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-11-17</publish_date>
    <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
  </book>
</catalog>

We can use a regular expression to extract each book's author and title, for example:

<author>(.*?)<\/author>\s*<title>(.*?)<\/title>

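This regular expression works as follows:

Match the <author> tag, then capture the text inside it into the first capture group with the non-greedy (lazy) pattern (.*?), which stops as soon as the rest of the pattern can match
Match the closing </author> tag, written <\/author> (the backslash only escapes the slash and is optional in Python)
Match zero or more whitespace characters, including the newline and indentation between the two elements, with \s*
Match the <title> tag, then capture the text inside it into the second capture group, again with the non-greedy (.*?)
Match the closing </title> tag, written <\/title>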
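To see why the non-greedy (.*?) matters, here is a small sketch on a hypothetical one-line snippet (not taken from the catalog above) that contains two <title> elements. It compares the greedy (.*) with the lazy (.*?):

```python
import re

# Hypothetical test data: two <title> elements on a single line
snippet = "<title>First</title><title>Second</title>"

# Greedy: .* runs as far as it can, so the match extends to the LAST </title>
greedy = re.findall(r"<title>(.*)</title>", snippet)

# Non-greedy: .*? stops at the FIRST </title> that lets the pattern match
lazy = re.findall(r"<title>(.*?)</title>", snippet)

print(greedy)  # ['First</title><title>Second']
print(lazy)    # ['First', 'Second']
```

With the greedy version, a single match swallows both elements. The lazy version yields one match per element, which is what the author/title pattern relies on.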

We can apply this regular expression to the whole XML document, for example in Python:

import re

xml_text = """
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications 
      with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
  </book>
  <book id="bk103">
    <author>Corets, Eva</author>
    <title>Maeve Ascendant</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-11-17</publish_date>
    <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
  </book>
</catalog>
"""

pattern = r"<author>(.*?)<\/author>\s*<title>(.*?)<\/title>"
# With two capture groups, re.findall returns a list of (author, title) tuples
matches = re.findall(pattern, xml_text)

for author, title in matches:
    print("Author:", author)
    print("Title:", title)

This prints:

Author: Gambardella, Matthew
Title: XML Developer's Guide
Author: Ralls, Kim
Title: Midnight Rain
Author: Corets, Eva
Title: Maeve Ascendant
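The pattern above works because each <author> and <title> fits on a single line. By default, "." does not match a newline, so the same approach silently finds nothing for the multi-line <description> elements. The following is a minimal sketch, not part of the original example, that adds the re.DOTALL flag to handle them. It uses an abbreviated two-book copy of the catalog and also captures the book's id attribute. The named groups "id" and "desc" and the variable "results" are illustrative choices. The sketch assumes the attribute is written exactly as id="..." directly inside <book>:

```python
import re

# Abbreviated copy of the catalog above (two books, fewer child elements)
xml_text = """
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <description>An in-depth look at creating applications
      with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
  </book>
</catalog>
"""

# Without re.DOTALL, "." stops at "\n", so both multi-line descriptions are missed
print(re.findall(r"<description>(.*?)</description>", xml_text))  # []

# With re.DOTALL, "." also matches "\n"; (?P<name>...) defines a named group
pattern = re.compile(
    r'<book id="(?P<id>[^"]*)">.*?<description>(?P<desc>.*?)</description>',
    re.DOTALL,
)

# Collapse the line breaks and indentation inside each description
results = [
    (m.group("id"), " ".join(m.group("desc").split()))
    for m in pattern.finditer(xml_text)
]
for book_id, desc in results:
    print(book_id, "->", desc)
```

This pattern depends on the exact layout of the markup. Attribute order, extra attributes, CDATA sections, or entities such as &amp; would all break it or leave raw text behind. For XML that is not under your control, a real parser such as Python's xml.etree.ElementTree is the more robust choice.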