Python web scraping: how to check whether a website links to XML files and extract their data

To check whether a website links to XML files and extract data from them, you can crawl it with the following steps:

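Use Python's requests library to send an HTTP request and fetch the content of the target page.
Use the BeautifulSoup library to parse the page content and extract all of its links.
Walk through the links and check whether each one ends in '.xml'; if it does, the page links to an XML file.
If an XML file is found, download it and parse it with a suitable library (such as xml.etree.ElementTree) to extract the data you need.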

Here is an example:

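import requests
from urllib.parse import urljoin, urlparse
from bs4 import BeautifulSoup

# URL of the target site
url = 'https://example.com'

# Send an HTTP request and fetch the page content
response = requests.get(url, timeout=10)
response.raise_for_status()
content = response.text

# Parse the page content with BeautifulSoup
soup = BeautifulSoup(content, 'html.parser')

# Extract all links that actually carry an href attribute
links = soup.find_all('a', href=True)

# Walk the links and check whether any of them points to an XML file
for link in links:
    # Resolve relative links against the page URL
    xml_url = urljoin(url, link['href'])
    # Test only the path, so query strings and fragments are ignored
    if urlparse(xml_url).path.lower().endswith('.xml'):
        # Send an HTTP request and fetch the XML file content
        xml_response = requests.get(xml_url, timeout=10)
        xml_content = xml_response.text
        # Parse the XML file and extract the required data
        # ...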
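
The parsing step left as a placeholder above could look roughly like the sketch below. It uses xml.etree.ElementTree, as suggested in the steps. The XML layout (a catalog of item elements with an id attribute and a title child), the SAMPLE_XML payload, and the extract_titles helper are all hypothetical and only serve as an illustration; a real file will need its own element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML payload standing in for the downloaded file content
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <item id="1"><title>First</title></item>
    <item id="2"><title>Second</title></item>
</catalog>"""


def extract_titles(xml_bytes):
    """Return (id, title) pairs for every <item>, or [] if the XML is malformed."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        # A link ending in '.xml' is no guarantee that the body is valid XML
        return []
    results = []
    for item in root.iter('item'):
        results.append((item.get('id'), item.findtext('title', default='')))
    return results


titles = extract_titles(SAMPLE_XML)
```

Passing the raw bytes of the response (xml_response.content in requests) instead of the decoded text lets the parser honor the encoding declared in the XML prolog.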

Note that when you crawl data from other websites, you must follow each site's crawling rules and must not do anything illegal.
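
One common place where a site publishes its crawling rules is its robots.txt file, so a crawler can check it before requesting a URL. The following is a minimal sketch using the standard library's urllib.robotparser. The SAMPLE_ROBOTS content and the is_allowed helper are hypothetical; a real crawler would typically load the file from the site with set_url() and read() rather than parse an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the sketch runs offline
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS.splitlines())


def is_allowed(target_url, user_agent='*'):
    """Return True if the parsed rules permit user_agent to fetch target_url."""
    return parser.can_fetch(user_agent, target_url)
```

With these sample rules, is_allowed('https://example.com/private/data.xml') is rejected while a URL outside /private/ is permitted. A robots.txt check does not replace reading the site's terms of use.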