How to write a web crawler

A web crawler generally works in the following steps:

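Send requests: use Python's requests library or the Scrapy framework to send HTTP requests and fetch the page's HTML.
Parse the HTML: use Python's BeautifulSoup library or the Scrapy framework to parse the HTML and extract the information you need.
Store the data: save the extracted data to a database or a file.
Crawl in a loop: keep sending requests and parsing HTML, through a loop or recursion, until the crawl is complete.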
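
The storage step can take many forms. As one possible sketch, the snippet below saves extracted (url, title) pairs to SQLite using Python's standard sqlite3 module. The table name `pages`, the helper names `save_pages` and `load_pages`, and the sample row are assumptions made for illustration, not part of any particular crawler.

```python
import sqlite3


def save_pages(conn, pages):
    """Store (url, title) pairs, replacing any row that already has the same URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)", pages
    )
    conn.commit()


def load_pages(conn):
    """Return all stored (url, title) rows ordered by URL."""
    return conn.execute("SELECT url, title FROM pages ORDER BY url").fetchall()


if __name__ == "__main__":
    # An in-memory database keeps the sketch free of side effects;
    # pass a file path (for example a hypothetical "crawl.db") to persist the data.
    conn = sqlite3.connect(":memory:")
    save_pages(conn, [("https://www.example.com", "Sample title")])
    print(load_pages(conn))
    conn.close()
```

Using the URL as the primary key means that crawling the same page twice updates its row instead of creating a duplicate.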

Below is a simple Python crawler example:

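import time

import requests
from bs4 import BeautifulSoup

# Send the request
url = 'https://www.example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404 or 500
html = response.text

# Parse the HTML
soup = BeautifulSoup(html, 'html.parser')
# soup.title is None when the page has no <title> tag, so guard against that
title = soup.title.get_text(strip=True) if soup.title else ''

# Store the data
with open('data.txt', 'w', encoding='utf-8') as f:
    f.write(title + '\n')

# Crawl in a loop: pages 2 through 10
for i in range(2, 11):
    time.sleep(1)  # pause between requests to avoid overloading the site
    url = 'https://www.example.com/page/{}'.format(i)
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        continue  # skip pages that fail to load
    soup = BeautifulSoup(response.text, 'html.parser')
    # Parse the HTML further here to extract the information you need
    # Store the data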
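
The loop above walks a fixed range of page numbers. When the pages to visit are discovered by following links instead, the crawl can be sketched as a breadth-first traversal with a queue and a set of already seen URLs. This is a minimal sketch under stated assumptions: the names `LinkCollector` and `crawl` are hypothetical, links are extracted with the standard library's `html.parser` instead of BeautifulSoup to keep the block dependency-free, and `fetch` is a function supplied by the caller so the traversal logic can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl of same-host pages; returns URLs in visit order.

    `fetch` maps a URL to its HTML text, or returns None on failure.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}  # every URL ever queued, so no page is queued twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links against the current page and drop fragments
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

With requests, `fetch` could be a small wrapper that calls `requests.get(url, timeout=10)` and returns `response.text` on success and None otherwise. The `max_pages` limit and the same-host check keep the crawl from growing without bound.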

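Note: when writing a crawler, follow the site's crawling rules (such as robots.txt), respect the site's copyright and privacy, and do not crawl excessively or attack the site, so that you stay within legal and ethical limits.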
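
One way to follow a site's crawling rules in code is to check its robots.txt before each request. The sketch below uses the standard library's `urllib.robotparser`. The helper names, the user agent string `my-crawler`, and the sample rules are assumptions made for illustration; a real site's rules will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the sketch runs offline
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_rules(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, url, user_agent="my-crawler"):
    """Return True if `user_agent` may fetch `url` under the parsed rules."""
    return parser.can_fetch(user_agent, url)


def delay_for(parser, user_agent="my-crawler", default=1.0):
    """Return the Crawl-delay for `user_agent`, or `default` if none is set."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    rules = build_rules(SAMPLE_ROBOTS_TXT)
    print(allowed(rules, "https://www.example.com/page/2"))
    print(allowed(rules, "https://www.example.com/private/data"))
    print(delay_for(rules))
```

Against a live site, the rules would be loaded with `set_url()` followed by `read()` on the parser instead of parsing a string, and the crawler would sleep for `delay_for(...)` seconds between requests.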