Scraping Usable Proxy IPs from Yun Daili (云代理) with Python and Writing Them to a File

The following Python code scrapes usable proxy IPs from Yun Daili (云代理, www.ip3366.net) and writes them to a file:

import requests

url = 'http://www.ip3366.net/free/?stype=1&page=1'

headers = {
    'Host': 'www.ip3366.net',
    'Referer': 'http://www.ip3366.net/free/?stype=1&page=1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299'
}

# A timeout keeps the script from hanging if the site never responds.
response = requests.get(url, headers=headers, timeout=10)

if response.status_code == 200:
    html = response.text
    with open('ips.txt', 'w') as f:
        for line in html.split('\n'):
            # Split the line on <td>; this assumes a whole table row sits on one line.
            parts = line.split('<td>')
            # Skip lines without enough cells so the indexing below cannot fail.
            if len(parts) > 5:
                ip = parts[1].split('</td>')[0]
                port = parts[2].split('</td>')[0]
                protocol = parts[5].split('</td>')[0]
                if protocol in ('HTTP', 'HTTPS'):
                    f.write(protocol.lower() + '://' + ip + ':' + port + '\n')
    print('IPs saved to ips.txt')
else:
    print('Failed to get IPs')

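First, we define the URL to scrape and the request headers. We then use the requests library to send a GET request to that URL and check whether the response status code is 200. If it is, the response body is parsed and the results are written to a file named ips.txt. The HTML is parsed by going through it line by line, looking for lines that contain <td> tags, and extracting the IP address, port, and protocol type from them. This relies on each table row sitting on a single line of the HTML. Finally, for every entry whose protocol is HTTP or HTTPS, the protocol, IP address, and port are joined into a complete URL such as http://ip:port and written to the file, one per line.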
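Because line-by-line parsing only works when a whole row sits on one line, a slightly more tolerant variant can be sketched with regular expressions. This is a minimal sketch, not the site's official format: the function name extract_proxies, its column parameters, and the sample markup are all hypothetical. The default column positions mirror the cells the script above reads (first, second, and fifth), and they may need adjusting for the real page.

```python
import re

# Matches one table row; DOTALL lets a row span several lines.
ROW_RE = re.compile(r'<tr[^>]*>(.*?)</tr>', re.DOTALL | re.IGNORECASE)
# Matches one data cell and captures its inner text.
CELL_RE = re.compile(r'<td[^>]*>(.*?)</td>', re.DOTALL | re.IGNORECASE)


def extract_proxies(html, ip_col=0, port_col=1, protocol_col=4):
    """Return proxy URLs such as 'http://192.0.2.10:8080' found in table rows.

    The column indexes are assumptions about the table layout
    (hypothetical defaults); adjust them to the page being scraped.
    """
    proxies = []
    for row in ROW_RE.findall(html):
        cells = [cell.strip() for cell in CELL_RE.findall(row)]
        # Skip header rows and rows too short to hold all three fields.
        if len(cells) <= max(ip_col, port_col, protocol_col):
            continue
        protocol = cells[protocol_col].upper()
        if protocol in ('HTTP', 'HTTPS'):
            proxies.append(f'{protocol.lower()}://{cells[ip_col]}:{cells[port_col]}')
    return proxies


# Made-up sample markup for illustration only, not copied from the real site.
sample_html = """
<table>
  <tr><th>IP</th><th>PORT</th><th>Anonymity</th><th>Location</th><th>Type</th></tr>
  <tr>
    <td>192.0.2.10</td>
    <td>8080</td>
    <td>high</td>
    <td>example</td>
    <td>HTTP</td>
  </tr>
  <tr><td>192.0.2.11</td><td>3128</td><td>high</td><td>example</td><td>HTTPS</td></tr>
  <tr><td>192.0.2.12</td><td>1080</td><td>high</td><td>example</td><td>SOCKS5</td></tr>
</table>
"""

print(extract_proxies(sample_html))
```

In the script, the returned list could be written to ips.txt one entry per line in place of the line-by-line loop. Regular expressions are still a rough tool for HTML, so a dedicated HTML parser may be the better choice if the page layout is more complex.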

Running the code above scrapes the usable proxy IPs from Yun Daili and writes them to the file.