Concurrent URL Fetching with Python Coroutines and Multiprocessing - Efficient Web Data Scraping
The following Python program reads n URLs from a text file (one per line), fetches them concurrently using coroutines combined with a process pool, and prints each response.
import asyncio
import concurrent.futures

import aiohttp


async def fetch(session, url):
    # Fetch a single URL and return the response body as text
    async with session.get(url) as response:
        return await response.text()


async def process_url(url):
    # Open a session and fetch one URL
    async with aiohttp.ClientSession() as session:
        return await fetch(session, url)


def process_url_sync(url):
    # Synchronous entry point for the worker processes: a coroutine object
    # cannot be pickled or awaited by ProcessPoolExecutor, so each worker
    # runs the coroutine to completion in its own event loop.
    return asyncio.run(process_url(url))


async def main():
    # Read the URLs from the file, one per line, skipping blank lines
    with open('urls.txt', 'r') as file:
        urls = [line.strip() for line in file if line.strip()]

    # Create a process pool and hand one URL to each worker task;
    # run_in_executor lets main() await the workers as ordinary futures.
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        tasks = [loop.run_in_executor(executor, process_url_sync, url) for url in urls]
        responses = await asyncio.gather(*tasks)

    # Print the response for each URL
    for url, response in zip(urls, responses):
        print(f'Response from {url}: {response}')


# The __main__ guard is required with ProcessPoolExecutor so that worker
# processes importing this module do not re-run the main program.
if __name__ == '__main__':
    asyncio.run(main())
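In the version above, every URL gets its own worker task in the process pool, which is wasteful when the list is long. A common refinement is to split the URL list into chunks so that each process fetches many URLs concurrently with its own coroutines. The sketch below follows that idea under the same aiohttp dependency; the helper names (fetch_chunk, worker, fetch_in_chunks) and the default of 4 worker processes are illustrative choices, not part of the original program.

import asyncio
import concurrent.futures

import aiohttp


async def fetch_chunk(urls):
    # Fetch a whole chunk of URLs concurrently inside one worker process
    async with aiohttp.ClientSession() as session:
        async def fetch(url):
            async with session.get(url) as response:
                return await response.text()
        return await asyncio.gather(*(fetch(url) for url in urls))


def worker(urls):
    # Each worker process runs its own event loop for its chunk
    return asyncio.run(fetch_chunk(urls))


def fetch_in_chunks(urls, workers=4):
    # Split the URL list into one chunk per worker and merge the results;
    # call this from under an `if __name__ == '__main__':` guard.
    chunks = [urls[i::workers] for i in range(workers)]
    chunks = [chunk for chunk in chunks if chunk]
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(worker, chunks))
    # Re-pair each response with its URL, chunk by chunk
    return {url: text
            for chunk, texts in zip(chunks, results)
            for url, text in zip(chunk, texts)}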
Make sure to create a text file named urls.txt in the same directory as the script, with one URL per line. The program reads the URLs from that file, fetches them concurrently using coroutines and multiple processes, and prints each URL's response.
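For example, urls.txt might contain the following (these addresses are placeholders; substitute the URLs you actually want to fetch):

https://example.com
https://httpbin.org/get
https://www.python.org

Then run the script with the Python interpreter, e.g. python main.py if the file is saved as main.py.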