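BeautifulSoup: extracting <a href="#contest/home/\d+">..+</a> links to scrape a contest list
You can use the BeautifulSoup library to find the <a> tags in the HTML, then use a regular expression to keep only the links that match the required pattern. The following example combines BeautifulSoup and a regular expression to get the links from the <a> tags: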
import re
from bs4 import BeautifulSoup
html = '''
<html>
<head>
<link href='css/tablewithpin.css' rel='stylesheet'>
</head>
<body>
<div class='contest_list'>
<table id='contest_table' class='table table-condensed table-bordered'>
<thead><tr>
<th align='center'>Contest ID</th>
<th>Title</th>
<th>Mode</th>
<th>Start Time</th>
<th>Submit Time</th>
<th>End Time</th>
<th>Status</th>
<th>Register</th>
</tr></thead>
<tbody>
<tr class='pinned'><td>3946</td>
<td class='title'><a href='#contest/home/3946'>【可持久化线段树&树链剖分专题】 2023.07.14</a></td>
<td><span class='label label-info'>OI Traditional</span></td>
<td>2023-07-12 20:46:00</td>
<td>2023-07-11 23:46:00</td>
<td>2023-07-11 01:46:00</td>
<td><span class='label label-success'>Ended</span></td>
<td></td></tr>
<tr class='pinned'><td>3941</td>
<td class='title'><a href='#contest/home/3941'>【CDQ专题】2023.07.10</a></td>
<td><span class='label label-info'>OI Traditional</span></td>
<td>2023-07-10 18:43:00</td>
<td>2023-07-10 17:43:00</td>
<td>2023-07-10 17:43:00</td>
<td><span class='label label-success'>Ended</span></td>
<td></td></tr>
<!-- more rows -->
</tbody>
</table>
<div class='pagination pagination-small pagination-centered'><ul><li><a><strong>1</strong></a></li><li><a href='#contest/index/2' data-ci-pagination-page='2'>2</a></li><li><a href='#contest/index/3' data-ci-pagination-page='3'>3</a></li></ul></div></div>
</body>
</html>
'''
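# Parse the document with Python's built-in HTML parser.
soup = BeautifulSoup(html, 'html.parser')
# Keep only <a> tags whose href contains "#contest/home/" followed by digits.
links = soup.find_all('a', href=re.compile(r'#contest/home/\d+'))
for link in links:
    print(link['href'])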
Running the code above prints the matching links:
#contest/home/3946
#contest/home/3941
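The printed hrefs still carry the `#contest/home/` prefix. If only the numeric contest ID is needed, a capture group can pull it out. The following is a minimal sketch using only the standard `re` module; the helper name `contest_id` and the sample list `hrefs` are hypothetical and not part of the original answer.

```python
import re

# One capture group for the digits that follow "#contest/home/".
CONTEST_HREF = re.compile(r'#contest/home/(\d+)')

def contest_id(href):
    """Return the contest ID as an int, or None if href is not a contest link."""
    # fullmatch requires the entire href to match the pattern.
    match = CONTEST_HREF.fullmatch(href)
    return int(match.group(1)) if match else None

# Sample hrefs taken from the HTML in the example (two contests, one pagination link).
hrefs = ['#contest/home/3946', '#contest/home/3941', '#contest/index/2']
ids = [contest_id(h) for h in hrefs]
print(ids)  # [3946, 3941, None]
```

BeautifulSoup applies a compiled pattern passed to `find_all` with `search`, so the unanchored pattern in the example also matches hrefs that merely contain `#contest/home/<digits>`. Using `fullmatch` here is stricter and rejects any href with extra text around the pattern.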
Note that the code above is only an example; you may need to adapt it to the actual page you are scraping.
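As one possible adaptation, the contest title can be collected together with each link. The sketch below is an assumption-laden variant, not the original answer's method: it swaps the regular expression for a CSS attribute selector via `select`, and it uses a shortened stand-in for the table with placeholder titles (`Contest A`, `Contest B`).

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the contest table; the titles are placeholders.
html = '''
<table id='contest_table'>
<tbody>
<tr class='pinned'><td>3946</td>
<td class='title'><a href='#contest/home/3946'>Contest A</a></td></tr>
<tr class='pinned'><td>3941</td>
<td class='title'><a href='#contest/home/3941'>Contest B</a></td></tr>
</tbody>
</table>
<ul><li><a href='#contest/index/2'>2</a></li></ul>
'''

soup = BeautifulSoup(html, 'html.parser')

contests = []
# Select <a> tags inside the title cells whose href starts with "#contest/home/".
for a in soup.select('#contest_table td.title a[href^="#contest/home/"]'):
    contests.append({'href': a['href'], 'title': a.get_text(strip=True)})

for contest in contests:
    print(contest['href'], contest['title'])
```

The `^=` selector only checks the prefix of the href; it does not verify that digits follow. If that check matters, the regular-expression filter from the example is the stricter choice.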
Original URL: https://www.cveoy.top/t/topic/qhLQ. Copyright belongs to the author. Do not repost or scrape!