BeautifulSoup: getting the <a href="#contest/home/\d+"> links from an HTML contest table (id="contest_table")
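You can use the BeautifulSoup library to find the <a> tags in the HTML and a regular expression to keep only the links that match the required pattern. The following example code uses BeautifulSoup together with a regular expression to get the links from the <a> tags: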
import re
from bs4 import BeautifulSoup
html = '''
<html>
<head>
<link href="css/tablewithpin.css" rel="stylesheet">
</head>
<body>
<div class="contest_list">
<table id="contest_table" class="table table-condensed table-bordered">
<thead><tr>
<th align="center">Contest ID</th>
<th>Title</th>
<th>Mode</th>
<th>Start Time</th>
<th>Submit Time</th>
<th>End Time</th>
<th>Status</th>
<th>Register</th>
</tr></thead>
<tbody>
<tr class="pinned"><td>3946</td>
<td class="title"><a href="#contest/home/3946">【可持久化线段树&树链剖分专题】 2023.07.14</a></td>
<td><span class="label label-info">OI Traditional</span></td>
<td>2023-07-12 20:46:00</td>
<td>2023-07-11 23:46:00</td>
<td>2023-07-11 01:46:00</td>
<td><span class="label label-success">Ended</span></td>
<td></td></tr>
<tr class="pinned"><td>3941</td>
<td class="title"><a href="#contest/home/3941">【CDQ专题】2023.07.10</a></td>
<td><span class="label label-info">OI Traditional</span></td>
<td>2023-07-10 18:43:00</td>
<td>2023-07-10 17:43:00</td>
<td>2023-07-10 17:43:00</td>
<td><span class="label label-success">Ended</span></td>
<td></td></tr>
<!-- more rows -->
</tbody>
</table>
<div class="pagination pagination-small pagination-centered"><ul><li><a><strong>1</strong></a></li><li><a href="#contest/index/2" data-ci-pagination-page="2">2</a></li><li><a href="#contest/index/3" data-ci-pagination-page="3">3</a></li></ul></div></div>
</body>
</html>
'''
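# Parse the document with Python's built-in HTML parser.
soup = BeautifulSoup(html, 'html.parser')
# Keep only <a> tags whose href contains "#contest/home/" followed by digits.
links = soup.find_all('a', href=re.compile(r'#contest/home/\d+'))
for link in links:
    print(link['href'])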
Running the code above prints the matching links:
#contest/home/3946
#contest/home/3941
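The same links can also be selected with a CSS attribute selector instead of a regular expression. The following is only a sketch: `SAMPLE_HTML` is a hypothetical, trimmed-down stand-in for the table above, and the `^=` prefix match is looser than the regex because it does not check that digits follow the prefix.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample that mirrors the structure of the table above.
SAMPLE_HTML = '''
<table id="contest_table"><tbody>
<tr class="pinned"><td>3946</td>
<td class="title"><a href="#contest/home/3946">Contest A</a></td></tr>
<tr class="pinned"><td>3941</td>
<td class="title"><a href="#contest/home/3941">Contest B</a></td></tr>
</tbody></table>
<ul><li><a href="#contest/index/2">2</a></li></ul>
'''

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
# [href^="..."] matches attribute values that start with the given prefix,
# so the pagination link (#contest/index/2) is skipped.
hrefs = [a['href'] for a in soup.select('td.title > a[href^="#contest/home/"]')]
for href in hrefs:
    print(href)
```

If the numeric ID must be validated, the regular-expression version is the safer choice.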
Note that the code above is only an example; you may need to adapt it to your actual HTML and requirements.
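One possible adaptation, sketched here under assumptions, is to collect the contest ID and title together with each link. The names `SAMPLE_HTML`, `CONTEST_HREF`, and `extract_contests` are hypothetical and not part of the original answer; the sample data is a shortened stand-in for the real page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, shortened sample that mirrors the structure of the table above.
SAMPLE_HTML = '''
<table id="contest_table"><tbody>
<tr class="pinned"><td>3946</td>
<td class="title"><a href="#contest/home/3946">Contest A</a></td></tr>
<tr class="pinned"><td>3941</td>
<td class="title"><a href="#contest/home/3941">Contest B</a></td></tr>
</tbody></table>
<ul><li><a href="#contest/index/2">2</a></li></ul>
'''

# Anchored pattern: the whole href must be "#contest/home/<digits>".
CONTEST_HREF = re.compile(r'^#contest/home/(\d+)$')

def extract_contests(html):
    """Return (contest_id, title, href) tuples for every matching link."""
    soup = BeautifulSoup(html, 'html.parser')
    contests = []
    for link in soup.find_all('a', href=CONTEST_HREF):
        # Re-run the pattern on the href to read the captured ID.
        match = CONTEST_HREF.search(link['href'])
        contests.append((int(match.group(1)),
                         link.get_text(strip=True),
                         link['href']))
    return contests

contests = extract_contests(SAMPLE_HTML)
for contest_id, title, href in contests:
    print(contest_id, title, href)
```

Anchoring the pattern with `^` and `$` avoids accidentally matching longer hrefs that merely contain the same fragment.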
Original post: https://www.cveoy.top/t/topic/iAQH. Copyright belongs to the author. Please do not repost or scrape.