Extracting href Links That Contain a Specific String with a Regular Expression

Date: 2024-06-29

Tags: General

To find an href that contains a specific string, you can use the following regular expression:

import re

# Sample input; the href value is wrapped in double quotes so the string literal is valid
text = 'href="https://v.qq.com/x/cover/mzc00200vmhuur7/i3505rf8da9.html?start=243&amp;cut_vid=v0045ivl226&amp;scene_id=3"'

# [^"]* keeps the match inside a single attribute value; the dots in the domain are escaped
pattern = r'href="([^"]*https://v\.qq\.com/x/cover/.*?)"'

match = re.search(pattern, text)

if match:
    print(match.group(1))

Output:

https://v.qq.com/x/cover/mzc00200vmhuur7/i3505rf8da9.html?start=243&amp;cut_vid=v0045ivl226&amp;scene_id=3
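
The printed URL still carries the HTML entity &amp; because the regular expression copies the attribute value verbatim. If the actual URL is needed, one option is to decode it with html.unescape from the standard library. A minimal sketch, where the variable names raw_url and clean_url are illustrative:

```python
import html

# The raw attribute value as captured from the HTML source
raw_url = 'https://v.qq.com/x/cover/mzc00200vmhuur7/i3505rf8da9.html?start=243&amp;cut_vid=v0045ivl226&amp;scene_id=3'

# html.unescape converts entities such as &amp; back to the characters they stand for
clean_url = html.unescape(raw_url)
print(clean_url)
```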

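The regular expression matches text that starts with 'href="', contains 'https://v.qq.com/x/cover/' inside the attribute value, and ends at the next '"'. The '.*?' part is a non-greedy match: it consumes as few characters as possible, so the match stops at the first closing quote. re.search() scans the text for the first match, and group(1) returns the content of the first capture group, which is the URL without the surrounding 'href="' and '"'.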
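
When a page contains several links, or mixes single and double quotes around attribute values, the same idea can be extended with re.finditer and a backreference. The sketch below is only an illustration: the HTML snippet and the names html_text and links are made up for the example, and it assumes the wanted URLs start with the given prefix.

```python
import re

# Hypothetical HTML snippet with mixed quote styles and one unrelated link
html_text = '''
<a href="https://v.qq.com/x/cover/aaa/111.html">one</a>
<a href='https://v.qq.com/x/cover/bbb/222.html?start=5'>two</a>
<a href="https://example.com/other.html">three</a>
'''

# (["']) captures the opening quote; \1 requires the same quote character to close the value.
# .*? stays non-greedy, so each match ends at the first matching closing quote.
pattern = r'''href=(["'])(https://v\.qq\.com/x/cover/.*?)\1'''

# group(2) is the URL; group(1) is only the quote character
links = [m.group(2) for m in re.finditer(pattern, html_text)]
print(links)
```

For irregular or deeply nested markup, a real HTML parser is usually more robust than a regular expression; the pattern here is meant for simple, predictable input.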

Original article: https://www.cveoy.top/t/topic/ffZR. Copyright belongs to the author. Do not reproduce or scrape.