Python re: remove <a> hyperlinks but keep the link text

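You can use a regular expression with the re module to strip <a> hyperlinks while keeping their link text. The steps are as follows:

Import the re module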

import re

Define the regular expression

pattern = re.compile(r'<a[^>]*>(.*?)</a>')

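This pattern matches an opening tag that begins with <a (including any attributes up to the closing >), followed by the shortest possible run of characters up to the next </a>. The parentheses () form a capture group that holds the text between the two tags.

Replace the matches with the sub method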
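To make the capture group concrete, here is a small sketch that inspects what the pattern matches. The sample strings are made up for illustration. It also compares the lazy `(.*?)` with a greedy `(.*)` variant, which is not part of the original answer and is shown only for contrast.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r'<a[^>]*>(.*?)</a>')

# Illustrative input (assumed, not from the original answer).
m = pattern.search('See <a href="http://www.google.com" target="_blank">Google</a> now.')
whole = m.group(0)  # the entire matched tag, attributes included
inner = m.group(1)  # only the text captured by the parentheses

# With two links, the lazy quantifier stops at the first </a> it finds.
two_links = 'A <a href="x">one</a> and <a href="y">two</a>'
lazy = pattern.findall(two_links)

# A greedy variant (for contrast only) runs on to the last </a>.
greedy = re.findall(r'<a[^>]*>(.*)</a>', two_links)

print(whole)
print(inner)
print(lazy)
print(greedy)
```

The lazy form is what keeps each link separate when a line contains more than one.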

text = 'This is a <a href="http://www.google.com">Google</a> link.'
clean_text = pattern.sub(r'\1', text)

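This replaces every substring of text that matches the pattern with the content of its first capture group (\1), that is, the link text, and stores the result in clean_text.

The complete code is as follows: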
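As a quick check that sub handles every link in the string and not only the first, the following sketch uses subn, which returns the new string together with the number of replacements. The input string and the example.com URLs are placeholders chosen for illustration.

```python
import re

pattern = re.compile(r'<a[^>]*>(.*?)</a>')

# Illustrative input with two links (placeholder URLs).
text = 'Visit <a href="http://a.example.com">A</a> or <a href="http://b.example.com">B</a>.'

# subn behaves like sub but also reports how many matches were replaced.
clean_text, count = pattern.subn(r'\1', text)

# A function can be used instead of the r'\1' template; this is equivalent here.
clean_text_fn = pattern.sub(lambda m: m.group(1), text)

print(clean_text)
print(count)
```

The function form is handy if the link text needs further processing, for example stripping surrounding whitespace.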

import re

pattern = re.compile(r'<a[^>]*>(.*?)</a>')
text = 'This is a <a href="http://www.google.com">Google</a> link.'
clean_text = pattern.sub(r'\1', text)

print(clean_text)

The output is:

This is a Google link.

As you can see, the hyperlink has been removed and only its text remains.
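The pattern above is case-sensitive, does not match link text that spans several lines (`.` excludes newlines by default), and its `<a[^>]*>` part can also start a match at tags such as `<abbr>`. The sketch below is one possible, more tolerant variant; the function name `strip_links` and the test strings are hypothetical. It is still a regex approach, not a full HTML parser, so any tags nested inside a link are left in place.

```python
import re

# \b requires a word boundary after "a", so <abbr> or <article> do not qualify.
# IGNORECASE accepts <A HREF=...>; DOTALL lets "." cross line breaks.
LINK_RE = re.compile(r'<a\b[^>]*>(.*?)</a\s*>', re.IGNORECASE | re.DOTALL)

def strip_links(html):
    """Return html with <a> tags removed and their inner text kept."""
    return LINK_RE.sub(r'\1', html)

sample = '<A HREF="x">multi\nline</A> and <abbr>HTML</abbr>'
result = strip_links(sample)
print(result)
```

For arbitrary real-world HTML, a dedicated parser is usually the safer choice.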
