Python 3 string decoding error: 'str' object has no attribute 'decode'
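In Python 3, a str is already Unicode text, and decode() exists only on bytes. Calling it on a str therefore raises AttributeError: 'str' object has no attribute 'decode'.
The fix is to assign s to fix_s directly, i.e. skip the decoding step and write s to the file as is.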
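The difference between the two types can be checked in a few lines. This is only an illustrative sketch; the names text and raw are made up for the example and do not come from the original code.

```python
# In Python 3, str is decoded text; only bytes carries a decode() method.
text = "中文"                      # a str (Unicode text)
raw = text.encode("gbk")           # bytes, as they would be stored in a GBK file

print(hasattr(text, "decode"))     # whether str offers decode()
print(hasattr(raw, "decode"))      # whether bytes offers decode()
print(raw.decode("gbk") == text)   # decoding the bytes gives back the str
```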
The modified code is as follows:
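import os

POS = os.path.join(os.getcwd(), 'pos')
NEG = os.path.join(os.getcwd(), 'neg')
FIX_POS = os.path.join(os.getcwd(), 'fix_pos')
FIX_NEG = os.path.join(os.getcwd(), 'fix_neg')


def fix_corpus(dir_s, dir_t):
    """Re-save every file in dir_s as UTF-8 in dir_t."""
    for item in os.listdir(dir_s):
        path = os.path.join(dir_s, item)
        try:
            # Text mode returns a str, so no decode() call is needed.
            # UTF-8 is tried first here (an assumption; the original
            # relied on the platform default encoding).
            with open(path, 'r', encoding='utf8') as f:
                s = f.read()
            fix_s = s
        except UnicodeDecodeError:
            # A str cannot be decoded again, so reopen the file with the
            # fallback encoding instead of calling s.decode().
            try:
                with open(path, 'r', encoding='gbk') as f:
                    fix_s = f.read()
            except UnicodeDecodeError:
                # Last resort: drop any bytes that cannot be decoded.
                with open(path, 'r', encoding='gb2312', errors='ignore') as f:
                    fix_s = f.read()
        with open(os.path.join(dir_t, item), 'w', encoding='utf8') as ff:
            ff.write(fix_s)


if __name__ == "__main__":
    if not os.path.isdir(FIX_POS):
        os.mkdir(FIX_POS)
    if not os.path.isdir(FIX_NEG):
        os.mkdir(FIX_NEG)
    fix_corpus(POS, FIX_POS)
    fix_corpus(NEG, FIX_NEG)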
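If the intent is to keep explicit decode() calls with a chain of fallback encodings, one option is to read the file in binary mode so that the data is bytes rather than str. The sketch below shows that idea in isolation. The helper name decode_with_fallback and its encodings parameter are hypothetical, and the order of encodings is an assumption about the corpus, not something the original states.

```python
def decode_with_fallback(raw, encodings=("utf8", "gbk")):
    """Decode bytes by trying each encoding in turn (hypothetical helper)."""
    for enc in encodings:
        try:
            return raw.decode(enc)   # decode() is valid here: raw is bytes
        except UnicodeDecodeError:
            continue                 # try the next candidate encoding
    # Last resort, mirroring the original code: ignore undecodable bytes.
    return raw.decode("gb2312", errors="ignore")


# Bytes as they would come from open(path, 'rb').read()
gbk_bytes = "好评".encode("gbk")
utf8_bytes = "好评".encode("utf8")
print(decode_with_fallback(gbk_bytes) == "好评")
print(decode_with_fallback(utf8_bytes) == "好评")
```

With this approach the file would be opened with mode 'rb', and the result of the helper is a str that can be written out with encoding='utf8' as before.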
Original post: https://www.cveoy.top/t/topic/m5W7 Copyright belongs to the author. Do not repost or scrape.