Python: How to replace or delete all Traditional Chinese strings in multiple files across multiple directories
I tried to replace all Chinese strings with "#", but it doesn't seem to work.
import os, re

path = 'F:\\project\\test'
files = []
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        files.append(os.path.join(r, file))

for file in files:
    with open(file, 'rb') as infile:
        while True:
            content = infile.readline()
            if re.match(r'(.*[\u4E00-\u9FA5]+)|([\u4E00-\u9FA5]+.*)', content.decode('utf-8')):
                print(content.decode('utf-8'))
                content.decode('utf-8').replace(content.decode('utf-8'), "#")
                print(content.decode('utf-8'))
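I found some code that can extract the Chinese or non-Chinese text from a txt file (but I don't know how to use it).
I can replace English characters like this: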
import fileinput, re

filename = 'F:\\project\\test\\test_script.txt'
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        #pattern = re.compile(r'[^\u4e00-\u9fa5]')
        #chinese = re.sub(pattern, '', str)
        print(line.replace('aaaa', '#'), end='')
        #print(chinese)

import fileinput, re

filename = 'F:\\project\\test\\test_script.txt'
with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        pattern = re.compile(r'[^\u4e00-\u9fa5]')
        chinese = re.sub(pattern, '', str)
        # print(line.replace('aaaa', '#'), end='')
        print(line.replace(chinese, '#'), end='')
But if the txt file contains text like
the console shows UnicodeDecodeError: 'cp950' codec can't decode byte 0xa0 in position 2: illegal multibyte sequence
and the txt file ends up empty.
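Open the file without the b flag, and Python will do the encoding and decoding itself. content.replace(content, "#") means replacing the whole line with a single #, not just the CJK data. Use re.sub (where "sub" stands for "substitute") instead.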
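The advice above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the names CJK_RUN, replace_cjk and replace_cjk_in_tree are hypothetical, the character range is the one used in the question, and the files are assumed to be UTF-8 (as the decode('utf-8') calls in the question suggest). Passing encoding explicitly avoids falling back to the locale default, which appears to be cp950 on the asker's machine. Note also that str.replace returns a new string, so its result has to be kept and written back.

```python
import os
import re

# Assumption: same range as in the question (U+4E00 to U+9FA5).
# It covers the common CJK ideographs, not every CJK block.
CJK_RUN = re.compile(r'[\u4e00-\u9fa5]+')


def replace_cjk(text, replacement='#'):
    """Replace each run of consecutive CJK characters with `replacement`."""
    return CJK_RUN.sub(replacement, text)


def replace_cjk_in_tree(root, replacement='#', encoding='utf-8'):
    """Rewrite every file under `root` in place; return the paths that changed."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Text mode with an explicit encoding; newline='' keeps the
            # original line endings untouched on both read and write.
            with open(path, 'r', encoding=encoding, newline='') as infile:
                content = infile.read()
            updated = replace_cjk(content, replacement)
            if updated != content:
                with open(path, 'w', encoding=encoding, newline='') as outfile:
                    outfile.write(updated)
                changed.append(path)
    return changed


print(replace_cjk('abc中文def'))  # prints abc#def
```

Calling replace_cjk_in_tree('F:\\project\\test') would process the directory from the question. Passing replacement='' deletes the Chinese text instead of replacing it. A file that is not valid UTF-8 raises UnicodeDecodeError in this sketch, and no backups are made, so it is worth trying on a copy of the data first.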