Python: remove Unicode digits from every line of a file


python unicode


My file (outputfile5.txt) contains the following (every element in the file is in Unicode):

The output I need should be saved to another file (result.txt), like this:

My code is:

import codecs
import re

fq = codecs.open('outputfile5.txt', encoding='utf-8')
lines = fq.readlines()
fq.close()
fa = codecs.open('result.txt', 'w')
for line in lines:
    line1=[]
    line1=line.split()
    for i in line1:
        if u'-->' not in i or u',' not in i:
            # this line raises AttributeError when i does not start with a digit
            s = re.match('([0-9]+)', i).group(1)
            word=i[len(s):]
            fa.write(word.encode('UTF-8'))
        else:
            fa.write(i.encode('UTF-8'))
fa.close()

When I run the code, I get the following error:

s = re.match('([0-9]+)', i).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

How can I fix this?
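The error comes from re.match returning None whenever a token does not start with a digit. In the loop above, the condition u'-->' not in i or u',' not in i is True even for the token '-->' (it contains no ','), so separators also reach re.match; "and" was probably intended. A minimal Python 3 sketch of a guarded version follows. The helper names strip_leading_digits and clean_line are hypothetical, and, like the original code, it only strips digits at the start of each token.

```python
import re

def strip_leading_digits(token):
    """Hypothetical helper: drop a leading run of ASCII digits from one token."""
    m = re.match(r'[0-9]+', token)
    # re.match returns None when the token does not start with a digit
    # ('-->', ',' or a word without a number); calling .group() on None
    # is what raised the AttributeError in the question.
    if m is None:
        return token
    return token[m.end():]

def clean_line(line):
    """Hypothetical helper: clean every whitespace-separated token of a line."""
    # split() drops the line ending; tokens are rejoined with single spaces.
    return ' '.join(strip_leading_digits(tok) for tok in line.split())

print(clean_line('12abc --> 3def , 45ghi'))  # abc --> def , ghi
```

When writing the result to result.txt, append '\n' after each cleaned line, since split() discards the original line ending.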

I'm not sure whether I'm missing something obvious, but does this do what you want?

with open('outputfile5.txt') as input, open('result.txt', 'w') as output:
    for line in input:
        output.write(''.join([c for c in line if not c.isdigit()]))

Contents of result.txt:

അവന് --> രാമന്   
അവള്ക്ക് --> സീതയെ   
അവള് --> അവള്ക്ക് --> സീതയെ   
അത് --> പൂവ്   
അവര് --> സീതയെ , രാമന്   
അവിടെ --> കോട്ടയത്ത്   
അവര്ക്ക് --> സീതയെ , രാമന്   
അവിടെ --> അവിടെ --> കോട്ടയത്ത്   
അവന് --> രാമന്   
അവനെ --> ലക്ഷ്മണന്   
അവള്ക്ക് --> സീതയെ   
ഈ --> വഴ   
അവര് --> സീതയെ , ലക്ഷ്മണന്   
അവിടെ --> കോട്ടയം
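Since the question is about Unicode text, it may matter which characters count as digits. In Python 3, str.isdigit() is Unicode-aware: it is True not only for 0-9 but also for decimal digits of other scripts such as Malayalam, and for some other digit characters such as superscripts. The file contents are not shown, so whether it contains such characters is an assumption; the sketch below (helper names hypothetical) contrasts removing every isdigit() character with removing only ASCII 0-9.

```python
# ASCII seven, MALAYALAM DIGIT ONE (U+0D67), SUPERSCRIPT TWO (U+00B2)
samples = ['7', '\u0d67', '\u00b2']

for ch in samples:
    # isdigit() vs isdecimal() vs a plain ASCII range check
    print(repr(ch), ch.isdigit(), ch.isdecimal(), '0' <= ch <= '9')

def strip_all_digits(line):
    """Hypothetical helper: remove every character for which str.isdigit() is True."""
    return ''.join(c for c in line if not c.isdigit())

def strip_ascii_digits(line):
    """Hypothetical helper: remove only 0-9, leaving other scripts' digits untouched."""
    return ''.join(c for c in line if not '0' <= c <= '9')

print(strip_all_digits('a1\u0d67b'))
print(strip_ascii_digits('a1\u0d67b'))
```

If the Malayalam numerals in the data are meaningful text rather than noise, the ASCII-only variant is the safer choice.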

You can do this simply with a regular expression:

import re

with open('outputfile5.txt') as inpf, open('result.txt', 'w') as outf:
    for line in inpf:
        # delete every run of one or more digits
        outf.write(re.sub(r'\d+', '', line))
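What \d matches depends on the Python version and flags. In Python 3, for str patterns \d matches any Unicode decimal digit (category Nd), including Malayalam digits; passing re.ASCII restricts it to [0-9]. In Python 2, \d matches only [0-9] unless the re.UNICODE flag is given. A small illustration with a made-up sample line:

```python
import re

line = 'x12\u0d67y'  # ASCII digits followed by MALAYALAM DIGIT ONE (U+0D67)

# Python 3 str pattern: \d matches every Unicode decimal digit
print(re.sub(r'\d+', '', line))                  # xy
# re.ASCII limits \d to 0-9, so the Malayalam digit survives
print(re.sub(r'\d+', '', line, flags=re.ASCII))
```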

How about doing it directly:

import codecs
import re

with codecs.open('outputfile5.txt', encoding='utf-8') as inp:
    with codecs.open('result.txt', 'w', encoding='utf-8') as out:
        for line in inp:
            # '+' rather than '*': only non-empty runs of digits are matched
            out.write(re.sub(r'[0-9]+', '', line))
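Two details of this approach, shown in a short Python 3 sketch. First, a pattern like [0-9]* also matches the empty string, even in text with no digits at all; with an empty replacement the output is the same as with [0-9]+, but + expresses the intent and avoids zero-length matches. Second, in Python 3 the built-in open() accepts encoding= directly, so codecs.open is not needed. The function name strip_digits_file and the DIGITS constant are hypothetical.

```python
import re

# '+' so the pattern never matches an empty string
DIGITS = re.compile(r'[0-9]+')

def strip_digits_file(src, dst):
    """Hypothetical helper: copy src to dst with all ASCII digits removed."""
    # Python 3: built-in open() decodes and encodes UTF-8 itself
    with open(src, encoding='utf-8') as inp, open(dst, 'w', encoding='utf-8') as out:
        for line in inp:
            out.write(DIGITS.sub('', line))

# '[0-9]*' succeeds on a string without digits by matching '' at position 0
print(repr(re.search(r'[0-9]*', 'abc').group()))  # ''
print(re.search(r'[0-9]+', 'abc'))                # None
```

Usage would be strip_digits_file('outputfile5.txt', 'result.txt').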

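Solution?

Because re.match('([0-9]+)', i) does not match.

I changed that too, but it still doesn't work.

You don't need the inner list in ''.join(...): ''.join(c for c in line if not c.isdigit()) works as well.

@Burhan I know (in fact I've said the same thing to others :)), but in the case of join, the question is: who does that apply to? :)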
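For reference, the two spellings discussed in these comments produce the same string. In CPython, str.join first turns any non-list iterable into a sequence internally, so dropping the brackets does not save memory here; it is mainly a matter of style. A minimal comparison on a made-up line:

```python
line = 'abc12 --> 3def\n'

# List comprehension: the list is built first, then joined
with_list = ''.join([c for c in line if not c.isdigit()])
# Generator expression: the "inner list" brackets are dropped
with_gen = ''.join(c for c in line if not c.isdigit())

print(with_list == with_gen)  # True
```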