Python: delete some words and replace other words in a txt file
I have a txt file (myText.txt) that contains many lines of text. I would like to know:
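- How to create a list of words to delete (I want to choose these words myself)
- How to create a list of words to replace

For example, if myText.txt is: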
The ancient Romans influenced countries and civilizations in the following centuries.
Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.
then:

- I want to delete "the", "and" and "in"
- I want to replace "ancient" with "old"
- I want to replace "month" and "centuries" with "years"
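def replace():
    deleteWords = ["the ", "and ", "in "]
    replaceWords = {"ancient": "old", "month": "years", "centuries": "years"}

    with open("myText.txt") as f:
        contents = f.read()

    # Delete each listed word (its trailing space is removed with it)
    for word in deleteWords:
        contents = contents.replace(word, "")
    # Replace each key with its value; .items() works on Python 2 and 3,
    # while Python 2's .iteritems() no longer exists in Python 3
    for key, value in replaceWords.items():
        contents = contents.replace(key, value)

    return contents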
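One thing to be aware of with the str.replace approach: it matches substrings, not words, so "in " also matches the end of a word such as "main ". The short sketch below illustrates this on a made-up sentence and shows one possible whole-word alternative using the re module; delete_words is a hypothetical helper name, not part of any library.

```python
import re

text = "The main point is in the middle."  # made-up example sentence

# str.replace matches substrings: the "in " at the end of "main " is removed too.
naive = text.replace("in ", "")
print(naive)  # The mapoint is the middle.


def delete_words(text, words):
    """Remove whole-word occurrences of each word plus the whitespace after it."""
    for word in words:
        # \b anchors the match to word boundaries; re.escape guards against metacharacters.
        text = re.sub(r"\b" + re.escape(word) + r"\b\s*", "", text)
    return text


print(delete_words(text, ["in"]))  # The main point is the middle.
```

Note that this is still case-sensitive, so "the" will not remove "The" at the start of a sentence.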
You can always use regular expressions:
import re

st = '''\
The ancient Romans influenced countries and civilizations in the following centuries.
Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.'''

deletions = ('and', 'in', 'the')
repl = {"ancient": "old", "month": "years", "centuries": "years"}

# Delete all listed words in one pass; \b restricts matches to whole words
tgt = '|'.join(r'\b{}\b'.format(e) for e in deletions)
st = re.sub(tgt, '', st)

# Replace each key with its value, again only as a whole word
for word in repl:
    tgt = r'\b{}\b'.format(word)
    st = re.sub(tgt, repl[word], st)

print(st)
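Building on the regex idea, a possible variation (not the answerer's code) folds deletions and replacements into one lookup table and lets re.sub call a function for each match. Because every match is replaced exactly once, a replacement is never re-processed by a later rule, so even swapping two words works. rewrite is a hypothetical name; like the answer above, it assumes the entries are plain words so that \b applies.

```python
import re


def rewrite(text, deletions, replacements):
    """Delete and replace whole words in a single regex pass (hypothetical helper)."""
    # Deleted words map to "", so both operations share one lookup table.
    table = {word: "" for word in deletions}
    table.update(replacements)
    if not table:
        return text  # nothing to do; an empty alternation would match everywhere
    alternation = "|".join(re.escape(word) for word in table)
    pattern = re.compile(r"\b(?:" + alternation + r")\b")
    # Each match is looked up once, so replaced text is never re-scanned.
    return pattern.sub(lambda m: table[m.group(0)], text)


print(rewrite("ancient and old", (), {"ancient": "old", "old": "ancient"}))
# old and ancient
```

With the sequential loop, the same swap would turn "ancient" into "old" and then immediately back into "ancient".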
Use a list for the deletions and a dictionary for the replacements. It should look something like this:
def processTextFile(filename_in, filename_out, delWords, repWords):
    # Open the input and output once; "w" overwrites any previous output
    # instead of appending to it on every run
    with open(filename_in, "r") as sourcefile, open(filename_out, "w") as outfile:
        for line in sourcefile:
            # Remove every entry in the deletion list
            for item in delWords:
                line = line.replace(item, "")
            # Substitute each key with its value
            for key, value in repWords.items():
                line = line.replace(key, value)
            outfile.write(line)

if __name__ == "__main__":
    delWords = []
    repWords = {}
    delWords.extend(["the ", "and ", "in "])
    repWords["ancient"] = "old"
    repWords["month"] = "years"
    repWords["centuries"] = "years"
    processTextFile("myText.txt", "myOutText.txt", delWords, repWords)
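processTextFile writes the result to a separate file. If you would rather update myText.txt itself, one common pattern, sketched here under the assumption of Python 3.3+ (for os.replace), is to write to a temporary file in the same directory and then swap it over the original. rewrite_in_place is a hypothetical helper that reuses the same line-by-line logic.

```python
import os
import shutil
import tempfile


def rewrite_in_place(path, del_words, rep_words):
    """Apply deletions/replacements to `path` itself via a temp file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as out, open(path, "r") as source:
            for line in source:
                for item in del_words:
                    line = line.replace(item, "")
                for key, value in rep_words.items():
                    line = line.replace(key, value)
                out.write(line)
        shutil.copymode(path, tmp_path)  # mkstemp creates owner-only permissions
        os.replace(tmp_path, path)       # swap the new file over the original
    except BaseException:
        os.remove(tmp_path)  # leave the original untouched on failure
        raise
```

Usage would be rewrite_in_place("myText.txt", delWords, repWords). If anything fails midway, the original file is left as it was; on POSIX systems the final os.replace is an atomic rename.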
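Note that this was written for Python 3.3.2, which is why I used items(). If you are using Python 2.x, use iteritems() instead; I believe it is more efficient, especially for large text files.

Thanks for your help. I just got the error message "AttributeError: 'dict' object has no attribute 'iteritems'". I had just updated to the latest version of Python. Is this normal? Thanks.

If you are using Python 3, write replaceWords.items() instead.

Hello, this works very well. Just one question: sometimes my text contains "+" and "-" signs, but Python does not seem to accept deletions=('and','in','the','+','-'). Is there a special way to enter these characters? Thanks.

Some characters have a special meaning in regular expressions, such as + (a repetition quantifier) and - (a range inside a character class). My suggestion is to spend some time on a regular expression tutorial site to learn about these characters. … is a good one.

Thanks for the code. Wow, there are many ways to achieve my goal :)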
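Regarding "+" and "-": one way to handle them, assuming the goal is to delete them as standalone tokens, is to pass every entry through re.escape. Escaping alone is not enough with the \b pattern from the regex answer, though: \b only matches between a word character and a non-word character, so r'\b\+\b' never matches a "+" surrounded by spaces. The sketch below swaps \b for the lookarounds (?<!\w) and (?!\w), which behave like \b for ordinary words but also work around symbols. build_deletion_pattern is a hypothetical name.

```python
import re


def build_deletion_pattern(tokens):
    """Compile a pattern matching any token that is not glued to a word character."""
    # re.escape turns '+' into '\+' so it is matched literally, not as a quantifier.
    escaped = "|".join(re.escape(token) for token in tokens)
    # (?<!\w) and (?!\w) act like \b for words but also work around symbols.
    return re.compile(r"(?<!\w)(?:" + escaped + r")(?!\w)")


pattern = build_deletion_pattern(("and", "in", "the", "+", "-"))
print(pattern.sub("", "cats and dogs + birds - fish"))
# cats  dogs  birds  fish
```

A "+" attached to a word, as in "a+b", is left alone, which mirrors how \b treats words.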