Python 2.7 - find and replace from a text file, using a dictionary, to a new text file


python python-2.7


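I'm new to programming and have been learning Python in my spare time for the past few months. I decided to try writing a small script that converts American spellings to English spellings in a text file.

I've been trying all sorts of things for the past 5 hours and eventually came up with something that gets me closer to my goal, but it isn't quite there yet.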

#imported dictionary contains 1800 english:american spelling key:value pairs. 
from english_american_dictionary import dict


def replace_all(text, dict):
    for english, american in dict.iteritems():
        text = text.replace(american, english)
    return text


my_text = open('test_file.txt', 'r')

for line in my_text:
    new_line = replace_all(line, dict)
    output = open('output_test_file.txt', 'a')
    print >> output, new_line

output.close()

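I'm sure there is a better way to go about this, but for this script, here are the issues I'm having:

1. In the output file, the lines are written on every other line, with a line break in between, but the original test_file.txt doesn't have this. The contents of test_file.txt are shown at the bottom of this page.
2. Only the first American spelling in a line gets converted to English.
3. I didn't really want to open the output file in append mode, but couldn't figure out 'r' in this code structure.

Any help is appreciated for this eager newbie.

The contents of test_file.txt are: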

I am sample file.
I contain an english spelling: colour.
3 american spellings on 1 line: color, analyze, utilize.
1 american spelling on 1 line: familiarize.

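The print statement adds its own newline, but your lines already have a newline of their own. You can either strip the newline from new_line, or use the lower-level output.write(new_line) instead (it writes exactly what you pass to it).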
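
To make that concrete, here is a minimal sketch that copies a file line by line with write(). The helper name copy_lines, the file names and the sample text are all made up for illustration. It avoids the print statement so that it runs on both Python 2.7 and Python 3.

```python
import os
import tempfile


def copy_lines(src_path, dst_path):
    """Copy a text file line by line without adding extra newlines."""
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:
            # Each line read from the file still ends with '\n',
            # so write() reproduces the original spacing exactly.
            dst.write(line)


# Build a small throwaway input file (hypothetical sample content).
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'in.txt')
dst_path = os.path.join(tmp_dir, 'out.txt')
with open(src_path, 'w') as f:
    f.write('first line\nsecond line\n')

copy_lines(src_path, dst_path)

with open(dst_path, 'r') as f:
    copied = f.read()
```

Here copied should be identical to the input text, with no blank lines in between. Using print on each line instead would add a second newline after every line.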

As for your second question, I think we need an actual example. replace() should indeed replace all occurrences:

>>> "abc abc abcd ab".replace("abc", "def")
'def def defd ab'

I'm not sure what your third question is. If you want to replace the output file, do this:

output = open('output_test_file.txt', 'w')

'w' means you are opening the file for writing.
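
The difference between 'w' and 'a' can be seen in a short experiment. This is only a sketch: it writes to a temporary directory so nothing real is overwritten, and the text written is made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'output_test_file.txt')

# First run: 'w' creates the file (or empties it if it already exists).
with open(path, 'w') as output:
    output.write('first run\n')

# Second run with 'w': the old contents are discarded before writing.
with open(path, 'w') as output:
    output.write('second run\n')

with open(path, 'r') as f:
    after_w = f.read()

# A run with 'a': new text is added after whatever is already there.
with open(path, 'a') as output:
    output.write('third run\n')

with open(path, 'r') as f:
    after_a = f.read()
```

After the two 'w' runs only the second text remains, while the 'a' run keeps it and adds to it. This is also why 'w' appears not to work inside a loop: reopening the file with 'w' on every iteration throws away the previous lines, whereas opening it once before the loop does not.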

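The extra blank lines you are seeing are there because you are using print to write out a line that already has a newline at the end. Since print writes its own newline too, the output becomes double-spaced. An easy fix is to use outfile.write(new_line) instead.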

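As for the file mode, the problem is that you are opening the output file over and over. You should open it just once, at the start. It is usually a good idea to use with statements to handle opening files, since they take care of closing the files for you when you are done with them.
I don't understand your other issue, with only some of the replacements happening. Is your dictionary missing the spellings for 'analyze' and 'utilize'?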
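One suggestion I'd make is to not do the replacements line by line. You can read the whole file in one go with file.read() and then work on it as a single unit. This will probably be faster, since it won't need to loop over the items in your spelling dictionary as often (just once, rather than once per line).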
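
A rough sketch of that whole-file approach follows. The two-entry spelling_dict, the convert_file helper and the sample text are stand-ins invented for this example, and items() is used in place of iteritems() so the code runs on Python 2.7 and 3 alike.

```python
import os
import tempfile

# Hypothetical two-entry stand-in for the real spelling dictionary,
# keyed english -> american like the one in the question.
spelling_dict = {'colour': 'color', 'analyse': 'analyze'}


def replace_all(text, spellings):
    """Replace every American spelling in text with its English form."""
    for english, american in spellings.items():
        text = text.replace(american, english)
    return text


def convert_file(in_path, out_path, spellings):
    """Read the input in one go, convert it, and write it out once."""
    with open(in_path, 'r') as in_file:
        text = in_file.read()
    # The output is opened a single time, so 'w' mode is safe here.
    with open(out_path, 'w') as out_file:
        out_file.write(replace_all(text, spellings))


# Throwaway input file with made-up content.
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, 'test_file.txt')
out_path = os.path.join(tmp_dir, 'output_test_file.txt')
with open(in_path, 'w') as f:
    f.write('one: color.\ntwo: color, analyze.\n')

convert_file(in_path, out_path, spelling_dict)

with open(out_path, 'r') as f:
    result = f.read()
```

Because the output file is opened only once, it can be opened in 'w' mode and any old contents are simply overwritten. Reading everything at once does assume the file fits comfortably in memory.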
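Edit:
To make your code correctly handle words that contain other words (like "entire" containing "tire"), you will probably need to abandon the simple str.replace approach in favor of regular expressions.
Further down is a quickly thrown together solution that uses re.sub, given a dictionary of spelling changes from American to British English (that is, in the reverse order of your current dictionary).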
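One nice thing about this code structure is that you can easily convert British English spellings back to American English ones if you pass a dictionary in the other order to the replacer_factory function.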
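
As an illustration of that reversal, the sketch below builds the opposite dictionary by swapping keys and values. replacer_factory mirrors the function from this answer's listing; the convert helper and the name bre_to_ame_spellings are introduced here only for the example, and the swap assumes no two American words share a British spelling.

```python
import re

# Small illustrative dictionary; the real one would be much larger.
ame_to_bre_spellings = {'tire': 'tyre', 'color': 'colour', 'utilize': 'utilise'}

# Swap keys and values to get the opposite direction.
bre_to_ame_spellings = dict(
    (bre, ame) for ame, bre in ame_to_bre_spellings.items()
)


def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        # Words that are not in the dictionary are returned unchanged.
        return spelling_dict.get(word, word)
    return replacer


def convert(text, spelling_dict):
    """Replace whole words only, in whichever direction the dict maps."""
    return re.sub(r'\b\w+\b', replacer_factory(spelling_dict), text)


british = convert('color, entire, utilize', ame_to_bre_spellings)
american = convert(british, bre_to_ame_spellings)
```

Here british comes out as 'colour, entire, utilise' and american returns to the original text. "entire" is untouched in both directions because only whole words are looked up.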
Building on all the good answers above, I wrote a new version which I think is more Pythonic. Hope this helps:
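# Stand-in for the imported dictionary of 1800 english:american spelling key:value pairs.
mydict = {
    'colour': 'color',
}


def replace_all(text, mydict):
    # Replace every American spelling (value) with its English spelling (key).
    for english, american in mydict.iteritems():
        text = text.replace(american, english)
    return text

try:
    # Open both files once; the with blocks close them automatically.
    with open('new_output.txt', 'w') as new_file:
        with open('test_file.txt', 'r') as f:
            for line in f:
                new_line = replace_all(line, mydict)
                new_file.write(new_line)
except IOError:
    # Only catch file errors, so unrelated bugs are not hidden.
    print "Can't open file!"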

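You can also look at the answers to a question I asked earlier, which contain many best-practice suggestions:

Here are some other tips on how to write Python (more Pythonic :)

Good luck :)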
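Please note: using dict as a variable name is a really bad idea, as it shadows the name of the built-in dictionary type. If you are not careful, this can cause your code to break in confusing ways.

Thanks for the tip, I won't use "dict" as a variable here.

@Tim Peters - thanks. I was confused; the ".replace" method does indeed replace all instances. My brain was fried! As for the 'w' or 'a' thing, I had to use append because it was in a loop and couldn't use 'w'. I was just wondering whether there was a way to use 'w' so that I didn't have to worry about whether the file already had data in it and could simply overwrite it. Thanks.

@shengy That's a great answer too, with useful links, thanks very much :-)

I appreciate all the answers, but I have to go with @blcknght's, since "file.read()" got me past this step and then allowed me to open the out_file in "w" mode. Awesome! analyze and utilize did in fact convert to their English versions; I was just going crazy.

I tried the script on a larger file that needed converting, and one problem is that "entire" gets converted to "entyre". This is because "tyre: tire" is in the dictionary and a partial match occurs. Is there any way to stop this and keep it word for word?

@Darren: OK, my edit is done. I've commented out the bits that depend on external files or modules, but you can plug in your (reversed) dictionary and data file in place of the examples I used and it should work.

The replacer function is passed as an argument to re.sub. re.sub will call it once for each word matched by the regular expression pattern. The match argument is created by the re code to describe what was matched. It is the same kind of object that re.search returns.

@Darren: Yes, re.sub goes through the whole text, replacing each match with whatever you want. To make things more complicated, the second argument can work in a couple of ways: it can be a function that takes a MatchObject (it will be called for each match) or a string (which can contain backreferences). This happens inside re.sub, which is why you don't see it.
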
import re

#from english_american_dictionary import ame_to_bre_spellings
ame_to_bre_spellings = {'tire':'tyre', 'color':'colour', 'utilize':'utilise'}

def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        return spelling_dict.get(word, word)
    return replacer

def ame_to_bre(text):
    pattern = r'\b\w+\b'  # this pattern matches whole words only
    replacer = replacer_factory(ame_to_bre_spellings)
    return re.sub(pattern, replacer, text)

def main():
    #with open('test_file.txt') as in_file:
    #    text = in_file.read()
    text = 'foo color, entire, utilize'

    #with open('output_test_file.txt', 'w') as out_file:
    #    out_file.write(ame_to_bre(text))
    print(ame_to_bre(text))

if __name__ == '__main__':
    main()
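
The comments mention that the second argument of re.sub can be either a function or a string with backreferences. The sketch below shows both forms side by side on a made-up sentence; the function name shout and the variable names are invented for this example.

```python
import re

text = 'color and tire'


def shout(match):
    # match is a match object; group() returns the matched word.
    return match.group().upper()


# Form 1: a function, called once per match with a match object.
upper = re.sub(r'\b\w+\b', shout, text)

# Form 2: a string, where \1 refers back to the first captured group.
bracketed = re.sub(r'\b(\w+)\b', r'[\1]', text)

# re.search returns the same kind of match object that shout receives,
# but only for the first match in the text.
first_word = re.search(r'\b\w+\b', text).group()
```

With this input, upper is 'COLOR AND TIRE', bracketed is '[color] [and] [tire]' and first_word is 'color'. The function form suits the spelling converter because the replacement has to be looked up per word, which a fixed replacement string cannot do.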
