Can Python match at the beginning of a text file without using regular expressions?

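Hi, I'm fairly new to Python and to programming. I'm trying to find every place where a certain word starts a new sentence and replace it; in this case it's good old "Bob", replaced with "John". I'm using a dictionary and the .replace() method to do the replacement, swapping each dictionary key for its associated value. Here is my code: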

start_replacements = {'. Bob': '. John',
                      '! Bob': '! John', 
                      '? Bob': '? John',
                      '\nBob': '\nJohn',
                      }

def search_and_replace(start_word, replacement):
    with open('start_words.txt', 'r+') as article:
        read_article = article.read()
        replaced = read_article.replace(start_word, replacement)
        article.seek(0)
        article.write(replaced)

def main():
    for start_word, replacement in start_replacements.items():  # .iteritems() on Python 2
        search_and_replace(start_word, replacement)


if __name__ == '__main__':
    main()

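As you can see in the dictionary, I have four ways of finding "Bob" at the start of a sentence, but I'm not sure how to find "Bob" at the very beginning of the text file without using regex. I'd rather avoid regex to keep this script simple. Is that possible?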

Edit: contents of 'start_words.txt' before running the script:

Bob is at the beginning of the file. Bob after period! Bob after exclamation? Bob after question.
Bob after newline.

Contents after running the script:

Bob is at the beginning of the file. John after period! John after exclamation? John after question.
John after newline.

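Edit: why I'd rather not use regex: I'd prefer to stick with a dictionary because it will grow every week as new words and phrases are added. In this case it's just "Bob", but the dictionary may grow to a few hundred entries. I'm not dead set against regex, but as a relative newcomer I'm trying to find out whether there is another way that I don't know about yet.

Edit: the third comment from @tripleee below is a great suggestion and works for what I want to do. Thank you very much!

Sorry, I didn't mean to get myself and the people who answered downvoted. Thanks for your help.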

A question about your question: why don't you want to use regex?

>>> import re
>>> x = "! Bob is a foo bar"
>>> re.sub('^[!?.\\n\\s]*Bob','John', x)
'John is a foo bar'
>>> x[:2]+re.sub('^[!?.\\n\\s]*Bob','John', x)
'! John is a foo bar'
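The x[:2] + ... step above is needed because the pattern also consumes the leading punctuation. As a rough sketch (not part of the original answer), a capture group can keep whatever precedes the name, and the names can still come from a dictionary. The names table, the pattern, and replace_sentence_starts are illustrative assumptions; the sketch treats the start of the text, a newline, or one of . ! ? followed by whitespace as a sentence start.

```python
import re

# Hypothetical replacement table; in practice this would be the growing dictionary.
names = {'Bob': 'John', 'Sam': 'Jack'}

# Group 1: start of text, sentence-ending punctuation plus whitespace, or a newline.
# Group 2: any of the dictionary keys, matched as a whole word.
pattern = re.compile(
    r'(^|[.!?]\s+|\n)(' + '|'.join(re.escape(name) for name in names) + r')\b'
)

def replace_sentence_starts(text):
    # Keep the prefix (group 1) and swap only the name (group 2).
    return pattern.sub(lambda m: m.group(1) + names[m.group(2)], text)

print(replace_sentence_starts('! Bob is a foo bar'))  # ! John is a foo bar
```

Because the alternation is built from the dictionary keys, adding a new name only means adding a new dictionary entry.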

Here is my attempt without using regex:

>>> x = "! Bob is a foo bar"
>>> first = ['!','?','.','\n']
>>> x = x.split()
>>> x[1] ="John" if x[1] == "Bob" and x[0] in first else x[1]
>>> x
['!', 'John', 'is', 'a', 'foo', 'bar']
>>> " ".join(x)
'! John is a foo bar'

As @falsetru pointed out, this fails when the text starts with a newline, because str.split() discards it:

>>> x = "\n Bob is a foo bar"
>>> x = x.split()
>>> x[1] ="John" if x[1] == "Bob" and x[0] in first else x[1]
>>> " ".join(x)
'Bob is a foo bar'

Probably the ugliest way to work around str.split() dropping the \n is:

>>> x = "\n Bob is a foo bar"
>>> y = x.split()
>>> y[1] ="John" if y[1] == "Bob" and y[0] in first else y[1]
>>> y
['Bob', 'is', 'a', 'foo', 'bar']
>>> if x.split()[0] == "\n":
...     y.insert(0,'\n')
... 
>>> " ".join(y)
'Bob is a foo bar'
>>> y
['Bob', 'is', 'a', 'foo', 'bar']
>>> if x[0] == "\n":
...     y.insert(0,'\n')
... 
>>> " ".join(y)
'\n Bob is a foo bar'

I should stop adding to my answer, or I'll end up encouraging the OP to use a nonsensical solution for something a regex can handle easily.

You have to adjust either the data you are working with or the algorithm so that it accounts for this special case.

For example, you can decorate the beginning of the data with some value and add the corresponding replacement to the dictionary:

f_begin_deco = '\0\0\0'  # Sequence that won't be in data.

start_replacements = { f_begin_deco + 'Bob': f_begin_deco + 'John' }

# In your search_and_replace function.   
read_article = f_begin_deco + article.read()
replaced = read_article.replace(start_word, replacement)
replaced = replaced[len(f_begin_deco):]  # Remove beginning of file decoration.
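Put together, the decoration idea might look like the sketch below. It works on a string rather than a file, and the names SENTINEL, names, prefixes, and replace_with_sentinel are illustrative assumptions; it also assumes the marker never appears in the real text.

```python
SENTINEL = '\0\0\0'  # assumed never to occur in the real text

# Hypothetical table of names; each one is combined with every sentence-start prefix.
names = {'Bob': 'John'}
prefixes = ['. ', '! ', '? ', '\n', SENTINEL]

def replace_with_sentinel(text):
    decorated = SENTINEL + text  # mark the beginning of the data
    for old, new in names.items():
        for prefix in prefixes:
            decorated = decorated.replace(prefix + old, prefix + new)
    return decorated[len(SENTINEL):]  # strip the marker again

before = ('Bob is at the beginning of the file. Bob after period! '
          'Bob after exclamation? Bob after question.\nBob after newline.')
print(replace_with_sentinel(before))
```

Keeping the names and the prefixes in separate collections means a new name needs only one dictionary entry instead of one entry per prefix.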

You may also want to explore ways of making the data-decoration code more elegant.

Another way is to change the search-and-replace algorithm so that it accounts for the special case:

start_replacements = { 'Bob': 'John' }

# In your search_and_replace function.
if read_article.startswith(start_word):
    read_article = replacement + read_article[len(start_word):]
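A fuller sketch of this second approach, under the assumption that the bare names live in their own dictionary next to the existing start_replacements table. The function names and the three-argument signature are hypothetical, not the asker's original interface.

```python
def replace_at_file_start(text, names):
    # Special case: the text itself begins with one of the names.
    for old, new in names.items():
        if text.startswith(old):
            return new + text[len(old):]
    return text

def search_and_replace(path, names, start_replacements):
    with open(path, 'r+') as article:
        text = replace_at_file_start(article.read(), names)
        # The ordinary mid-text replacements work exactly as before.
        for old, new in start_replacements.items():
            text = text.replace(old, new)
        article.seek(0)
        article.write(text)
        article.truncate()  # drop leftovers if the new text is shorter

print(replace_at_file_start('Bob is at the beginning.', {'Bob': 'John'}))
# John is at the beginning.
```

Reading the file once and applying every replacement in memory also avoids reopening the file for each dictionary entry. Note that startswith would also match a longer word such as "Bobby".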

You can use a regular expression together with the dictionary. This does not require iterating over the dictionary entries:

import re

nonspaces = re.compile(r'\S+') # To extract the first word

def search_and_replace(filepath, replacement):
    def replace_sentence(match):
        def replace_name(match):
            name = match.group()
            return replacement.get(name, name)
        return nonspaces.sub(replace_name, match.group(), count=1)
        # count=1: to change only the first word.
    with open(filepath, 'r+') as f:
        replaced = re.sub('[^.!?]+', replace_sentence, f.read())
        f.seek(0)
        f.write(replaced)
        f.truncate() # NOTE: If name shrinks, unwanted string remains.


start_replacement = {
    'Bob': 'John',
    'Sam': 'Jack',
    'Tom': 'Kevin',
}
search_and_replace('start_words.txt', start_replacement)

Notes on the regular expressions used:

[^.!?]: matches any character that is not ., ! or ?. Used to extract sentences:
>>> re.findall('[^.!?]+', 'Bob is at the beginning. Bob after period!')
['Bob is at the beginning', ' Bob after period']


\S: matches any non-whitespace character. Used to extract the first word (which may be a name).
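As a small illustration of that second step, the sketch below applies \S+ with count=1 to a single extracted sentence. The names table and the sample sentence are made up for the example.

```python
import re

nonspaces = re.compile(r'\S+')
names = {'Bob': 'John'}  # hypothetical replacement table

def replace_name(match):
    word = match.group()
    return names.get(word, word)  # unknown words are left unchanged

# count=1 limits the substitution to the first run of non-space characters.
result = nonspaces.sub(replace_name, ' Bob after period, then Bob again', count=1)
print(repr(result))  # ' John after period, then Bob again'
```

Only the first word of the sentence is looked up in the dictionary, so the second "Bob" is left alone.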


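What does your text file look like? Can you give an example?

One way is sentence.split(" ")[0], but I think a regex would be more efficient.

As a workaround, you could prepend a unique pattern such as ### to the first line and add a matching ###Bob entry to the replacement list. Then, of course, replace ### with nothing before printing.

With regex, simply replacing ([\w\s]*)Bob with $1John does the job. I think that would make your code simpler. Just note that using regex shouldn't make any difference to updating the dictionary regularly.

No regex? The first line is import re.

@Bibhas: But why? Regex is definitely the simplest way to do this; it's the perfect tool for the job…

@TimPietzcker I know. But that's not what the OP asked for. My comment was pointing out that alvas said he was trying not to use regex while his code showed him using regex.

Relax, I type slowly =)

start_replacements contains only a single space, but the text can contain multiple spaces.

start_replacements contains \n. The contents of the text file also contain a newline.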