Python regex: re.findall takes too long to execute

python regex

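I want to remove all words that end with a period "." from a file. The file is about 15 MB and contains more than 400,000 words. I am using re.findall to find these words and then replace them: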

for w in re.findall(r'([a-zA-Z0-9]+\.)', test_dict):
    test_dict = test_dict.replace(w, ' ')

This takes a very long time to run. Is there a way to improve the performance, or an alternative way to find and replace these words?

You can try using re.sub instead of looping over the results of re.findall:

import re

# Example text:
text = 'this is. a text with periods.'

# Replace every run of letters/digits followed by a period in a single pass
re.sub(r'([a-zA-Z0-9]+\.)', ' ', text)

For this example, it returns the same result as the loop:

'this   a text with  '

On a relatively small document (179 KB, Romeo and Juliet), the re.findall loop takes about 0.369 seconds, while re.sub takes about 0.0091 seconds.
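The timing comparison above can be reproduced roughly with a sketch like the following. The sample text is synthetic (an assumption, not the 15 MB file from the question or the Romeo and Juliet text), the function names are made up for illustration, and the absolute numbers will vary by machine.

```python
import re
import timeit

PATTERN = re.compile(r'[a-zA-Z0-9]+\.')

def remove_with_loop(text):
    # Approach from the question: one full-string replace per match found.
    for w in PATTERN.findall(text):
        text = text.replace(w, ' ')
    return text

def remove_with_sub(text):
    # Approach from the answer: a single pass over the text.
    return PATTERN.sub(' ', text)

# Synthetic sample text (hypothetical stand-in for a real document).
sample = 'this is. a text with periods. ' * 2000

loop_time = timeit.timeit(lambda: remove_with_loop(sample), number=3)
sub_time = timeit.timeit(lambda: remove_with_sub(sample), number=3)
print(f'findall loop: {loop_time:.4f}s, re.sub: {sub_time:.4f}s')
```

The loop gets slower as the number of matches grows, because every match triggers another scan of the whole string, whereas re.sub scans the text once.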

In Python, you can iterate over a file line by line, and over each line word by word. So you could consider:

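# your_file and new_file are placeholders for the input and output paths
with open(your_file) as f_in, open(new_file, 'w') as f_out:
    for line in f_in:
        # keep only the whitespace-separated words that do not end with '.'
        f_out.write(' '.join(w for w in line.split() if not w.endswith('.')) + '\n')
# then decide if you want to overwrite your_file with new_file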
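The line-by-line idea can also be combined with re.sub, so the whole file never has to be held in memory. This is only a sketch: strip_dotted_words is a hypothetical helper name, and io.StringIO objects stand in for real files opened with open(). Unlike the split/join version above, it keeps the original spacing and uses the question's pattern (letters and digits followed by a period).

```python
import io
import re

DOTTED_WORD = re.compile(r'[a-zA-Z0-9]+\.')

def strip_dotted_words(lines_in, out):
    """Write each input line to out with dotted words replaced by a space."""
    for line in lines_in:
        out.write(DOTTED_WORD.sub(' ', line))

# In-memory stand-ins for real file objects (hypothetical data).
src = io.StringIO('this is. a text\nwith periods.\n')
dst = io.StringIO()
strip_dotted_words(src, dst)
print(repr(dst.getvalue()))
```

With real files, the same function could be called as strip_dotted_words(f_in, f_out) inside a with open(...) block like the one above.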

Are you looking only for a Pythonic approach, or are you flexible enough to use the shell? In any case, the code in the question does not correctly implement the stated requirement. Consider the case where one dotted word is a suffix of another, for example the. and lathe. If the. is found first, the replacement also modifies the longer word, leaving behind a la that does not belong.
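The suffix problem described in this comment can be shown with a small sketch. The input string is made up for illustration; the pattern is the one from the question.

```python
import re

text = 'the. lathe. is here'

# Loop from the question: str.replace hits every occurrence of the
# substring, including the tail of the longer word 'lathe.'.
looped = text
for w in re.findall(r'[a-zA-Z0-9]+\.', looped):
    looped = looped.replace(w, ' ')

# re.sub replaces each regex match as a whole, so 'lathe.' is removed intact.
subbed = re.sub(r'[a-zA-Z0-9]+\.', ' ', text)

print(repr(looped))  # a stray 'la' is left behind
print(repr(subbed))  # no stray fragment
```

So besides being faster, re.sub also avoids this particular bug, because it never re-searches the text for a previously matched substring.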