Python: How many times does a word appear in a file?


For my Python homework, the task is: "Write a complete Python program that reads a file trash.txt and outputs the number of times the word Bob occurs in that file."

My code is:

count=0
f=open('trash.txt','r')
bob_in_trash=f.readlines()
for line in bob_in_trash:
    if "Bob" in line:
        count=count+1
print(count)
f.close()

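Is there any way to make this code more efficient? It correctly prints 5, but I would like to know whether there is anything I should change.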

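You can read the entire file and count the occurrences of "Bob" in its contents.

While this is fine for smaller files, loading the whole file into memory can be a problem when you are dealing with larger files.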
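A minimal sketch of the whole-file approach, assuming the file fits comfortably in memory. The function name `count_substring_in_file` is made up for illustration. Note that `str.count` counts substrings, so "Bobby" would also be counted.

```python
from pathlib import Path


def count_substring_in_file(path, needle="Bob"):
    """Read the whole file into memory and count occurrences of `needle`.

    str.count counts non-overlapping substring matches, so this also
    counts the "Bob" inside words such as "Bobby".
    """
    text = Path(path).read_text()
    return text.count(needle)
```

Calling `count_substring_in_file("trash.txt")` would return the total for the file from the question.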

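Reading line by line is still more memory-efficient, but use str.count instead of "Bob" in line (the latter only gives you the number of lines that contain "Bob").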

As written, you count at most one "Bob" per line. How about using the count method, so that you sum the number of occurrences on each line:

for line in bob_in_trash:
    count=count+line.count("Bob")
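To make the difference concrete, here is a small sketch that runs both checks on made-up sample lines (the list `lines` stands in for the file contents and is not from the question):

```python
# Sample lines standing in for the contents of trash.txt (hypothetical data).
lines = ["Bob met Bob.\n", "Nobody here.\n", "Bob again.\n"]

# `"Bob" in line` is True or False, so this counts lines, not occurrences.
lines_with_bob = sum(1 for line in lines if "Bob" in line)

# line.count("Bob") returns how many times "Bob" occurs in that line.
occurrences = sum(line.count("Bob") for line in lines)

print(lines_with_bob)  # 2
print(occurrences)     # 3
```

The two numbers only agree when no line contains "Bob" more than once.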

Since you are only looking for whole words, it is better to use a regular expression:

import re

i = 0
with open('trash.txt','r') as file:
    # Count every whole-word match of "Bob" in the file contents.
    for result in re.finditer(r'\bBob\b', file.read()):
        i += 1
print('Number of Bobs in file: ' + str(i))

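Note that the regular expression is \bBob\b, where the \b at the beginning and at the end means that Bob must be a whole word and not part of another word. Also, I used finditer rather than findall, because the former yields matches one at a time instead of building a list of all of them, so it uses less memory on large files.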
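As a quick illustration of what the word boundaries change, this sketch compares a whole-word regex match with a plain substring count on a made-up sentence (the string `sample` is an assumption for the example):

```python
import re

# Hypothetical sample text containing "Bob" both as a word and inside a word.
sample = "Bob, Bobby and kebob met Bob's cat."

# \b matches between a word character and a non-word character, so
# "Bob," and "Bob's" match while the "Bob" inside "Bobby" does not.
whole_words = re.findall(r"\bBob\b", sample)

# A plain substring count also picks up the "Bob" inside "Bobby".
substrings = sample.count("Bob")

print(len(whole_words))  # 2
print(substrings)        # 3
```

Neither approach counts "kebob" here, because both are case-sensitive.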

To save even more memory, combine this with reading line by line:

import re

i = 0
with open('trash.txt','r') as file:
    # Iterating over the file object reads one line at a time.
    for line in file:
        for result in re.finditer(r'\bBob\b', line):
            i += 1
print('Number of Bobs in file: ' + str(i))

For more generality, use regular expressions to distinguish between bob, Bob, bobcat, and so on:

import re
with open('trash.txt','r') as f:
    count = sum(len(re.findall(r'\bbob\b', line)) for line in f)

Options:

r'\bbob\b'      # matches bob
r'(?i)\bbob\b'  # matches bob, Bob
r'bob'          # matches bob, and the bob inside bobcat (but not Bob)

What if your file has two bobs on one line? As doorknold hinted, you are counting the number of lines that contain "bob", not the number of times it appears in the file. Do you also want to count words like "Bobby" and "kebob"? Depending on the word you are searching for and your requirements, you may need something like data = open('trash.txt').read().
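The pattern choices above could be wrapped in a small helper. This is only a sketch: the function name `count_word` and its `ignore_case` parameter are invented for illustration, and it assumes the search word starts and ends with a letter, digit, or underscore so that \b behaves as expected.

```python
import re


def count_word(path, word, ignore_case=False):
    """Count whole-word occurrences of `word` in a file, reading line by line."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps any special characters in `word` from being
    # interpreted as regex syntax.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    with open(path, "r") as f:
        # findall returns a list of matches for each line; sum their lengths.
        return sum(len(pattern.findall(line)) for line in f)
```

With this, `count_word('trash.txt', 'Bob')` counts only the exact word Bob, while `count_word('trash.txt', 'bob', ignore_case=True)` also counts bob and BOB but still skips bobcat and kebob.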