In Python, how do I print the line numbers where words occur in a text file? (Python, Python 3.x)


I need this to print the corresponding line numbers in the text file:

def index (filename, lst):
    infile = open('raven.txt', 'r')
    lines =  infile.readlines()
    words = []
    dic = {}

    for line in lines:
        line_words = line.split(' ')
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:

                dic[words[i][j]] = i

    return dic

The result is:

In: index('raven.txt',['raven', 'mortal', 'dying', 'ghost', 'ghastly', 'evil', 'demon'])

Out: {'dying': 8, 'mortal': 29, 'raven': 77, 'ghost': 8}

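(The words above appear on several lines, but only one line number is printed for each, and for some words nothing is printed at all. It also does not seem to count blank lines in the text file, so 8 should actually be 9, because there is a blank line that is not being counted.)

Please tell me how to fix this.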

Use a defaultdict to build a list of line numbers for each word:

from collections import defaultdict
def index(filename, lst):
    with open(filename, 'r') as infile:
        lines = [line.split() for line in infile]
    word2linenumbers = defaultdict(list)

    for linenumber, line in enumerate(lines, 1):
        for word in line:
            if word in lst:
                word2linenumbers[word].append(linenumber)
    return word2linenumbers

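You can also use dict.setdefault, which starts a new list for each word, or appends to the existing list if the word has already been found: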

def index(filename, lst):
    # For larger lists, checking membership will be asymptotically faster using a set.
    lst = set(lst) 
    dic = {}

    with open(filename, 'r') as fobj:
        for lineno, line in enumerate(fobj, 1):
            words = line.split()
            for word in words:
                if word in lst:
                    dic.setdefault(word, []).append(lineno)

    return dic

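If a word can appear more than once on the same line, using a set instead of a list is useful, because each line number is then stored only once.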
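A minimal sketch of that set-based variant, assuming the same `index(filename, lst)` interface as above. The name `index_unique` and the final sorting step are illustrative choices, not part of the original answer:

```python
def index_unique(filename, lst):
    """Map each wanted word to the sorted line numbers it appears on."""
    wanted = set(lst)
    found = {}

    with open(filename, 'r') as fobj:
        for lineno, line in enumerate(fobj, 1):
            for word in line.split():
                if word in wanted:
                    # A set stores each line number once, even if the
                    # word is repeated on the same line.
                    found.setdefault(word, set()).add(lineno)

    # Sets are unordered, so sort to get the numbers back in line order.
    return {word: sorted(nums) for word, nums in found.items()}
```

With a list, a line such as "raven raven" would contribute its line number twice; here it is recorded once.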

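You have two main problems, which can be fixed as follows:

1.) Multiple indices: you need to initialize a list as the dict value, not just an int. Otherwise, every time a new line containing the word is found, the word is simply reassigned a new index and the earlier one is lost.

2.) Blank lines should be counted as lines, so I think this is just an indexing problem. Your first line is indexed as 0, because the first number in range starts at 0.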
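Both points can be seen in a small sketch. The sample lines below are made up for illustration and are not taken from raven.txt:

```python
# Made-up sample: three lines, the middle one blank.
lines = ["the raven", "", "a ghost and the raven"]

last_seen = {}  # word -> int, as in the question: later lines overwrite earlier ones
all_seen = {}   # word -> list holding every index where the word was found

for i, line in enumerate(lines):
    for word in line.split():
        if word == "raven":
            last_seen[word] = i                      # assignment keeps only the last index
            all_seen.setdefault(word, []).append(i)  # appending keeps all of them

print(last_seen)  # {'raven': 2}
print(all_seen)   # {'raven': [0, 2]}
```

The blank line still takes up index 1, so it is being counted. The numbers only look one too low because they start at 0.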

You can simplify the program as follows:

def index(filename, lst):
    wordinds = {key: [] for key in lst}  # start with an empty list for each word
    with open(filename, 'r') as infile:  # use the filename parameter instead of hardcoding 'raven.txt'
        # the with statement closes the file automatically
        for linenum, line in enumerate(infile):  # linenum starts at 0
            for word in line.rstrip().split():  # strip the newline and split into words
                if word in wordinds:
                    wordinds[word].append(linenum)

    # drop words that were never found; Python 3 uses items(), not iteritems()
    return {word: nums for word, nums in wordinds.items() if nums}

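This simplifies everything into loops nested under a single for loop that enumerates each line. If you want the first line to be 1, the second line to be 2, and so on, you have to change wordinds[word].append(linenum) to …append(linenum+1).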

Edit: someone made a good point in another answer: use enumerate(infile, 1) to start enumerating from index 1. That is cleaner.

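The 9 probably shows up as 8 because range() yields numbers starting at 0, so the line count is 0-indexed. The reason each word in the dictionary has only one line number is that you map each word to a single line number, so the value of each entry is replaced by the last line on which the word was seen. If you want all occurrences, you need to map each word to a list of line numbers and keep appending to that list. You can use enumerate(fobj, 1) instead of lineno=i+1.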
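As a rough check that the two numbering styles agree, here is a small sketch. The text is a made-up stand-in for an open file, wrapped in `io.StringIO` so the example runs without a file on disk:

```python
import io

# Made-up stand-in for the contents of a text file; the second line is blank.
text = "first line\n\nthird line\n"

# Manual offset: index with range() and add 1 to get 1-based line numbers.
lines = io.StringIO(text).readlines()
manual = [(i + 1, lines[i].rstrip("\n")) for i in range(len(lines))]

# Same numbering from enumerate(..., 1), with no index arithmetic.
fobj = io.StringIO(text)
numbered = [(lineno, line.rstrip("\n")) for lineno, line in enumerate(fobj, 1)]

print(numbered)  # [(1, 'first line'), (2, ''), (3, 'third line')]
```

Both approaches count the blank line as line 2; enumerate just avoids the separate lineno=i+1 step.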
