In Python, how do I print the line numbers where words occur in a text file?

Tags: python, python-3.x

I need it to print the corresponding line numbers from the text file.

def index (filename, lst):
    infile = open('raven.txt', 'r')
    lines =  infile.readlines()
    words = []
    dic = {}

    for line in lines:
        line_words = line.split(' ')
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:

                dic[words[i][j]] = i

    return dic
The result is:

In: index('raven.txt',['raven', 'mortal', 'dying', 'ghost', 'ghastly', 'evil', 'demon'])

Out: {'dying': 8, 'mortal': 29, 'raven': 77, 'ghost': 8}
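(The words above appear on several lines, but only one line number is printed for each, and for some words nothing is printed at all. It also does not count the blank lines in the text file, so the 8 should really be 9, because there is a blank line that is not being counted.)

Please tell me how to fix this.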

Use a defaultdict to build a list of line numbers for each word:

from collections import defaultdict
def index(filename, lst):
    with open(filename, 'r') as infile:
        lines = [line.split() for line in infile]
    word2linenumbers = defaultdict(list)

    for linenumber, line in enumerate(lines, 1):
        for word in line:
            if word in lst:
                word2linenumbers[word].append(linenumber)
    return word2linenumbers
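You can also use dict.setdefault, which starts a new list for a word the first time it is seen and appends to the existing list if the word has already been found: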

def index(filename, lst):
    # For larger lists, checking membership will be asymptotically faster using a set.
    lst = set(lst) 
    dic = {}

    with open(filename, 'r') as fobj:
        for lineno, line in enumerate(fobj, 1):
            words = line.split()
            for word in words:
                if word in lst:
                    dic.setdefault(word, []).append(lineno)

    return dic

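If a word can occur more than once on the same line, it is useful to collect the line numbers in a set instead of a list, so that each line number is recorded only once.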
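A minimal sketch of that set-based variant, assuming the same (filename, lst) parameters as the functions above. The name index_unique and the sorting of the result are illustrative choices, not part of the original answer.

```python
def index_unique(filename, lst):
    """Map each wanted word to the sorted line numbers it appears on.

    Hypothetical variant of index(): with a set per word, a word repeated
    on one line contributes that line number only once.
    """
    wanted = set(lst)
    found = {}

    with open(filename, 'r') as fobj:
        for lineno, line in enumerate(fobj, 1):
            for word in line.split():
                if word in wanted:
                    found.setdefault(word, set()).add(lineno)

    # Sets are unordered, so sort before returning for stable output.
    return {word: sorted(linenos) for word, linenos in found.items()}
```

Words that never occur are simply absent from the returned dict, as in the setdefault version.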

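You have two main problems, which can be fixed as follows:

1.) Multiple indices: you need to store a list as the dict value, not just an int. Otherwise the word's single value is overwritten every time a new line containing that word is found.

2.) A blank line should still count as a line, so I think this is just an indexing issue. Your first line is indexed as 0, because range starts counting at 0.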
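Both points can be seen on a small in-memory example. This is only a sketch: the sample lines are made up for illustration and are not taken from raven.txt.

```python
lines = ["the raven sat", "", "a ghost and a raven"]

# Storing an int: each new match overwrites the previous one, and
# range() counts from 0, so the numbers are one less than expected.
last_seen = {}
for i in range(len(lines)):
    for word in lines[i].split():
        if word in ("raven", "ghost"):
            last_seen[word] = i

# Storing a list and counting from 1: every occurrence is kept, and the
# blank line still advances the counter.
all_seen = {}
for lineno, line in enumerate(lines, 1):
    for word in line.split():
        if word in ("raven", "ghost"):
            all_seen.setdefault(word, []).append(lineno)

print(last_seen)  # {'raven': 2, 'ghost': 2}
print(all_seen)   # {'raven': [1, 3], 'ghost': [3]}
```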

You can simplify the program as follows:

def index(filename, lst):
    wordinds = {key: [] for key in lst}  # start with an empty list for each word
    # Open the file named by the filename parameter instead of hardcoding it;
    # the with statement closes the file for you.
    with open(filename, 'r') as infile:
        for linenum, line in enumerate(infile):
            for word in line.rstrip().split():  # strip the newline and split into words
                if word in wordinds:
                    wordinds[word].append(linenum)

    # Keep only the words that were actually found. In Python 3 this is
    # dict.items(); iteritems() exists only in Python 2.
    return {word: nums for word, nums in wordinds.items() if nums}
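This simplifies everything: the work can be nested inside a single for loop that enumerates the lines. If you want the first line to be 1, the second to be 2, and so on, you have to change wordinds[word].append(linenum) to …append(linenum+1)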


Edit: someone made a good point in another answer: use enumerate(infile, 1) so that the enumeration starts at index 1. That is cleaner.

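The expected 9 probably shows up as 8 because range() counts from 0, so your line numbers are 0-indexed. The reason the dictionary holds only one line number per word is that you map each word to a single line number, so each entry's value is replaced by the last line on which that word was seen. If you want all occurrences, you need to map each word to a list of line numbers and keep appending to that list. You can use enumerate(fobj, 1) instead of lineno = i + 1.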
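A short sketch of how enumerate(fobj, 1) numbers lines, including blank ones. io.StringIO stands in for an open text file here, an assumption made only so the example needs no file on disk.

```python
import io

# io.StringIO behaves like an open text file when iterated line by line.
fobj = io.StringIO("first line\n\nthird line\n")

# start=1 makes the counter match human-style line numbers; the blank
# second line is yielded as "\n" and therefore still gets its own number.
numbered = [(lineno, line.rstrip("\n")) for lineno, line in enumerate(fobj, 1)]
print(numbered)  # [(1, 'first line'), (2, ''), (3, 'third line')]
```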