How do I print the line numbers where words occur in a text file in Python?
I need this to print the corresponding line numbers from the text file:
def index (filename, lst):
    infile = open('raven.txt', 'r')
    lines = infile.readlines()
    words = []
    dic = {}
    for line in lines:
        line_words = line.split(' ')
        words.append(line_words)
    for i in range(len(words)):
        for j in range(len(words[i])):
            if words[i][j] in lst:
                dic[words[i][j]] = i
    return dic
The result is:
In: index('raven.txt',['raven', 'mortal', 'dying', 'ghost', 'ghastly', 'evil', 'demon'])
Out: {'dying': 8, 'mortal': 29, 'raven': 77, 'ghost': 8}
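(The words above occur on several lines, but it prints only one line number per word, and for some words it prints nothing at all.)
Also, it doesn't count the empty lines in the text file, so 8 should really be 9, because there is an empty line that it doesn't count.
Please tell me how to fix this.

Use a defaultdict to build a list of line numbers for each word: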
from collections import defaultdict

def index(filename, lst):
    with open(filename, 'r') as infile:
        lines = [line.split() for line in infile]
    word2linenumbers = defaultdict(list)
    for linenumber, line in enumerate(lines, 1):
        for word in line:
            if word in lst:
                word2linenumbers[word].append(linenumber)
    return word2linenumbers
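For anyone who hasn't used defaultdict before, here is a small sketch of the behaviour this answer relies on: looking up a missing key creates an empty list, so append works without checking whether the word was seen before. The (word, line number) pairs are made up for illustration and are not taken from raven.txt.

```python
from collections import defaultdict

# Hypothetical (word, line number) pairs, not taken from raven.txt.
hits = [("raven", 3), ("ghost", 5), ("raven", 9)]

word2linenumbers = defaultdict(list)
for word, linenumber in hits:
    # A missing key is created with an empty list on first access.
    word2linenumbers[word].append(linenumber)

# Convert to a plain dict if a defaultdict is not wanted as the return value.
result = dict(word2linenumbers)
print(result)
```

Converting with dict() at the end is optional; it mainly keeps later lookups of unknown words from silently adding empty lists.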

You can also use dict.setdefault to start a new list for each word, or to append to the existing list if the word has already been found:
def index(filename, lst):
    # For larger lists, checking membership will be asymptotically faster using a set.
    lst = set(lst)
    dic = {}
    with open(filename, 'r') as fobj:
        for lineno, line in enumerate(fobj, 1):
            words = line.split()
            for word in words:
                if word in lst:
                    dic.setdefault(word, []).append(lineno)
    return dic
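The same setdefault pattern also works with sets as the values. A minimal sketch, assuming sorted lists are wanted in the final result; index_unique and the sample text are hypothetical, and it takes an open file object rather than a filename so it can be tried on an in-memory string:

```python
import io

def index_unique(fobj, wanted):
    """Map each wanted word to the sorted line numbers it occurs on (hypothetical helper)."""
    wanted = set(wanted)
    found = {}
    for lineno, line in enumerate(fobj, 1):
        for word in line.split():
            if word in wanted:
                # A set ignores a repeated line number for the same word.
                found.setdefault(word, set()).add(lineno)
    # Turn each set into a sorted list for readable output.
    return {word: sorted(nums) for word, nums in found.items()}

# Made-up sample text standing in for a file; line 2 is empty.
sample = io.StringIO("raven raven\n\nghost raven\n")
print(index_unique(sample, ["raven", "ghost", "demon"]))
```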
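Using a set instead of a list is useful if a word can occur more than once on the same line.

You have two main problems, which can be fixed as follows:
1.) Multiple indices: you need to initialize a list as the dict value, not just an int. Otherwise each word is reassigned a new index every time another line containing it is found.
2.) Empty lines are read as lines like any other, so I think this is just an indexing issue. Your first line is indexed as 0, because the first number in range starts at 0.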
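Both points can be seen on a few in-memory lines. This is only an illustration with made-up text, not the contents of raven.txt:

```python
lines = ["the raven", "", "raven again"]  # made-up text; the second line is empty

last_seen = {}  # int values: each new hit overwrites the previous one
all_seen = {}   # list values: every hit is kept

for i in range(len(lines)):  # i starts at 0, as in the question
    for word in lines[i].split():
        if word == "raven":
            last_seen[word] = i
            # i + 1 turns the 0-based index into a 1-based line number.
            all_seen.setdefault(word, []).append(i + 1)

print(last_seen)  # {'raven': 2}
print(all_seen)   # {'raven': [1, 3]}
```

The empty second line still takes up an index, so the blank line is being counted; the off-by-one comes only from starting at 0.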
You can simplify your program as follows:
def index(filename, lst):
    wordinds = {key: [] for key in lst}  # start with an empty list for each word
    with open(filename, 'r') as infile:  # why take a filename param if open() is hardcoded?
        # the with statement closes the file for you
        for linenum, line in enumerate(infile):
            for word in line.rstrip().split():  # strip the newline and split into words
                if word in wordinds:
                    wordinds[word].append(linenum)
    # keep only the words that were found (Python 3: items(), not iteritems())
    return {word: nums for word, nums in wordinds.items() if nums}
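This simplifies everything into a single nested for loop that enumerates each line. If you want the first line to be 1, the second line to be 2, and so on, you must change wordinds[word].append(linenum) to …append(linenum+1).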
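EDIT: Someone made a good point in another answer: use enumerate(infile, 1) to start the enumeration at index 1. That is much cleaner.

The 9 is probably showing up as 8 because range() starts counting from 0, so your line numbers are 0-indexed. The reason your dictionary holds only one line number per word is that you map each word to a single line number, so each entry's value is replaced by the last line on which the word was seen. If you want all occurrences, you need to map each word to a list of line numbers and keep appending to that list.

You can use enumerate(fobj, 1) instead of lineno = i + 1.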
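
One thing the answers above do not cover is the remark that some words print nothing. A possible cause, and this is only a guess since the contents of raven.txt are not shown, is that split() leaves punctuation and capitalization attached to each token, so 'Raven,' never equals 'raven'. A rough sketch of one way to normalize tokens before matching; index_normalized and the sample text are hypothetical, and it takes an open file object so it can be tried without a real file:

```python
import io
import string

def index_normalized(fobj, wanted):
    """Hypothetical variant: ignore case and surrounding punctuation when matching."""
    wanted = {w.lower() for w in wanted}
    found = {}
    for lineno, line in enumerate(fobj, 1):
        for token in line.split():
            # Drop leading/trailing punctuation, then compare in lowercase.
            word = token.strip(string.punctuation).lower()
            if word in wanted:
                found.setdefault(word, []).append(lineno)
    return found

# Made-up sample text; line 2 is empty and still counts as a line.
sample = io.StringIO('Quoth the Raven, "Nevermore."\n\nghastly grim and ancient Raven\n')
print(index_normalized(sample, ["raven", "ghastly"]))
```

Note that strip(string.punctuation) only removes ASCII punctuation at the ends of a token, so typographic quotes or dashes in the file would need extra handling.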