Counting a list of specific words in a text file in Python
I want to count the occurrences of specific words (conjunctions: also, although, and, as, because, before, but, for, if, nor, of, or, since, that, though, until, when, whenever, whereas, which, while, yet), as well as punctuation marks, in a txt file. This is what I did:
def count(fname, words_list):
    if fname:
        try:
            file = open(str(fname), 'r')
            full_text = file.readlines()
            file.close()
            count_result = dict()
            for word in words_list:
                for text in full_text:
                    if word in count_result:
                        count_result[word] = count_result[word] + text.count(word)
                    else:
                        count_result[word] = text.count(word)
            return count_result
        except:
            print('Something really bad just happened!')

print(count('sample2.txt', ["also", "although", "and", "as", "because", "before", "but", "for", "if", "nor", "of",
                            "or", "since", "that", "though", "until", "when", "whenever", "whereas",
                            "which", "while", "yet", ",", ";", "-", "'"]))
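But it also counts words that appear inside other words (for example, "as" inside "was"). How can I fix it, or is there another way to achieve this? Thanks.
The expected output looks something like:
{'also': 0, 'although': 0, 'and': 27, 'as': 2, 'before': 2, 'but': 4, 'for': 2, 'if': 2, 'nor': 0, 'of': 13, 'or': 2, …, 'that': 10, 'though': 2, 'until': 0, 'when': 3, …, 'while': 0, 'yet': 0, ',': 41, ';': 3, …, 'words/sentence': 25.4286, …}

Python 2.7 and 3.1 added a special dict, collections.Counter, for exactly what you are trying to achieve.
Since you haven't posted any sample output, I'd like to give you an approach you can use. Maintain a list and append the words you need to it. For example, if you come across the word "also", append it to the list:
>>> l = []
>>> l.append("also")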
>>> l
['also']
Similarly, when you come across the word "although", the list becomes:
>>> l.append("although")
>>> l
['also', 'although']
If you come across "also" again, append it to the list once more. The list becomes:
>>> l.append("also")
>>> l
['also', 'although', 'also']
Now use Counter to count the occurrences of the list elements:
>>> from collections import Counter
>>> l = ['also', 'although', 'also']
>>> result = Counter(l)
>>> l
['also', 'although', 'also']
>>> result
Counter({'also': 2, 'although': 1})
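Applied to actual text, the same idea might look like the sketch below. It assumes the text is already in a string; the sample sentence and the name targets are made up for illustration, and splitting on whitespace plus stripping a few punctuation characters is only a rough tokenizer.

```python
from collections import Counter

# Hypothetical sample text and target list, for illustration only
text = "He was late, and as usual he blamed the bus and the rain."
targets = ["also", "and", "as"]

# Split into words and strip common punctuation from both ends
words = [w.strip(".,;:!?") for w in text.lower().split()]

# Count only the words we care about
result = Counter(w for w in words if w in targets)

# A Counter returns 0 for missing keys, so absent words are still reported
counts = {t: result[t] for t in targets}
print(counts)
```

Because "was" is compared as a whole word, it does not contribute to the count for "as".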
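The str.count(sub) method counts occurrences of the substring sub. When you want it to count "as", it also finds "as" inside the word "was" and increments the count each time.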
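A small sketch of the difference, using a made-up sentence: str.count matches substrings, while a regex with \b word boundaries only matches the whole word.

```python
import re

text = "It was as cold as ice."  # made-up sample sentence

# str.count matches substrings, so the "as" inside "was" is counted too
substring_hits = text.count("as")

# \b anchors the match at word boundaries, so only the whole word "as" counts
whole_word_hits = len(re.findall(r"\bas\b", text))

print(substring_hits, whole_word_hits)
```

For an arbitrary word taken from a list, wrapping it in re.escape before building the pattern avoids surprises if the word contains regex metacharacters.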
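You can use a regex here to specify that you want "as" as a whole word, not as a substring of another word. The \b token matches a word boundary (the start or end of a word).

I would do something simple: go through the words in the file and check whether each one is in the list of words to count. If it is, add 1 to that word's entry in the counter dictionary.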
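# target words (shortened here; use the full list from the question)
word_list = ["also", "although", "and", "as"]
# get all the words in the file (file name taken from the question)
with open('sample2.txt') as file:
    word_list_in_text = file.read().split()
count_result = {}
for word in word_list_in_text:
    # check if each word in the file is in your target list
    if word in word_list:
        if word not in count_result:
            count_result[word] = 1
        else:
            count_result[word] += 1
print(count_result)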
If you want to do this in one function:
def word_count(fname, word_list, punctuation):
    # start every target word at 0 so absent words still appear in the result
    count_w = dict()
    for w in word_list:
        count_w[w] = 0
    # same for the punctuation characters
    count_p = dict()
    for p in punctuation:
        count_p[p] = 0
    with open(fname) as input_text:
        text = input_text.read()
        words = text.lower().split()
        for word in words:
            # strip surrounding punctuation so "and," still matches "and"
            _word = word.strip('.,:-)()')
            if _word in count_w:
                count_w[_word] += 1
        # punctuation is counted character by character
        for c in text:
            if c in punctuation:
                count_p[c] += 1
    count_w.update(count_p)
    return count_w

print(word_count('c_prog.txt', ["also", "although", "and", "as", "because", "before", "but", "for", "if", "nor", "of", "or", "since", "that",
                                "though", "until", "when", "whenever", "whereas", "which", "while", "yet"], [",", ";", "-", "'"]))
Sample input and expected output?
I uploaded 3 samples there for testing. The expected output is as follows: {'also': 0, 'although': 0, 'and': 27, 'as': 2, 'before': 2, 'but': 4, 'for': 2, 'if': 2, 'nor': 0, 'of': 13, 'or': 2, …, 'that': 10, 'though': 2, 'until': 0, 'when': 3, …, 'while': 0, 'yet': 0, ',': 41, ';': 3, …, 'words/sentence': 25.4286, …}
How can I combine the two functions so that I get the expected return?
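One possible way to combine word and punctuation counting is sketched below. This is not taken from any of the answers: the function name count_words_and_punctuation and the sample sentence are hypothetical, and it works on a string rather than a file (the text could be read first with open(fname).read()). It does not attempt the words-per-sentence figure from the expected output.

```python
import re
from collections import Counter

def count_words_and_punctuation(text, word_list, punctuation):
    """Count whole-word occurrences of word_list and characters in punctuation."""
    # \w+ extracts runs of letters/digits, so "was" never counts as "as"
    word_counts = Counter(re.findall(r"\w+", text.lower()))
    # Counting every character makes the punctuation lookup trivial
    char_counts = Counter(text)
    # Counter returns 0 for missing keys, so absent items still appear
    result = {w: word_counts[w] for w in word_list}
    result.update({p: char_counts[p] for p in punctuation})
    return result

# Made-up sample sentence, for illustration only
sample = "I was tired, but as soon as I could; I left - and didn't look back."
counts = count_words_and_punctuation(
    sample, ["and", "as", "but", "yet"], [",", ";", "-", "'"]
)
print(counts)
```

Note that \w+ splits a contraction such as "didn't" into "didn" and "t", which is harmless here but would matter if contractions were among the target words.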