How do I count occurrences in a list in Python?
I'm new to Python, and I want to count how many times each word occurs across all files. For each word, display the word, the number of times it occurs, and the percentage of all words it accounts for. Sort the list so that the most frequent word comes first and the least frequent word comes last. I'm working on a small sample with just one file, but I can't get it to work.
from collections import defaultdict
words = "apple banana apple strawberry banana lemon"
d = defaultdict(int)
for word in words.split():
    d[word] += 1
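To finish the task described in the question, the counts in this dictionary still need to be sorted and turned into percentages. The following is a minimal sketch on the same sample string; the names total and ranked are illustrative and do not come from the original post.

```python
from collections import defaultdict

words = "apple banana apple strawberry banana lemon"

# Count each word; a missing key starts at 0
d = defaultdict(int)
for word in words.split():
    d[word] += 1

# Total number of words counted
total = sum(d.values())

# Sort by count, highest first; ties keep first-seen order (the sort is stable)
ranked = sorted(d.items(), key=lambda item: item[1], reverse=True)

for word, count in ranked:
    print(f"{word}: {count} ({100 * count / total:.2f}%)")
```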
Using your code, here is a neater way:
import operator
import sys

# Initialize the dictionary
d = {}
with open(sys.argv[1], 'r') as f:
    # Count how many times each word comes up (stored in the dictionary)
    for line in f:
        words = line.lower().split()
        # Iterate over each word in the line
        for word in words:
            if word not in d:
                d[word] = 1
            else:
                d[word] += 1
# Total number of words across the whole file
n_all_words = sum(d.values())
# Sort the dictionary items by count, most frequent first
# https://stackoverflow.com/a/613218/10521959
sorted_d = sorted(d.items(), key=operator.itemgetter(1), reverse=True)
# Print the count and percentage of each word, in sorted order
for k, v in sorted_d:
    print(f'{k} occurs {v} times and is {(100*v/n_all_words):,.2f}% total of words.')
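The if/else test in the loop can also be folded into a single dict.get call. The sketch below wraps the counting in a function that accepts any iterable of lines, so it can be tried without a file on disk; count_words and sample are hypothetical names used only for illustration.

```python
def count_words(lines):
    """Return a dict mapping each lowercased word to its count."""
    counts = {}
    for line in lines:
        for word in line.lower().split():
            # dict.get supplies 0 the first time a word is seen
            counts[word] = counts.get(word, 0) + 1
    return counts


sample = ["Apple banana apple", "banana LEMON"]
counts = count_words(sample)
print(counts)
```

Because an open file is itself an iterable of lines, the same function should also accept the f object from a with open(...) block.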
As mentioned in the comments, this is exactly what collections.Counter is for (see its documentation).
The simplest way is to use Counter:
from collections import Counter
c = Counter(words.split())
Output:
Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1})
To get just the words, or just the counts:
list(c.keys())
list(c.values())
Or to put it into a plain dictionary:
dict(c.items())
Or a list of tuples, most frequent first:
c.most_common()
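Since most_common() already returns the pairs sorted from most to least frequent, the percentage asked for in the question is one division away. A minimal sketch, assuming the same sample string as above; total and report are illustrative names:

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
c = Counter(words.split())

# Total number of words counted
total = sum(c.values())

# most_common() yields (word, count) pairs, highest count first
report = [(word, count, 100 * count / total) for word, count in c.most_common()]

for word, count, percent in report:
    print(f"{word:<12}{count:>3}{percent:>8.2f}%")
```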
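As mentioned above, the Counter class in the collections module is definitely the way to go for counting applications.
This solution also addresses the request to count words across multiple files. It uses fileinput.input() to iterate over the contents of every filename given on the command line or, if no filenames are given, to read from STDIN (typically the keyboard).
Finally, it takes a somewhat more sophisticated approach to splitting a line into "words", using a regular expression as the delimiter. As noted in the code, it handles contractions more gracefully, but it will be confused by apostrophes used as single quotes.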
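To see what that delimiter does in isolation, here is a small sketch that applies the same pattern to a few hand-written strings. The sample strings and the result names are made up for illustration.

```python
import re

# Same delimiter as the script: one or more characters that are
# neither word characters nor apostrophes
word_delimiter = re.compile(r"[^\w']+")

# A contraction survives as a single word
contraction = word_delimiter.split("he can't fly")
print(contraction)

# Apostrophes used as quotes stay attached to the word
quoted = word_delimiter.split("say 'hello' now")
print(quoted)

# Trailing punctuation or a newline leaves an empty string at the end
trailing = word_delimiter.split("a dog.\n")
print(trailing)
```

The empty string in the last case is why the script deletes the '' key from the counter before computing the total.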
"""countwords.py
count all words across all files
"""
import fileinput
import re
import collections
# create a regex delimiter that matches one or more characters that
# are neither word characters nor apostrophes; this allows contractions
# to be treated as a word (eg can't won't didn't )
# Caution: this WILL get confused by a line that uses apostrophe
# as a single quote: eg 'hello' would be treated as a 7 letter word
word_delimiter = re.compile(r"[^\w']+")
# create an empty Counter
counter = collections.Counter()
# use fileinput.input() to open and read ALL lines from ALL files
# specified on the command line, or if no files specified on the
# command line then read from STDIN (ie the keyboard or redirect)
for line in fileinput.input():
    for word in word_delimiter.split(line):
        counter[word.lower()] += 1  # count case insensitively
del counter['']  # handle corner case of the occasional 'empty' word
# compute the total number of words using .values() to get a
# view of all the Counter values (ie the individual word counts)
# then pass that view to the sum function, which is able to
# work with any iterable
total = sum(counter.values())
# iterate through the key/value pairs (ie word/word_count) in sorted
# order - the lambda function says sort based on position 1 of each
# word/word_count tuple (ie the word_count) and reverse=True does
# exactly what it says = reverse the normal order so it now goes
# from highest word_count to lowest word_count
print("{:>10s} {:>8s} {:s}".format("occurs", "percent", "word"))
for word, count in sorted(counter.items(),
                          key=lambda t: t[1],
                          reverse=True):
    print("{:10d} {:8.2f}% {:s}".format(count, count/total*100, word))
Sample output:
$ python3 countwords.py
I have a dog, he is a good dog, but he can't fly
^D
    occurs  percent word
         2    15.38% a
         2    15.38% dog
         2    15.38% he
         1     7.69% i
         1     7.69% have
         1     7.69% is
         1     7.69% good
         1     7.69% but
         1     7.69% can't
         1     7.69% fly
And:
$ python3 countwords.py text1 text2
    occurs  percent word
         2    11.11% hello
         2    11.11% i
         1     5.56% there
         1     5.56% how
         1     5.56% are
         1     5.56% you
         1     5.56% am
         1     5.56% fine
         1     5.56% mark
         1     5.56% where
         1     5.56% is
         1     5.56% the
         1     5.56% dog
         1     5.56% haven't
         1     5.56% seen
         1     5.56% him

What isn't working? From a diagnostic point of view, "I can't get it to work" isn't helpful. Can you provide sample input, expected output, and actual output, including the traceback if an exception is raised? These are the basics.

As @RaySteam said, collections.Counter is how you would do this in real code, but for a learning exercise or homework you may want to implement it yourself.

What version of Python are you using?

@nubby For reference, .strip() is unnecessary before calling .split() (with no arguments), and sum(d.values()) works fine without a list comprehension.

Python 3. I fixed the print statement, but it still doesn't work, and I don't understand this line: n_all_words = sum([v for v in k.values])

Is the indentation correct here? Nothing is indented after the "with open..." line.

I'll fix it, @kaya3.