Python: treat words with commas the same as words without commas in a dictionary
I'm making a program that reads a file and builds a dictionary showing how many times each word is used:
filename = 'for_python.txt'
with open(filename) as file:
    contents = file.read().split()
dict = {}
for word in contents:
    if word not in dict:
        dict[word] = 1
    else:
        dict[word] += 1
dict = sorted(dict.items(), key=lambda x: x[1], reverse=True)
for i in dict:
    print(i[0], i[1])
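A minimal reproduction of the problem, using a short sample string in place of for_python.txt (the text is an assumption for illustration). Because split() with no argument only breaks on whitespace, the comma stays attached to the token, so "cat," gets its own key next to "cat":

```python
# Sample text standing in for the contents of for_python.txt (assumed for illustration)
text = "the cat, the dog and the cat"

counts = {}
for word in text.split():  # splits on whitespace only, so "cat," keeps its comma
    if word not in counts:
        counts[word] = 1
    else:
        counts[word] += 1

print(counts)  # {'the': 3, 'cat,': 1, 'dog': 1, 'and': 1, 'cat': 1}
```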
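It works, but it treats words followed by a comma as different words, which I don't want. Is there a simple and efficient way to fix this?
Remove all the commas before you split: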
filename = 'for_python.txt'
with open(filename) as file:
    contents = file.read().replace(",", "").split()
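The same idea applied to an in-memory string, wrapped in a hypothetical helper count_words_without_commas (the helper name and the sample text are illustrative, not part of the answer):

```python
def count_words_without_commas(text):
    """Hypothetical helper: delete every comma, split on whitespace, count each word."""
    counts = {}
    for word in text.replace(",", "").split():
        counts[word] = counts.get(word, 0) + 1
    return counts

print(count_words_without_commas("the cat, the dog and the cat"))
# {'the': 3, 'cat': 2, 'dog': 1, 'and': 1}
```

One caveat: replace(",", "") deletes the comma rather than turning it into a space, so two words joined only by a comma, such as "a,b", become the single token "ab". Using replace(",", " ") instead keeps them apart.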
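I'd suggest you strip() the various punctuation characters from each word as you use it. Also, don't use the built-in name dict for your variable; it's the dictionary constructor.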
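import string

filename = 'for_python.txt'
with open(filename) as file:
    contents = file.read().split()

words = {}
for word in contents:
    word = word.strip(string.punctuation)  # remove punctuation from both ends of the word
    if word not in words:
        words[word] = 1
    else:
        words[word] += 1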
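A quick check of what str.strip(string.punctuation) actually does (the sample tokens are made up for illustration). It only removes punctuation from the two ends of a token, so punctuation in the middle survives, and a token made entirely of punctuation becomes an empty string:

```python
import string

tokens = ["cat,", "(dog)", "end.", "word2,word3", "--"]
# strip() removes any of the given characters from both ends, never from the middle
stripped = [t.strip(string.punctuation) for t in tokens]
print(stripped)  # ['cat', 'dog', 'end', 'word2,word3', '']
```

If empty strings would distort the counts, a check such as `if word:` before counting skips them.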
As you may know, collections.Counter exists to do exactly this task:
import string
from collections import Counter
filename = 'test.txt'
with open(filename) as file:
contents = file.read().split()
words = Counter(word.strip(string.punctuation) for word in contents)
for k, v in words.most_common():  # all words, in decreasing order of occurrence count
    print(k, v)
for k, v in words.most_common(5):  # only the 5 most common words
    print(k, v)
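A self-contained run of the Counter approach on a sample string instead of test.txt (the sample text is an assumption). It also lower-cases each word, an extra step the answer does not take, so that "The" and "the" are counted together:

```python
import string
from collections import Counter

# Sample text used in place of test.txt (assumed for illustration)
text = "The cat, the dog. The cat!"

# Strip punctuation from both ends of each token, then lower-case it (extra step, not in the answer)
words = Counter(w.strip(string.punctuation).lower() for w in text.split())

print(words.most_common(2))  # [('the', 3), ('cat', 2)]
```

most_common() with no argument returns every word sorted by count; words with equal counts keep the order in which they were first seen.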
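You're splitting all the data using whitespace as the separator, but you aren't doing the same for commas. You can split each of those words again on commas. Here's how: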
...
for word in contents:
    new_words = word.split(',')
    for new_word in new_words:
        if not new_word:  # 'word,'.split(',') gives ['word', ''], so skip the empty piece
            continue
        if new_word not in dict:
            dict[new_word] = 1
        else:
            dict[new_word] += 1
...
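A runnable version of the split-on-commas idea working on a sample string (the text and the counts variable are illustrative). Note that "pears,".split(",") returns ['pears', ''], so the empty piece has to be skipped or it is counted as a word:

```python
text = "apples,pears, plums apples"

counts = {}
for word in text.split():              # first split on whitespace
    for new_word in word.split(","):   # then split each token on commas
        if not new_word:               # a trailing comma leaves an empty string behind
            continue
        counts[new_word] = counts.get(new_word, 0) + 1

print(counts)  # {'apples': 2, 'pears': 1, 'plums': 1}
```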
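Do you want to get rid of all punctuation?
Does this answer your question?
Note that this won't work for something like word1-word2,word3-word4: the middle part will still be treated as word2,word3.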
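One possible way around the limitation raised in the last comment, using a different technique from the answers above: extract words with a regular expression instead of splitting and stripping, so that any punctuation, including commas or hyphens in the middle of a token, acts as a separator. The pattern [\w']+ (letters, digits, underscores and apostrophes) is an assumption about what counts as a word; it also splits hyphenated words such as well-known into two.

```python
import re
from collections import Counter

# Sample text (assumed for illustration), including the case from the comment
text = "word1-word2,word3-word4 Hello, hello! Don't stop."

# Every run of word characters or apostrophes is one word; anything else separates words
words = Counter(re.findall(r"[\w']+", text.lower()))

for k, v in words.most_common():
    print(k, v)
```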