Python: Print counts for all terms in alphabetical order, even if zero


I'm running a loop over 360+ txt files, counting the occurrences of certain words in each file. The code is as follows:

>>> cnt=Counter()
>>> def process(filename):
    words=re.findall('\w+',open(filename).read().lower())
    for word in words:
        if word in words_fra:
            cnt[word]+=1
        if word in words_1:
            cnt[word]+=1
    print cnt
    cnt.clear()

>>> for filename in os.listdir("C:\Users\Cameron\Desktop\Project"):
    process(filename)

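I have two lists, words_fra and words_1, each with about 10-15 words. This outputs the matched words with their counts, but it does not print words whose count is zero, and it lists the words in order of frequency.

Sample output: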

Counter({'prices': 140, 'inflation': 107, 'labor': 46, 'price': 34, 'wage': 27,     'productivity': 26, 'capital': 21, 'workers': 20, 'wages': 19, 'employment': 18, 'investment': 14, 'unemployment': 13, 'construction': 13, 'production': 11, 'inflationary': 10, 'housing': 8, 'credit': 8, 'job': 7, 'industry': 7, 'jobs': 6, 'worker': 4, 'tax': 2, 'income': 2, 'aggregates': 1, 'payments': 1})
Counter({'inflation': 193, 'prices': 118, 'price': 97, 'labor': 58, 'unemployment': 42, 'wage': 32, 'productivity': 32, 'construction': 22, 'employment': 18, 'wages': 17, 'industry': 17, 'investment': 16, 'income': 16, 'housing': 15, 'production': 13, 'job': 13, 'inflationary': 12, 'workers': 9, 'aggregates': 9, 'capital': 5, 'jobs': 5, 'tax': 4, 'credit': 3, 'worker': 2})

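I'm fine with the format, except that I need every word's count shown, even when it is zero, and I need the counts returned in alphabetical order rather than by frequency.

What can I add to the code to accomplish this? Ideally I'd be able to turn it into a nice CSV format, with the words as column headers and the counts as row values.

Thanks

Edit: The top is what the current output looks like. The bottom is what I would like it to look like.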

Wordlist="a b c d"
Counter({'c': 4, 'a': 3, 'b':1})
Counter({'a': 3, 'b': 1, 'c': 4, 'd': 0})

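To print all the words in the word list, loop over the word list before you start looking for words in the file and add each word to the result dictionary with 0 as its count.

To print them in the right order, use the built-in sorted() function.

Something like this: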

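import re

# words_fra and words_1 are the two word lists from the question
wordlist = words_fra + words_1

# Start every word at zero so words that never occur still show up
cnt = {}
for word in wordlist:
    cnt[word] = 0

# Raw string for the regex; the with block closes the file afterwards
with open('foo.html') as f:
    words = re.findall(r'\w+', f.read().lower())
for word in words:
    if word in cnt:
        cnt[word] += 1

# Sorting the (word, count) pairs orders them alphabetically by word
for result in sorted(cnt.items()):
    print("{0} appeared {1} times".format(*result))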

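If you want to sort so that the most common words come first, do this:

# reverse=True puts the highest counts first; without it the order is ascending
for result in sorted(cnt.items(), key=lambda x: x[1], reverse=True):
    print("{0} appeared {1} times".format(*result))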
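For the CSV output the question asks for (words as column headers, one row of counts per file), the same zero-initialised dictionary can be passed to csv.DictWriter. This is only a minimal sketch for Python 3: the helper names count_words and write_counts_csv, the short word lists, and the in-memory sample texts are assumptions for illustration and do not come from the original code.

```python
import csv
import io
import re


def count_words(text, wordlist):
    """Return {word: count} for every word in wordlist, zeros included."""
    counts = dict.fromkeys(wordlist, 0)
    for word in re.findall(r'\w+', text.lower()):
        if word in counts:
            counts[word] += 1
    return counts


def write_counts_csv(texts, wordlist, out):
    """Write a header row of sorted words, then one row of counts per text."""
    columns = sorted(set(wordlist))
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for text in texts:
        writer.writerow(count_words(text, columns))


# Hypothetical sample data standing in for the real word lists and files
words_fra = ['inflation', 'prices']
words_1 = ['labor', 'wages']
texts = ['Prices rose; inflation and prices again.', 'Labor costs.']

buffer = io.StringIO()
write_counts_csv(texts, words_fra + words_1, buffer)
print(buffer.getvalue())
```

To write to disk instead, an open file object (opened with newline='' as the csv module recommends) can be passed in place of the StringIO buffer, with each file's contents read inside the loop.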

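If you want zero counts to survive when adding Counter objects, you have to override Counter's __add__ method so that it keeps them. For example: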

In [8]: from collections import  Counter

In [9]: Counter({'red': 4, 'blue': 2,'white':0})+Counter({'red': 4, 'blue': 2,'white':0})
Out[9]: Counter({'red': 8, 'blue': 4})

In [10]: 
    ...: class Counter(Counter):
    ...:     def __add__(self, other):
    ...:         if not isinstance(other, Counter):
    ...:             return NotImplemented
    ...:         result = Counter()
    ...:         for elem, count in self.items():
    ...:             newcount = count + other[elem]
    ...:             result[elem] = newcount
    ...:         for elem, count in other.items():
    ...:             if elem not in self:
    ...:                 result[elem] = count
    ...:         return result
    ...:     

In [11]: Counter({'red': 4, 'blue': 2,'white':0})+Counter({'red': 4, 'blue': 2,'white':0})
Out[11]: Counter({'red': 8, 'blue': 4, 'white': 0}) #<-- now you see that `0` has been added to the resultant Counter
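A possible alternative that avoids subclassing: Counter only discards zero and negative counts in its arithmetic operators (+, -, &, |). Keys stored directly, or counts added with update(), are kept. The sketch below assumes the word list is known up front; wordlist and the sample text are made up for illustration.

```python
from collections import Counter

wordlist = ['red', 'blue', 'white']  # hypothetical word list

# Seed every word with 0; a plain dict passed to Counter keeps zero values
cnt = Counter(dict.fromkeys(wordlist, 0))

# update() adds counts in place and does not discard zero entries
cnt.update(w for w in 'red blue red'.split() if w in cnt)

# Iterating over sorted(cnt) walks the keys alphabetically
for word in sorted(cnt):
    print(word, cnt[word])
```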

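Hmm... "all word counts"? Does that mean the words that appear in at least one of the files found (but not necessarily the one you are looking at)?

Sorry, I need each Counter output (one per file) to list every word and its count in order, even if it is zero.

To add the 0-valued words, you could do something like counter.fromkeys(all_words, 0) + result_counter

Yes, using Counter here is a good idea, but this is clearly a beginner's question, and it can also be a good idea to work it out without help from the stdlib. :-)

I am using Counter, sorry for not mentioning it. Does that change how I would sort the results? Right now I'm just printing cnt

@CoS: It doesn't change how you sort, but apparently Counter can't count zero. In that case I'd suggest dropping Counter, since it isn't really helping.

I should put this in the code instead of print cnt, right?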
for word in sorted(words_fra + words_1):
    print word, cnt[word]
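The loop above works because indexing a Counter with a missing key returns 0 instead of raising KeyError, so the word lists do not need to be seeded first. (By contrast, Counter.fromkeys raises NotImplementedError, and + drops zero counts.) A rough Python 3 sketch of how the question's process function might look with this change; the word lists are placeholders, and the function takes text rather than a filename only to keep the example self-contained.

```python
import re
from collections import Counter

# Placeholder word lists; the real ones hold about 10-15 words each
words_fra = ['inflation', 'prices']
words_1 = ['labor', 'wages']


def process(text):
    """Count the listed words in text and return sorted (word, count) pairs."""
    wanted = set(words_fra + words_1)
    cnt = Counter(w for w in re.findall(r'\w+', text.lower()) if w in wanted)
    # Indexing a Counter with a missing key returns 0 without raising KeyError
    return [(word, cnt[word]) for word in sorted(wanted)]


for word, count in process('Prices and wages; prices again.'):
    print(word, count)
```

Building a fresh Counter on each call takes the place of cnt.clear() in the original code. Note that, unlike the original, a word that appears in both lists is counted once per occurrence rather than twice.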