How to get list values and their counts in Python

python list

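I am trying to count every word in a list so that I can remove the words with the highest counts, but the output I get is wrong. Suppose my file contains lines such as "it was the best of times it was the worst of times .it was the age of wisdom it was the age of foolishness". My code prints (was, 4), then (was, 3), and so on: every time a word occurs it is printed again, with a different count. I need each word to be counted exactly once.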

for file in files:  
    print(file)
    f=open(file, 'r')
    content = f.read() 
    wordlist = content.split()
    #print(wordlist)
    wordfreq = [wordlist.count(w) for w in wordlist] # a list comprehension
    print("List\n" + str(wordlist) + "\n")
    print("Frequencies\n" + str(wordfreq) + "\n")
    test = [i for i in wordfreq if i > 100]
    print("result\n"+str(list(zip(test,wordlist))))

You can do it like this:

If you do not want to use collections.Counter, you can use a dictionary like this:

>>> s = "it was the best of times it was the worst of times .it was the age of wisdom it was the age of foolishness"
>>> d = {}
>>> for word in s.split():
...     try:
...         d[word] += 1
...     except KeyError:
...         d[word] = 1
...
>>> d
{'of': 4, 'age': 2, 'it': 3, 'foolishness': 1, 'times': 2, 'worst': 1, '.it': 1, 'the': 4, 'wisdom': 1, 'was': 4, 'best': 1}
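
The '.it' key in the output above comes from the missing space after the full stop in the sample sentence. A minimal sketch of one way to avoid such tokens, assuming that a word is a run of ASCII letters or apostrophes and that case should be ignored. Both are assumptions rather than part of the original answer, and tokenize is a hypothetical helper name:

```python
import re

def tokenize(text):
    # Assumption: a "word" is a run of ASCII letters or apostrophes;
    # every other character acts as a separator.
    return re.findall(r"[a-z']+", text.lower())

s = "it was the best of times it was the worst of times .it was the age of wisdom it was the age of foolishness"

d = {}
for word in tokenize(s):
    # dict.get returns 0 for unseen words, so no try/except is needed
    d[word] = d.get(word, 0) + 1

print(d["it"])     # 4, because '.it' is now counted as 'it'
print(".it" in d)  # False
```

Whether punctuation should be stripped like this depends on the input; for text with digits or non-ASCII letters the pattern would need to be widened.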

You can use Counter from collections:

from collections import Counter
import itertools

for file in files:
    # Flatten the per-line word lists into a single list of words
    with open(file) as f:
        data = list(itertools.chain.from_iterable(line.split() for line in f))

    the_counts = Counter(data)

    print("wordlist: {}".format(data))
    print("frequencies: {}".format(dict(the_counts)))
    # Keep only the (word, count) pairs that occur more than 100 times
    test = [(a, b) for a, b in the_counts.items() if b > 100]
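
The question's actual goal is to drop the words whose count is too high. A minimal sketch of that step, assuming the words are already in a list. The name remove_frequent_words and the threshold of 1 are illustrative and do not come from the original answer:

```python
from collections import Counter

def remove_frequent_words(words, threshold):
    # Count every distinct word exactly once
    counts = Counter(words)
    # Keep only the words whose count does not exceed the threshold
    return [w for w in words if counts[w] <= threshold]

words = "it was the best of times it was the worst of times".split()
print(remove_frequent_words(words, 1))  # ['best', 'worst']
```

Because the counts are computed once up front, each lookup is a dictionary access and list.count is not called again for every word.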

Solution without Counter:

new = s.split(' ')
m = list()
for i in new:
    m.append((i, new.count(i)))
# set() removes the duplicate (word, count) pairs
for i in set(m):
    print(i)
del m[:]  # empty the list so it can be reused

Output:

('best', 1)  
('was', 4)   
('times', 2)  
('it', 3)  
('worst', 1)  
('.it', 1)  
('wisdom', 1)  
('foolishness', 1)  
('the', 4)     
('of', 4) 
('age', 2)

another test : 
 s = 'was was it was hello it was'
output :  
('hello', 1)  
('was', 4)  
('it', 2)

If your data is stored in a file, use this:

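s = ""

with open('your-file-name', 'r') as r:
    # Read all lines; put a space where each newline was so words are not glued together
    s += r.read().replace('\n', ' ')

new = s.split()  # split on any run of whitespace
m = list()
for i in new:
    m.append((i, new.count(i)))
# set() removes the duplicate (word, count) pairs
for i in set(m):
    print(i)
del m[:]  # empty the list so it can be reused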

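@user3778289 If you do not want to use the collections module, you can simply use this code.

Thanks, this works well. But it still gives me the same word several times, for example (was, 4), (times, 4), and then (was, 4) again.

@user3778289 But in my output (was, 4) appears only once, because of the set. If you want to remove the duplicates, copy and paste my code and test it again.

It gives me duplicate output. I have a large input file; the sentence was only an example.

Maybe it repeats because I used m.append(): every time you test your program, the data is appended to m again. You should empty the list and retry.

This gives me an error. test = [(a, b) for a, b in dict(the_count).items() if b > 10] SyntaxError: invalid syntax

@user3778289 Please try again and let me know what happens.

test = [(a, b) for a, b in the_count.items() if b > 10] ^ SyntaxError: invalid syntax. The error is still there.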
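import pandas as pd
# txt is the text to analyse, as a single string
# rename_axis/reset_index(name=...) name the columns explicitly, independent of the pandas version
a = pd.Series(txt.split()).value_counts().rename_axis("word").reset_index(name="counts")
# Keep only the words that occur fewer than 100 times
a[a["counts"] < 100]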