
How to get list values and counts in Python

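I am trying to count every word in a list so that I can remove the words with the highest counts, but the output I get is incorrect. Suppose my file contains lines like "it was the best of times it was the worst of times. it was the age of wisdom it was the age of foolishness". My code prints (was, 4), then (was, 3), and so on: the word is printed every time it occurs, each time with a different count value. I need each word to be counted once.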

for file in files:  
    print(file)
    f=open(file, 'r')
    content = f.read() 
    wordlist = content.split()
    #print(wordlist)
    wordfreq = [wordlist.count(w) for w in wordlist] # a list comprehension
    print("List\n" + str(wordlist) + "\n")
    print("Frequencies\n" + str(wordfreq) + "\n")
    test = [i for i in wordfreq if i > 100]
    print("result\n"+str(list(zip(test,wordlist))))
You can do it like this:

If you don't want to use collections.Counter, you can use a dictionary like this:

>>> s = "it was the best of times it was the worst of times .it was the age of wisdom it was the age of foolishness"
>>> d = {}
>>> for word in s.split():
...     try:
...         d[word] += 1
...     except KeyError:
...         d[word] = 1
...
>>> d
{'of': 4, 'age': 2, 'it': 3, 'foolishness': 1, 'times': 2, 'worst': 1, '.it': 1, 'the': 4, 'wisdom': 1, 'was': 4, 'best': 1}
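
The try/except pattern above can also be written with collections.defaultdict, which supplies the starting value for a missing key. This is only a minimal sketch; the function name count_words and the sample sentence are illustrative assumptions, not part of the original answer.

```python
from collections import defaultdict


def count_words(text):
    """Return a dict mapping each word in text to its number of occurrences."""
    counts = defaultdict(int)  # a missing key starts at 0
    for word in text.split():
        counts[word] += 1
    return dict(counts)


# Hypothetical sample input, shortened from the sentence in the question
s = "it was the best of times it was the worst of times"
word_counts = count_words(s)
print(word_counts)
```

Each word appears once as a key, so no word is reported twice.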

You can use Counter from collections:

from collections import Counter
import itertools

for file in files:

    # Build a real list of words; a bare chain iterator would be used up by Counter
    with open(file) as f:
        data = list(itertools.chain.from_iterable(line.split() for line in f))

    the_counts = Counter(data)

    print("wordlist: {}".format(data))
    print("frequencies: {}".format(dict(the_counts)))
    # (word, count) pairs for words that occur more than 100 times
    test = [(a, b) for a, b in the_counts.items() if b > 100]
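
Since the stated goal is to drop the words with the highest counts, the Counter result can be used to filter the original word list. The following is a rough sketch under that assumption; remove_frequent_words and the threshold value are hypothetical names chosen for illustration.

```python
from collections import Counter


def remove_frequent_words(words, threshold):
    """Return words with every word that occurs more than threshold times removed."""
    counts = Counter(words)  # each distinct word is counted exactly once
    return [w for w in words if counts[w] <= threshold]


# Hypothetical sample input and threshold
words = "it was the best of times it was the worst of times".split()
filtered = remove_frequent_words(words, threshold=1)
print(filtered)  # ['best', 'worst']
```

For a real file a threshold such as 100, as in the question, would replace the value used here.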

A solution without Counter:

new = s.split(' ')
m = list()
for i in new:
    m.append((i, new.count(i)))  # one (word, count) tuple per occurrence
for i in set(m):  # set() collapses the duplicate tuples
    print(i)
del m[:]  # empty the list so it can be reused
Output:

('best', 1)  
('was', 4)   
('times', 2)  
('it', 3)  
('worst', 1)  
('.it', 1)  
('wisdom', 1)  
('foolishness', 1)  
('the', 4)     
('of', 4) 
('age', 2)

another test : 
 s = 'was was it was hello it was'
output :  
('hello', 1)  
('was', 4)  
('it', 2)  
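
Because a set has no defined order, the pairs above can come out in any order. A small variation, sketched here with the hypothetical helper counts_sorted, counts each distinct word once and sorts the pairs by descending count.

```python
def counts_sorted(text):
    """Return (word, count) pairs, most frequent first, ties broken alphabetically."""
    words = text.split()
    # Iterating over set(words) visits each distinct word exactly once
    pairs = [(w, words.count(w)) for w in set(words)]
    return sorted(pairs, key=lambda p: (-p[1], p[0]))


for pair in counts_sorted("was was it was hello it was"):
    print(pair)
```

Note that words.count() rescans the whole list for every distinct word, so for a large input file a dictionary or Counter is likely to be faster.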
If your data is stored in a file, use this:

s = ""

with open('your-file-name', 'r') as r:
    # Read all lines; newlines become spaces so words on adjacent lines stay separate
    s += r.read().replace('\n', ' ')

new = s.split()  # split on any run of whitespace
m = list()
for i in new:
    m.append((i, new.count(i)))
for i in set(m):
    print(i)
del m[:]  # empty the list so it can be reused
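
The outputs above list '.it' separately from 'it' because the punctuation stays attached to the word. One possible way to handle that when reading from a file is sketched below; count_words_in_file, the lowercasing step, and the temporary demo file are assumptions added for illustration, not something the answers prescribe.

```python
import os
import string
import tempfile
from collections import Counter


def count_words_in_file(path):
    """Count words in a text file, ignoring case and surrounding punctuation."""
    counts = Counter()
    with open(path, "r") as f:
        for line in f:
            for word in line.split():
                word = word.strip(string.punctuation).lower()
                if word:  # skip tokens that were only punctuation
                    counts[word] += 1
    return counts


# Demo on a throwaway file so the sketch runs on its own
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("It was the best of times\n.it was the worst of times\n")
    demo_path = tmp.name

demo_counts = count_words_in_file(demo_path)
os.remove(demo_path)
print(demo_counts["it"])  # 2
```

Reading line by line also avoids holding the whole file in one string, which may matter for a large input file.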

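@user3778289 If you don't want to use the Counter module, you can simply use this code.
Thanks, this is good. But it still gives me the word multiple times, like (was, 4), (times, 4), and again (was, 4).
@user3778289 But in my output (was, 4) appears only once (because of the set). If you want to remove the duplicates, copy and paste my code and test again.
It gives me repeated output. I have a large input file; this sentence is just an example.
Maybe it repeats because I used m.append(): every time you test your program, the data is appended to m again. You should empty the list and retry.
This gives me an error. test = [(a, b) for a, b in dict(the_count).items() if b > 10] SyntaxError: invalid syntax
@user3778289 Please try again and let me know what happens.
test = [(a, b) for a, b in the_count.items() if b > 10] ^ SyntaxError: invalid syntax. The error is still there.
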
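import pandas as pd
# txt is assumed to be a string holding the text to analyse
# rename_axis/reset_index(name=...) name the columns the same way across pandas versions
a = pd.Series(txt.split()).value_counts().rename_axis("word").reset_index(name="counts")
a[a.counts < 100]  # keep only the words that occur fewer than 100 times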