How do I remove infrequent words from a dictionary in Python?

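I have a dictionary of word counts, and I want to remove the words whose count is 1. How can I do that? I'd also like to extract word bigrams. How do I do that?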

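import codecs

# Read the whole file as UTF-8 text
with codecs.open("Pezeshki339.txt", 'r', 'utf8') as file:
    txt = file.read()
txt = txt[1:]  # drop the first character (presumably a byte-order mark)

token = txt.split()

# Count how often each word occurs
count = {}
for word in token:
    if word not in count:
        count[word] = 1
    else:
        count[word] += 1

for k, v in count.items():
    print(k, v)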

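I have edited my code as shown below. One problem remains: how do I build a bigram matrix and smooth it with the add-one method? I'd appreciate any suggestions that fit my code.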

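import nltk
from collections import Counter
import codecs

# Collect the tokens from every line (not just from the last one)
token = []
with codecs.open("Pezeshki339.txt", 'r', 'utf8') as file:
    for line in file:
        token.extend(line.split())

# 80/20 train/test split
spl = int(0.8 * len(token))
train = token[:spl]
test = token[spl:]
print(len(test))
print(len(train))

cn = Counter(train)
# Keep only the words that occur more than once (i.e. drop the rare words)
known_words = [word for word, v in cn.items() if v > 1]
print(known_words)
print(len(known_words))

# Take bigrams from the running text, keeping pairs of known words only
known = set(known_words)
bigram = [(a, b) for a, b in nltk.bigrams(train) if a in known and b in known]
frequency = nltk.FreqDist(bigram)
for f in frequency:
    print(f, frequency[f])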
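The add-one (Laplace) smoothing asked about above can be sketched in plain Python, without relying on any particular nltk API. Each entry of the V×V bigram "matrix" (stored here as a dict of dicts) is (C(prev, word) + 1) / (C(prev) + V), where V is the vocabulary size and C(prev) counts how often prev starts a bigram, so every row sums to 1. The function name `add_one_bigram_matrix` and the sample sentence are hypothetical. With the code above, you would pass it `train`.

```python
from collections import Counter


def add_one_bigram_matrix(tokens):
    """Hypothetical helper: return (vocab, matrix) where matrix[prev][word]
    is the add-one smoothed estimate of P(word | prev)."""
    vocab = sorted(set(tokens))
    V = len(vocab)
    # Counts of each adjacent pair (prev, word) in the running text
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # How often each word appears as the first element of a bigram;
    # using this (rather than the raw unigram count) makes each row sum to 1
    prev_counts = Counter(tokens[:-1])
    matrix = {}
    for prev in vocab:
        denom = prev_counts[prev] + V
        matrix[prev] = {
            word: (bigram_counts[(prev, word)] + 1) / denom
            for word in vocab
        }
    return vocab, matrix


# Usage with a made-up sentence
tokens = "the cat sat on the mat".split()
vocab, matrix = add_one_bigram_matrix(tokens)
print(matrix["the"]["cat"])  # (1 + 1) / (2 + 5) = 2/7
```

Because the full matrix has V² entries, for a large vocabulary it is cheaper to store only the observed bigram counts and compute a smoothed probability on demand.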

Use a Counter to count the words, then filter the items, dropping the keys whose value is 1:

from collections import Counter

import codecs
with codecs.open("Pezeshki339.txt",'r','utf8') as f:

    cn = Counter(word for line in f for word in line.split())

    print({word: v for word, v in cn.items() if v > 1})

If you only want the words themselves, use a list comprehension:

print([word for word,v in cn.items() if v > 1 ])

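You don't need to call read(); you can split each line as you iterate over the file. If you also want to remove punctuation, strip it from each word: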

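from collections import Counter
from string import punctuation  # ASCII punctuation characters only
import codecs

with codecs.open("Pezeshki339.txt", 'r', 'utf8') as f:
    cn = Counter(word.strip(punctuation) for line in f for word in line.split())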
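`string.punctuation` contains only ASCII characters, so non-ASCII punctuation, such as the Arabic comma (U+060C) used in Persian text, is left attached to words. A minimal Unicode-aware sketch, assuming the text may contain such characters, strips any leading or trailing character whose `unicodedata.category` starts with "P" and skips tokens that were pure punctuation. The helper names and the sample lines are hypothetical.

```python
import unicodedata
from collections import Counter


def strip_unicode_punctuation(word):
    """Strip leading and trailing characters whose Unicode category
    starts with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po: all punctuation)."""
    start, end = 0, len(word)
    while start < end and unicodedata.category(word[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(word[end - 1]).startswith("P"):
        end -= 1
    return word[start:end]


def count_words(lines):
    """Count words across lines, ignoring tokens that are pure punctuation."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            word = strip_unicode_punctuation(word)
            if word:  # e.g. "--" becomes "" and is skipped
                counts[word] += 1
    return counts


# Hypothetical sample: U+060C is ARABIC COMMA, U+061F is ARABIC QUESTION MARK
sample = ["one\u060c two\u061f", "one, two! -- three"]
cn = count_words(sample)
print(cn)
```

With the file, `count_words(f)` can be called inside the same `with codecs.open(...) as f:` block. Unlike `string.punctuation`, this leaves symbol characters such as `$` or `+` (Unicode categories S*) in place.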

Padraic's solution works very well, but here is one that builds on your existing code instead of rewriting it entirely:

newdictionary = {}
for k,v in count.items():
    if v != 1:
        newdictionary[k] = v
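The same filtering can also be written as a dict comprehension, or done in place on `count` itself. The sketch below uses a made-up `count` dictionary. For in-place deletion it iterates over a copy of the keys, because changing a dict's size while iterating over it raises `RuntimeError`.

```python
# Made-up word counts standing in for the `count` dict built in the question
count = {"the": 3, "cat": 1, "sat": 2, "mat": 1}

# Option 1: build a new dict, keeping only words seen more than once
newdictionary = {k: v for k, v in count.items() if v != 1}

# Option 2: delete in place, iterating over a list copy of the keys
for k in list(count):
    if count[k] == 1:
        del count[k]

print(newdictionary)  # → {'the': 3, 'sat': 2}
print(count)          # → {'the': 3, 'sat': 2}
```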

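:-) Fastest gun in the West.
@AmiTavory, sometimes ;)
@marysd, I'll take a look in the morning; my brain has shut down for tonight.
Thank you all, and thank you Padraic. Yours is the best; it is exactly the code I needed.
Padraic's answers are indeed usually the best, and the fastest.
@marysd, no problem, you're welcome.
Ami, could you write it out for me? @ Actually, I have one more question. I have edited the question above. Could you help me with it? Thanks in advance for your help.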