How do I omit the less common words from a dictionary in Python?

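I have a dictionary of word counts. I want to remove the words whose count is 1 from the dictionary. How can I do that? I also want to extract the bigrams of the words. How can I do that?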

import codecs
file=codecs.open("Pezeshki339.txt",'r','utf8')
txt = file.read()
txt = txt[1:]

token=txt.split()

count={}
for word in token:
    if word not in count:
      count[word]=1
    else:
      count[word]+=1
for k,v in count.items():
    print(k,v)
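I was able to edit my code as follows. But one question remains: how do I build a bigram matrix and smooth it with the add-one method? I would appreciate any suggestions that fit my code.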

import nltk
from collections import Counter
import codecs
with codecs.open("Pezeshki339.txt",'r','utf8') as file:
    token=[]
    for line in file:
       token.extend(line.split())  # collect the tokens of every line, not only the last one

spl = 80*len(token)/100
train = token[:int(spl)]
test = token[int(spl):]
print(len(test))
print(len(train))
cn=Counter(train)
known_words=([word for word,v in cn.items() if v>1])# removes the rare words and puts them in a list
print(known_words)
print(len(known_words))
bigram=nltk.bigrams(known_words)
frequency=nltk.FreqDist(bigram)
for f in frequency:
     print(f,frequency[f])
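
For the add-one part of the question, one possible approach is sketched below. It is a minimal sketch that uses only the standard library rather than nltk, and it assumes the bigrams should be taken over the running token sequence (not over the list of unique words). The helper names `bigram_counts` and `add_one_prob` and the sample sentence are made up for illustration.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent word pairs in a token list."""
    return Counter(zip(tokens, tokens[1:]))

def add_one_prob(w1, w2, bigrams, contexts, vocab_size):
    """Add-one (Laplace) estimate of P(w2 | w1)."""
    return (bigrams[(w1, w2)] + 1) / (contexts[w1] + vocab_size)

# Hypothetical sample tokens standing in for the contents of the text file.
tokens = "the cat sat on the mat the cat ran".split()

vocab = sorted(set(tokens))
bigrams = bigram_counts(tokens)
# How often each word occurs as the first element of a bigram.
contexts = Counter(tokens[:-1])

p_seen = add_one_prob("the", "cat", bigrams, contexts, len(vocab))
p_unseen = add_one_prob("cat", "mat", bigrams, contexts, len(vocab))
print(p_seen, p_unseen)

# The full smoothed matrix, one row per first word, one column per second word.
matrix = [[add_one_prob(w1, w2, bigrams, contexts, len(vocab)) for w2 in vocab]
          for w1 in vocab]
```

A missing key in a `Counter` counts as 0, so unseen bigrams need no special case: they simply get probability 1 / (count of the first word + vocabulary size).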

Use a Counter dict to count the words, then filter the items, dropping the keys whose value is 1:

from collections import Counter

import codecs
with codecs.open("Pezeshki339.txt",'r','utf8') as f:

    cn = Counter(word for line in f for word in line.split())

    print(dict((word,v )for word,v in cn.items() if v > 1 ))
If you just want the words, use a list comprehension:

print([word for word,v in cn.items() if v > 1 ])
You don't need to call read; you can split each line as you go. If you also want to remove punctuation, you need to strip it:

from string import punctuation

cn = Counter(word.strip(punctuation) for line in f for word in line.split())
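
To try the counting and filtering without the text file, here is a small self-contained sketch. The `lines` list is a hypothetical stand-in for the lines of the file. Note that `string.punctuation` only contains ASCII punctuation, so any other marks in the text would have to be added to the characters passed to `strip`.

```python
from collections import Counter
from string import punctuation

# Hypothetical lines standing in for the lines of the text file.
lines = ["the cat, the dog.", "a cat!"]

# Strip leading/trailing ASCII punctuation from each word while counting.
cn = Counter(word.strip(punctuation) for line in lines for word in line.split())

# Keep only the words that occur more than once.
frequent = {word: v for word, v in cn.items() if v > 1}
print(frequent)
```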

Padraic's solution works great. But here is a solution that can simply go underneath your existing code, instead of rewriting it completely:

newdictionary = {}
for k,v in count.items():
    if v != 1:
        newdictionary[k] = v
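
If you would rather remove the entries from the original dictionary than build a new one, something like the following should work. It is only a sketch, and the `count` values here are made-up sample data.

```python
# Hypothetical word counts standing in for the dictionary built earlier.
count = {"the": 3, "cat": 1, "sat": 2, "mat": 1}

# Iterate over a snapshot of the keys, because a dict must not
# change size while it is being iterated over directly.
for k in list(count):
    if count[k] == 1:
        del count[k]

print(count)
```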

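:-) Fastest gun in the West.
@AmiTavory, sometimes ;)
@marysd, I'll have a look in the morning; my brain has shut down for tonight.
Thank you all, and thanks Padraic. Yours is the best. It's the code I needed.
Padraic's are indeed usually the best. And the fastest.
@marysd, no worries, you're welcome.
Ami, could you write it out for me?
Actually, I have another question. I edited the question above. Could you help me solve it? Thanks in advance for your help.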