Python: counting words and sorting the strings that contain those N words

python nlp

I have records like this:

(bc9, de, viana=do=castelo)
(bc9, tomar o, aeroporto=de=pedras=rubras)
(arábia=saudita, em o, afeganistão)

I want to count how many times each word occurs and which ones occur most often, like this:

bc9: 2 times. 

(bc9, de, viana=do=castelo)
(bc9, tomar o, aeroporto=de=pedras=rubras)

afeganistão: 1 times. 

(arábia=saudita, em o, afeganistão)

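The words between the commas should not be counted. This is my code so far. It still outputs some connectors, which I will remove, but after that I am thinking of iterating over the input and grouping the sentences that contain each word, in order of how often the word occurs.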

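# Python 2 script (Tkinter / tkFileDialog module names, print statement).
from Tkinter import Tk
from tkFileDialog import askopenfilename
import operator

Tk().withdraw()  # hide the empty root window so only the file dialog shows
filename = askopenfilename()
file = open(filename, "r+")
wordcount = {}
saida = open('saida.txt', 'w')
# Fixed annotation fragment that is removed from every line.
string = 'portugal] <civ> <*> prop m s @p<   ['
for line in file:
    # Drop the record punctuation and split names joined with "=".
    line = line.replace("(", "")
    line = line.replace(")", "")
    line = line.replace(",", "")
    line = line.replace("=", " ")
    line = line.replace(string, "")
    saida.write(line)
saida.close()
file.close()

# Count every whitespace-separated word in the cleaned file.
file = open("saida.txt", "r")
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
file.close()

# Sort by count, most frequent first, then write and print the result.
sorted_x = sorted(wordcount.items(), key=operator.itemgetter(1), reverse=True)
saida2 = open('saida2.txt', 'w')
for key, value in sorted_x:
    saida2.write(key + ':')
    saida2.write('\t')
    saida2.write(str(value) + '\n')
    print key, value
saida2.close()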

Assuming you have a fixed number of entries (3). This should be enough code to get you started.
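
A minimal sketch of that idea, not necessarily the answerer's original code. It assumes every record has exactly three fields separated by ", ", that the middle field contains no comma, and that the first and third fields are counted as whole tokens (so `viana=do=castelo` is one word). The names `parse_record` and `group_by_word` are hypothetical, and the snippet is written for Python 3.

```python
from collections import Counter, defaultdict


def parse_record(line):
    """Split '(a, b, c)' into three fields; assumes exactly 3 per record."""
    first, middle, last = line.strip().strip("()").split(", ")
    return first, middle, last


def group_by_word(lines):
    """Count the first and third field of each record and remember
    which records every counted word came from."""
    counts = Counter()
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        first, _middle, last = parse_record(line)
        # The middle field (the text between the commas) is ignored.
        for word in (first, last):
            counts[word] += 1
            groups[word].append(line)
    return counts, groups


records = [
    "(bc9, de, viana=do=castelo)",
    "(bc9, tomar o, aeroporto=de=pedras=rubras)",
    "(arábia=saudita, em o, afeganistão)",
]

counts, groups = group_by_word(records)

# most_common() yields the words from most to least frequent.
for word, n in counts.most_common():
    print("%s: %d times." % (word, n))
    for record in groups[word]:
        print(record)
    print()
```

To run it on a file instead of the sample list, pass the open file object (which iterates line by line) to `group_by_word`.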

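Good for you. What have you tried so far? Stack Overflow is not a code-writing service.
Sorry, I have already counted the words and stored them in a sorted dict, but I am having trouble iterating over the input file.
Please post your code, along with a specific question about where you are stuck.
Actually, I found that my problem is reading the records: I need to read up to the first comma, and then from the second comma to the end. Any tips?
I'm afraid I don't understand what you're looking for. Perhaps ask a new question with specific examples of what you want, what you tried, and how it failed.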
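
On the reading problem raised in the comments (text up to the first comma, then text after the second comma), one possible approach is `str.partition`, which splits a string at the first occurrence of a separator. This is only a sketch; the helper name `outer_fields` is hypothetical and it assumes each record is wrapped in parentheses as in the question.

```python
def outer_fields(record):
    """Return (text before the first comma, text after the second comma)."""
    body = record.strip().strip("()")
    # partition() splits at the first comma only.
    first, _, rest = body.partition(",")
    # A second partition() skips the middle field.
    _middle, _, last = rest.partition(",")
    return first.strip(), last.strip()


print(outer_fields("(bc9, tomar o, aeroporto=de=pedras=rubras)"))
```

If a record has no second comma, the second element comes back as an empty string, which makes malformed lines easy to detect.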