Python: counting the most common capitalized (title-case) words in a paragraph


python python-3.x


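I have to do a task: open a text file, then count how many times each capitalized word appears, and print the top 3 matches. The code works until it gets a text file in which words are doubled up on a line.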

txt file 1:

Jellicle Cats are black and white,
Jellicle Cats are rather small;
Jellicle Cats are merry and bright,
And pleasant to hear when they caterwaul.
Jellicle Cats have cheerful faces,
Jellicle Cats have bright black eyes;
They like to practise their airs and graces
And wait for the Jellicle Moon to rise.

Result:

6 Jellicle
5 Cats
2 And


txt file 2:

Baa Baa black sheep have you any wool?
Yes sir Yes sir, wool for everyone.
One for the master, 
One for the dame.
One for the little boy who lives down the lane.

Result:


1 Baa
1 One
1 Yes
1 Baa
1 One
1 Yes
1 Baa
1 One
1 Yes

Here is my code:

wc = {}
t3 = {}
p = 0
xx=0
a = open('novel.txt').readlines()
for i in a:
  b = i.split()
  for l in b:
    if l[0].isupper():
      if l not in wc:
         wc[l] = 1
      else:
        wc[l] += 1
while p < 3:
  p += 1
  max_val=max(wc.values())
  for words in wc:
    if wc[words] == max_val:
      t3[words] = wc[words]
      wc[words] = 1

    else:
      null = 1
while xx < 3:
  xx+=1
  maxval = max(t3.values())
  for word in sorted(t3):
    if t3[word] == maxval:
      print(t3[word],word)
      t3[word] = 1
    else:
      null+=1


Please help me fix this. Thanks, everyone!

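Thanks for the suggestions. After debugging the code by hand and using your answers, I found that

while xx < 3:

is unnecessary, and that

wc[words] = 1

makes the program count words again when the third most common word appears only once. By replacing it with

wc[words] = 0

I was able to stop words from being counted again.

Thanks, everyone!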
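For reference, here is a minimal sketch of the question's approach with both changes applied, assuming the whole file is passed in as one string. The function name top_capitalized is hypothetical, the while loops are written as for loops, and the max_val == 0 check is an addition not in the original that stops early once every word has been moved into t3.

```python
def top_capitalized(text, passes=3):
    """The question's loop logic, with wc[words] = 0 and a single output pass."""
    # Count words whose first character is uppercase (same test as the question)
    wc = {}
    for line in text.splitlines():
        for word in line.split():
            if word[0].isupper():
                wc[word] = wc.get(word, 0) + 1

    t3 = {}
    for _ in range(passes):  # replaces `while p < 3`
        if not wc:
            break
        max_val = max(wc.values())
        if max_val == 0:  # every word has already been moved into t3
            break
        for word in wc:
            if wc[word] == max_val:
                t3[word] = max_val
                wc[word] = 0  # 0, not 1, so this word can never be the max again

    # One sorted pass replaces the second `while xx < 3` loop:
    # highest count first, ties in alphabetical order
    return sorted(t3.items(), key=lambda item: (-item[1], item[0]))
```

Usage: for word, count in top_capitalized(open('novel.txt').read()): print(count, word). Because each pass takes every word sharing the current maximum, ties can yield more than three words, as in the original.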

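This is pretty simple, but you need a few tools:

- re.sub, to remove punctuation
- filter with str.istitle, to keep only title-case words
- collections.Counter, to count the words (run import re and from collections import Counter first)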
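To see what each tool contributes, here is a small sketch that applies them one at a time to a single line from the second paragraph (the variable names are illustrative only):

```python
import re
from collections import Counter

line = "Yes sir Yes sir, wool for everyone."

# re.sub: delete every character that is neither a word character nor whitespace
cleaned = re.sub(r"[^\w\s]", "", line)
print(cleaned)  # Yes sir Yes sir wool for everyone

# filter + str.istitle: keep tokens that start uppercase and continue lowercase
titled = list(filter(str.istitle, cleaned.split()))
print(titled)  # ['Yes', 'Yes']

# Counter: tally the remaining tokens
counts = Counter(titled)
print(counts)  # Counter({'Yes': 2})
```

Note that str.istitle is stricter than the question's l[0].isupper() check: an all-caps word such as "NASA" fails istitle, so it would not be counted.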



Assuming text contains your paragraph (the first one), this works:
In [296]: Counter(filter(str.istitle, re.sub(r'[^\w\s]', '', text).split())).most_common(3)
Out[296]: [('Jellicle', 6), ('Cats', 5), ('And', 2)]

Counter.most_common(x) returns the x most common words. Incidentally, this is the output for the second paragraph:
[('One', 3), ('Baa', 2), ('Yes', 2)]
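Wrapped into reusable functions, the same one-liner might look like the sketch below. The names most_common_titled and print_top are hypothetical; print_top reads a file such as the question's novel.txt and prints in the question's "count word" format.

```python
import re
from collections import Counter


def most_common_titled(text, n=3):
    """Return the n most common title-case words in text as (word, count) pairs."""
    words = re.sub(r"[^\w\s]", "", text).split()  # strip punctuation, then split
    return Counter(filter(str.istitle, words)).most_common(n)


def print_top(path, n=3):
    """Read a whole text file and print its top n title-case words."""
    with open(path, encoding="utf-8") as f:
        for word, count in most_common_titled(f.read(), n):
            print(count, word)
```

Calling print_top('novel.txt') on the first paragraph should print 6 Jellicle, 5 Cats and 2 And. Unlike the question's loop, most_common(3) returns exactly three entries: words with equal counts keep the order in which they were first encountered, and a word tied with the third place is cut off.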

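I suggest you learn some debugging techniques. You can add print() statements to your code to see what it is doing: print the values of variables at key steps, then check whether they match what you expect. Alternatively, you can use a source-level debugger. The counting code is fine: after counting, wc shows {'Baa': 2, 'Yes': 2, 'One': 3}. But your logic after that confuses me a lot; I'm not sure what you are trying to do with the two while loops (note: both should be for loops). The logic fails because, when there are fewer than 3 unique counts greater than 1, the third pass finds max_val = 1, which also matches every word you already reset to 1, so the entries in t3 are overwritten with 1. That is the problem in the second case.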
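A print-based trace, as suggested above, makes the failure visible. The sketch below copies the question's selection loop (including the wc[words] = 1 reset) with the while loop turned into a for loop; the function name trace_original_logic is hypothetical.

```python
def trace_original_logic(wc):
    """Re-run the question's selection loop on a copy of wc and print each pass."""
    wc = dict(wc)  # work on a copy so the caller's counts are untouched
    t3 = {}
    for p in range(1, 4):  # for loop in place of `while p < 3`
        max_val = max(wc.values())
        for word in wc:
            if wc[word] == max_val:
                t3[word] = wc[word]
                wc[word] = 1  # the original reset value
        print(f"pass {p}: max_val={max_val} wc={wc} t3={t3}")
    return t3


# Second paragraph: only two distinct counts (3 and 2) are greater than 1,
# so pass 3 sees max_val == 1 and overwrites every entry in t3 with 1.
trace_original_logic({'Baa': 2, 'Yes': 2, 'One': 3})
```

For the first paragraph there are three distinct counts above 1 (6, 5 and 2), so the third pass never reaches the reset words and the result looks correct, which is why the bug only showed up with the second file.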