Python: counting the frequency of each word in a given text


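I'm looking for a Python program that counts the frequency of each word in a text and outputs each word with its count and the line numbers where it appears.
We define a word as a contiguous sequence of non-whitespace characters. (Hint: split())

Note: different capitalizations of the same character sequence should be treated as the same word, e.g. Python and python, I and i.

The input will be several lines, with an empty line terminating the text. Only alphabetic characters and whitespace will appear in the input.

The output is formatted as follows:
Each line begins with a number giving the word's frequency, then a space, then the word itself, followed by a list of the line numbers that contain the word.

Sample input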

Python is a cool language but OCaml
is even cooler since it is purely functional

Sample output

3 is 1 2
1 a 1
1 but 1
1 cool 1
1 cooler 2
1 even 2
1 functional 2
1 it 2
1 language 1
1 ocaml 1
1 purely 2
1 python 1
1 since 2

P.S.

I'm not a student; I'm teaching myself Python.

Frequency tabulations are usually best handled with a counter.

Collecting the line numbers as well gives:

Output:

1 since [2]
3 is [1, 2]
1 a [1]
1 it [2]
1 but [1]
1 purely [2]
1 cooler [2]
1 functional [2]
1 Python [1]
1 cool [1]
1 language [1]
1 even [2]
1 OCaml [1]

Output, sorted by count and then by word:

3 is [1, 2]
1 a [1]
1 but [1]
1 cool [1]
1 cooler [2]
1 even [2]
1 functional [2]
1 it [2]
1 language [1]
1 OCaml [1]
1 purely [2]
1 Python [1]
1 since [2]
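A minimal sketch of how counts and line-number lists of this shape could be produced, assuming `collections.Counter` and `collections.defaultdict` are the tools meant. The function name `tabulate` and the `sample` list are made up for illustration, and words keep their original case here, as in the listing.

```python
from collections import Counter, defaultdict

def tabulate(lines):
    """Return (counts, line_numbers) for an iterable of text lines."""
    counts = Counter()
    line_numbers = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for word in line.split():
            counts[word] += 1
            # record each line only once per word
            if number not in line_numbers[word]:
                line_numbers[word].append(number)
    return counts, line_numbers

sample = [
    "Python is a cool language but OCaml",
    "is even cooler since it is purely functional",
]
counts, line_numbers = tabulate(sample)

# most frequent first, then alphabetically ignoring case
for word in sorted(counts, key=lambda w: (-counts[w], w.lower())):
    print(counts[word], word, line_numbers[word])
```

Calling `word.lower()` before counting would fold `Python` and `python` into one entry, which is what the question asks for.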

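OK, you have already identified split, which turns a string into a list of words. However, you want to list the lines each word appears on, so you should split the string into lines first and then into words. You can then build a dictionary whose keys are the words (lower-cased first) and whose values are a structure holding the number of occurrences and the lines the word occurs on.

You may also want to add some code that checks whether a word is valid (for example, whether it contains digits) and that sanitizes a word (strips punctuation). I'll leave those to you.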

def wsort(item):
    # sort descending by count, then ascending alphabetically
    word, freq = item
    return -freq['count'], word

def wfreq(text):
    words = {}

    # split by line, then by word
    lines = [line.split() for line in text.split('\n')]

    # enumerate from 1 so that i is the line number
    for i, line in enumerate(lines, start=1):
        for word in line:
            # lower-case so different capitalizations count as the same word
            word = word.lower()
            # if the word is not in the dictionary, create the entry
            if word not in words:
                words[word] = {'count': 0, 'lines': set()}

            # update the count and add the line number to the set
            words[word]['count'] += 1
            words[word]['lines'].add(i)

    # convert from a dictionary to a sorted list using wsort to give the order
    return sorted(words.items(), key=wsort)

inp = "Python is a cool language but OCaml\nis even cooler since it is purely functional"

for word, freq in wfreq(inp):
    # generate the desired list format; sort because sets are unordered
    lines = " ".join(str(l) for l in sorted(freq['lines']))
    print("%i %s %s" % (freq['count'], word, lines))

This gives exactly the same output as the sample:

3 is 1 2
1 a 1
1 but 1
1 cool 1
1 cooler 2
1 even 2
1 functional 2
1 it 2
1 language 1
1 ocaml 1
1 purely 2
1 python 1
1 since 2
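The validity check and clean-up left as an exercise above could be sketched roughly as follows. The helper name `clean_word` and its policy (strip punctuation from both ends, then reject anything that is not purely alphabetic) are assumptions for illustration, not part of the answer.

```python
import string

def clean_word(raw):
    """Lower-case a token and strip punctuation from both ends.

    Returns None when what is left is not purely alphabetic
    (a hypothetical policy; adjust to taste).
    """
    word = raw.strip(string.punctuation).lower()
    return word if word.isalpha() else None

tokens = "Hello, world! It's 2024 ...".split()
print([clean_word(t) for t in tokens])
```

Under this policy a token with an inner apostrophe such as `It's` is rejected along with numbers; a more lenient rule would have to decide how to treat such cases.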

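First, find all the words that occur in the text, using split().

If the text lives in a file, first read it into a single string called text. Also remove all the \n characters from the text.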

filin = open('file', 'r')
# readlines() is a method of the file object
di = filin.readlines()

# concatenate the lines into one string
text = ''
for i in di:
    text += i

Now we have a dictionary with the words as keys and, as values, the number of times each word occurs in the text.
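The step that builds this dictionary might look like the sketch below. The name `words_list` is used later in this answer; `dicts` and the inline sample `text` are assumptions made for illustration.

```python
text = "Python is a cool language but OCaml\nis even cooler since it is purely functional"

# split() with no argument splits on any whitespace, including newlines
words_list = text.split()

dicts = {}
for word in words_list:
    # start at 0 for an unseen word, then add one
    dicts[word] = dicts.get(word, 0) + 1

print(dicts["is"])
```

This version is case-sensitive; splitting `text.lower()` instead would treat `Python` and `python` as the same word.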


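Now, for the line numbers:

dicts2 = {}
for i in words_list:
    # start each word with an empty tuple of line numbers
    dicts2[i] = ()
for i in dicts2:
    # rewind the file and scan it once per distinct word
    filin.seek(0)
    count = 1
    for j in filin:
        # compare against whole words, not substrings of the line
        if i in j.split():
            dicts2[i] += (count,)
        count += 1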

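dicts2 now has the words as keys and, as values, the line numbers each word occurs on, stored in a tuple.

If the data is already in a string, you only need to get rid of the \n characters by splitting on them:

di = string_containing_text.split('\n')

Everything else stays the same.

I'm sure you can format the output.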
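For the formatting step, here is one possible sketch. It assumes two dictionaries, one mapping each word to its count and one mapping each word to its line numbers; the names `format_report`, `counts` and `line_numbers` are hypothetical, and the tiny hand-made inputs only stand in for real results.

```python
def format_report(counts, line_numbers):
    """Build rows of the form '<count> <word> <line> <line> ...'."""
    rows = []
    # most frequent first, ties broken alphabetically
    for word in sorted(counts, key=lambda w: (-counts[w], w)):
        numbers = " ".join(str(n) for n in line_numbers[word])
        rows.append(f"{counts[word]} {word} {numbers}")
    return rows

# hand-made stand-ins for the two dictionaries
counts = {"is": 3, "a": 1, "ocaml": 1}
line_numbers = {"is": (1, 2), "a": (1,), "ocaml": (1,)}
print("\n".join(format_report(counts, line_numbers)))
```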

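What is your question?

A program which counts the frequency of each word in a text and outputs each word with its count and the line numbers where it appears.

Have you tried anything yourself? If so, post your code and explain where you got stuck. People tend to be nicer and give you answers if you show that you have made an effort.

Hint: check out the for statement and the collections module.

How are we supposed to use the hint, split()? How did you come across this problem?

I downvoted this because it is not a real question: you are just asking for complete, ready-to-run code. You haven't tried to do it yourself. This is a "help" forum, not a code factory.

line.split() uses whitespace by default, so the separator argument isn't needed. But you should probably have waited before giving a complete answer, since the OP never showed any effort.

Nice answer :-)

Sorry for the bad formatting. I'm not on a computer. I hope you can understand it. If not, please leave a comment.

I fixed your formatting, now you have to accept it...

+1 for the great comments.

How can I write this result to a CSV file?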
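On the CSV question, one possible approach uses the standard `csv` module. This is only a sketch: the function name `write_csv`, the column headers and the file name in the comment are made up, and the rows are assumed to be `(count, word, line_numbers)` tuples.

```python
import csv
import io

def write_csv(rows, handle):
    """Write (count, word, line_numbers) rows to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["count", "word", "lines"])
    for count, word, numbers in rows:
        # join the line numbers into a single cell
        writer.writerow([count, word, " ".join(str(n) for n in numbers)])

rows = [(3, "is", [1, 2]), (1, "a", [1])]
buffer = io.StringIO()  # stands in for open("freq.csv", "w", newline="")
write_csv(rows, buffer)
print(buffer.getvalue())
```

When writing to a real file, opening it with `newline=""` lets the `csv` module control the line endings itself.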