Python: number of words per sentence, how to get the mean?

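I have a function that should return, as a list of tuples, (a) the number of words in each sentence and (b) the mean length of the words in each sentence. I can get (a). For (b) I can get the total number of characters per sentence, but not the mean.

I have looked at a few related posts, but I can't wrap my head around this last piece.

I have included a couple of failed attempts.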

import statistics

def sentence_num_and_mean(text):
    """ Output list of, per sentence, number of words and mean length of words """
    # Replace ! and ? with .
    for ch in ['!', '?']:
        if ch in text:
            text = text.replace(ch, '.')

    # Number of words per sentence
    num_words_per_sent =  [len(element) for element in (element.split() for element in text.split("."))]

    # Mean length of words per sentence

    # This gets sum of characters per sentence, so on the right track
    mean_len_words_per_sent = [len(w) for w in text.split('.')]

    # This gives me "TypeError: unsupported operand type(s) for /: 'int' and 'list'" error
    # when trying to get the denominator for the mean
    # A couple efforts
    #mean_len_words_per_sent = sum(num_words_per_sent) / [len(w) for w in text.split('.')]
    #mean_len_words_per_sent = [(num_words_per_sent)/statistics.mean([len(w) for w in text.split()])]

    # Return list zipped together
    return list(zip(num_words_per_sent, mean_len_words_per_sent))

Driver code:

split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
func_test = sentence_num_and_mean(split_test)
print(split_test)
print(func_test)

Which prints:

First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah
[(6, 33), (7, 35), (2, 15), (2, 17), (3, 15)]

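Eventually I need to strip the spaces and periods, but ignoring that for now, if I'm doing the simple math correctly, the result should be: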

[(6, 5.5), (7, 5), (2, 7.5), (2, 8.5), (3, 5)]

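You already end up with the correct number of words per sentence and the character count per sentence that you expect (before removing spaces and punctuation). So all you need to do is divide the latter by the former, sentence by sentence: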

num_words_per_sent =  [len(element) for element in (element.split() for element in text.split("."))]

len_words_per_sent = [len(w) for w in text.split('.')]

return [(num,len_words/num) for num,len_words in zip(num_words_per_sent,len_words_per_sent)]
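
As a self-contained sketch of this element-wise division: the function name `mean_chars_per_word` and the guard for empty segments are assumptions added here, not part of the answer above. Each sentence's character count (spaces included, as in the question's output) is divided by its word count, and segments without words are skipped, since they would otherwise raise `ZeroDivisionError` when the text ends with a period.

```python
def mean_chars_per_word(text):
    """Return (word count, characters / words) for each sentence.

    Character counts still include spaces, as in the question's output.
    """
    # Normalise sentence terminators so a single split is enough.
    for ch in "!?":
        text = text.replace(ch, ".")

    results = []
    for sentence in text.split("."):
        num_words = len(sentence.split())
        if num_words == 0:
            # Assumed guard: skip empty segments (e.g. after a final ".").
            continue
        results.append((num_words, len(sentence) / num_words))
    return results


print(mean_chars_per_word("Another period. Then exclamation!"))
```

The question's own attempt failed because it divided a single `int` by a whole list; pairing the counts per sentence avoids that.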

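The list mean_len_words_per_sent should probably be named num_characters_per_sent, given how it is currently used.

Then you can iterate over the two lists you created and divide the number of characters in each sentence by the number of words in that sentence:

mean_len_words_per_sent = [num_chars / num_words for num_chars, num_words in zip(num_characters_per_sent, num_words_per_sent)]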

If you only want to count the letters, this should work:

def sentence_num_and_mean(text):
    # Replace ! and ? with .
    for ch in ['!', '?']:
        if ch in text:
            text = text.replace(ch, '.')

    output = []
    sentences = text.split(".")
    for sentence in sentences:
        words = [x for x in sentence.split(" ") if x]
        word_count = len(words)
        word_length = sum(map(len, words))
        word_mean = word_length / word_count
        output.append((word_count, word_mean))

    return output


split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
func_test = sentence_num_and_mean(split_test)
print(split_test)
print(func_test)

Output:

First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah
[(6, 4.666666666666667), (7, 4.0), (2, 6.5), (2, 7.5), (3, 4.0)]
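
One caveat with the function above: if the text ends with a terminator, the last piece after splitting is empty and `word_length / word_count` divides by zero. A minimal sketch of one way around this follows; the name `sentence_stats` and the use of `re.split` are choices made here for illustration, not something the answer prescribes.

```python
import re


def sentence_stats(text):
    """Return (word count, mean word length) for each non-empty sentence."""
    stats = []
    # Split on any of ".", "!" or "?" in a single pass.
    for sentence in re.split(r"[.!?]", text):
        words = sentence.split()
        if not words:
            # Skip the empty piece left after a final terminator.
            continue
        total_letters = sum(len(word) for word in words)
        stats.append((len(words), total_letters / len(words)))
    return stats


print(sentence_stats("Hi there. Bye!"))
```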

You can use statistics.mean to compute the mean word length. Here, map(len, sentence.split()) gives the length of each word:

import statistics

def sentence_num_and_mean(text):
    punctuation = '?!'
    text = text.translate(str.maketrans(dict.fromkeys(punctuation, '.')))
    sentences = text.split('.')
    num_words_per_sent = [len(s.strip().split()) for s in sentences]
    mean_len_words_per_sent = [statistics.mean(map(len, s.strip().split())) for s in sentences]
    return list(zip(num_words_per_sent, mean_len_words_per_sent))
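
To see what `statistics.mean` does for a single sentence, here is a small sketch. The helper `mean_word_length` and its `None` return for empty input are assumptions for illustration; the function above does not guard against empty sentences, so `statistics.mean` would raise `StatisticsError` for text that ends with a period.

```python
import statistics


def mean_word_length(sentence):
    """Mean length of the words in one sentence, or None if it has no words."""
    lengths = list(map(len, sentence.split()))
    if not lengths:
        # statistics.mean raises StatisticsError on empty input.
        return None
    return statistics.mean(lengths)


print(mean_word_length("Another period"))
print(mean_word_length(""))
```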

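Better variable names can help you see how to express these ideas. What does text.split('.') give you? A list of sentences (str). If you have a single sentence in a variable named sentence, then sentence.split() gives you a list of words (str). With that in mind, this is easy to write:

mean_len_words_per_sent = [statistics.mean(len(word) for word in sentence.split()) for sentence in text.split('.')]
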
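Comments:

You may find it easier to solve this problem one piece at a time. For example, write a function that takes the raw text and returns a list of sentences. Then write another function that takes a sentence and returns the list of words in that sentence. Then think about a function that takes a list of words and returns the word count and mean length.

When you say mean number of characters, do you mean including spaces, or only letters? Your question mentions only letters, but your expected output includes spaces…

Well, my example includes spaces and punctuation, but eventually I'll remove them and then compute the mean.

What should happen with "Blah blah blah"? It doesn't end with any punctuation, but it will count as a sentence because it sits at the end of the split. Does that matter?

I have a possible answer, but you asked a whole bunch of questions at once. I split into sentences, then split each sentence into words, and produce (count, avg_len) for that sentence. Maybe this could be refined into a question about producing a report for one sentence?

That's not really what they get. They have the number of characters per sentence, which of course includes spaces.

My wording was poor. I just meant the number of characters they expected. I think the question is only about how to get the answer they expect, not about the separate problem of removing whitespace, which they may want to try themselves once they know how to get the mean. I'll edit my answer so nobody is confused. Thanks.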
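
The step-by-step decomposition suggested in the first comment could be sketched as below. All four function names are hypothetical, and skipping empty sentences is an assumption made here so that text ending in a terminator does not divide by zero.

```python
def split_sentences(text):
    """Split raw text into non-empty sentence strings."""
    # Map "!" and "?" to "." so one split handles every terminator.
    normalised = text.translate(str.maketrans("!?", ".."))
    return [piece.strip() for piece in normalised.split(".") if piece.strip()]


def split_words(sentence):
    """Split one sentence into words on whitespace."""
    return sentence.split()


def count_and_mean(words):
    """Return (word count, mean word length) for a non-empty word list."""
    return len(words), sum(map(len, words)) / len(words)


def sentence_report(text):
    """Combine the three steps: one (count, mean) tuple per sentence."""
    return [count_and_mean(split_words(s)) for s in split_sentences(text)]


print(sentence_report("Another period. Then exclamation! Blah blah blah"))
```

Each helper can be tested on its own, which makes it easier to decide separately how punctuation and a trailing unterminated sentence such as "Blah blah blah" should be handled.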