Python: number of words per sentence, and how to get the mean word length?


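I have a function that should return, as a list of tuples, (a) the number of words in each sentence and (b) the mean word length in each sentence. I can get (a). For (b) I can get the total number of characters per sentence, but not the mean.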

I've looked at a few related posts, but I can't wrap my head around this last piece.

I've included a couple of failed attempts.

import statistics

def sentence_num_and_mean(text):
    """ Output list of, per sentence, number of words and mean length of words """
    # Replace ! and ? with .
    for ch in ['!', '?']:
        if ch in text:
            text = text.replace(ch, '.')

    # Number of words per sentence
    num_words_per_sent =  [len(element) for element in (element.split() for element in text.split("."))]

    # Mean length of words per sentence

    # This gets sum of characters per sentence, so on the right track
    mean_len_words_per_sent = [len(w) for w in text.split('.')]

    # This gives me "TypeError: unsupported operand type(s) for /: 'int' and 'list'" error
    # when trying to get the denominator for the mean
    # A couple efforts
    #mean_len_words_per_sent = sum(num_words_per_sent) / [len(w) for w in text.split('.')]
    #mean_len_words_per_sent = [(num_words_per_sent)/statistics.mean([len(w) for w in text.split()])]

    # Return list zipped together
    return list(zip(num_words_per_sent, mean_len_words_per_sent))
Driver:

split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
func_test = sentence_num_and_mean(split_test)
print(split_test)
print(func_test)
which prints

First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah
[(6, 33), (7, 35), (2, 15), (2, 17), (3, 15)]
I still need to strip out the spaces and periods, but ignoring that for now, if I've done the simple math correctly, the result should be:

[(6, 5.5), (7, 5), (2, 7.5), (2, 8.5), (3, 5)]

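Clearly, you already end up with the correct number of words per sentence and the character count per sentence that you expect (before removing spaces and punctuation). So all you need to do is divide the latter by the former.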

num_words_per_sent =  [len(element) for element in (element.split() for element in text.split("."))]

len_words_per_sent = [len(w) for w in text.split('.')]

return [(num,len_words/num) for num,len_words in zip(num_words_per_sent,len_words_per_sent)]
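Put together as a runnable function, the division above might look like the following sketch. The name words_and_mean_chars is made up for illustration, and skipping fragments that contain no words is an assumption about how empty pieces should be handled. Like the question's code, it still counts spaces in the character total.

```python
def words_and_mean_chars(text):
    """Per sentence: (number of words, characters in the fragment / number of words)."""
    # Treat ! and ? as sentence terminators, as in the question
    for ch in "!?":
        text = text.replace(ch, ".")

    fragments = text.split(".")
    num_words_per_sent = [len(fragment.split()) for fragment in fragments]
    # Raw fragment length, spaces included
    num_chars_per_sent = [len(fragment) for fragment in fragments]

    # Assumption: skip fragments without words to avoid dividing by zero
    return [
        (num_words, num_chars / num_words)
        for num_words, num_chars in zip(num_words_per_sent, num_chars_per_sent)
        if num_words
    ]


split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
print(words_and_mean_chars(split_test))
# [(6, 5.5), (7, 5.0), (2, 7.5), (2, 8.5), (3, 5.0)]
```

The `if num_words` filter only matters when the text ends with a terminator, in which case split('.') leaves an empty trailing fragment.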


Given how it is currently used, the list mean_len_words_per_sent should probably be named num_characters_per_sent.

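You can then iterate over the two lists you created and divide the number of characters in each sentence by the number of words in that sentence.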

mean_len_words_per_sent = [num_chars / num_words for num_chars, num_words in zip(num_characters_per_sent, num_words_per_sent)]

If you only want the letters, then this should work:

def sentence_num_and_mean(text):
    # Replace ! and ? with .
    for ch in ['!', '?']:
        if ch in text:
            text = text.replace(ch, '.')

    output = []
    sentences = text.split(".")
    for sentence in sentences:
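        words = [x for x in sentence.split(" ") if x]
        if not words:
            # Skip empty fragments (e.g. after a trailing period) to avoid dividing by zero
            continue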
        word_count = len(words)
        word_length = sum(map(len, words))
        word_mean = word_length / word_count
        output.append((word_count, word_mean))

    return output


split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
func_test = sentence_num_and_mean(split_test)
print(split_test)
print(func_test)
Output:

First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah
[(6, 4.666666666666667), (7, 4.0), (2, 6.5), (2, 7.5), (3, 4.0)]
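Note that the function above counts every non-space character, so a comma or quote attached to a word is included in that word's length. If only alphabetic characters should count, a variation along these lines may help. It is only a sketch: letters_only_stats is a hypothetical name, and dropping tokens and fragments that contain no letters is an assumption.

```python
import re


def letters_only_stats(text):
    """Per sentence: (word count, mean number of alphabetic characters per word)."""
    output = []
    # Split on any of the three sentence terminators
    for sentence in re.split(r"[.!?]", text):
        # Count only alphabetic characters in each whitespace-separated token
        lengths = [sum(ch.isalpha() for ch in word) for word in sentence.split()]
        # Assumption: tokens with no letters at all (e.g. a lone dash) are not words
        lengths = [n for n in lengths if n]
        if lengths:
            output.append((len(lengths), sum(lengths) / len(lengths)))
    return output


print(letters_only_stats("Hello, world. Done!"))
# [(2, 5.0), (1, 4.0)]
```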
You can use statistics.mean to compute the mean word length. Here, you can use
map(len, sentence.split())
to compute the length of each word.

import statistics

def sentence_num_and_mean(text):
    punctuation = '?!'
    text = text.translate(str.maketrans(dict.fromkeys(punctuation, '.')))
    sentences = text.split('.')
    num_words_per_sent = [len(s.strip().split()) for s in sentences]
    mean_len_words_per_sent = [statistics.mean(map(len, s.strip().split())) for s in sentences]
    return list(zip(num_words_per_sent, mean_len_words_per_sent))
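One caveat: statistics.mean raises StatisticsError when given no data, which happens here if the text ends with a terminator, because split('.') then leaves an empty trailing fragment. A possible way around it, sketched under the assumption that fragments without words should simply be dropped (sentence_stats is an illustrative name):

```python
import statistics


def sentence_stats(text):
    """Per non-empty sentence: (word count, mean word length)."""
    for ch in "!?":
        text = text.replace(ch, ".")
    # Split each fragment into words and drop fragments that contain none
    word_lists = [s.split() for s in text.split(".")]
    word_lists = [words for words in word_lists if words]
    return [(len(words), statistics.mean(map(len, words))) for words in word_lists]


print(sentence_stats("Another period. Then exclamation!"))
# [(2, 6.5), (2, 7.5)]
```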

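Better variable names can help clarify how to express these ideas. What does
text.split('.')
give you? A list of sentences (str). If you have just one sentence in a variable named
sentence
, then
sentence.split()
gives you a list of words (str). With those in mind, this is easy to write: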


mean_len_words_per_sent = [statistics.mean(len(word) for word in sentence.split()) for sentence in text.split('.')]

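You might find it easier to solve this problem a little at a time. For example, write a function that takes the raw text and returns a list of sentences. Then write another function that takes a sentence and returns the list of words in it. Then think about a function that takes a list of words and returns the word count and mean length.

When you say mean number of characters, do you mean including spaces, or only letters? Your question talks about letters only, but your expected output includes spaces...

Well, my example includes spaces and punctuation, but eventually I'll strip them out and then compute the mean.

What about "Blah blah blah"? It doesn't end with any punctuation, but it counts as a sentence because it comes at the end of the split. Does that matter?

I have a possible answer, but you're asking a whole bunch of questions at once. I split into sentences, then split into words, and produce (count, avg_len) for each sentence. Perhaps this could be narrowed down to a question about producing a report for a single sentence?

That's not really what they are getting. They have the number of characters per sentence, which of course includes spaces.

My wording was poor. I only meant the character count they expect. I think the question is just how to get the answer they expect, not how to solve the separate problem of removing whitespace, which they may want to try themselves once they know how to get the mean. I will edit my answer so that nobody gets confused. Thanks.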
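The suggestion in the comments to solve the problem one piece at a time could be sketched roughly as follows. All four function names are made up for illustration, and ignoring empty fragments is an assumption; word length here means every non-space character in the word.

```python
import re


def split_sentences(text):
    """Return the non-empty sentences in text, split on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]


def split_words(sentence):
    """Return the whitespace-separated words of one sentence."""
    return sentence.split()


def count_and_mean_length(words):
    """Return (word count, mean word length) for a non-empty list of words."""
    return len(words), sum(len(w) for w in words) / len(words)


def sentence_report(text):
    """Combine the three helpers into one (count, mean length) tuple per sentence."""
    return [count_and_mean_length(split_words(s)) for s in split_sentences(text)]


split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
print(sentence_report(split_test))
# [(6, 4.666666666666667), (7, 4.0), (2, 6.5), (2, 7.5), (3, 4.0)]
```

Keeping each step in its own function makes it easy to test the sentence splitting, the word splitting and the averaging separately.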