How to calculate average word & sentence length in a text file in Python 2.7


python python-2.7


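I've been stuck on this for the past two weeks, and I was wondering if you could help.

I'm trying to calculate the average word length and the average sentence length in a text file. I just can't get my head around it. I've only just started using functions, which I call from my main file.

My main file looks like this: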

import Consonants
import Vowels
import Sentences
import Questions
import Words

""" Vowels """


text = Vowels.fileToString("test.txt")    
x = Vowels.countVowels(text)

print str(x) + " Vowels"

""" Consonats """

text = Consonants.fileToString("test.txt")    
x = Consonants.countConsonants(text)


print str(x) + " Consonants"

""" Sentences """


text = Sentences.fileToString("test.txt")    
x = Sentences.countSentences(text)
print str(x) + " Sentences"


""" Questions """

text = Questions.fileToString("test.txt")    
x = Questions.countQuestions(text)

print str(x) + " Questions"

""" Words """
text = Words.fileToString("test.txt")    
x = Words.countWords(text)

print str(x) + " Words"

One of my function files looks like this:

def fileToString(filename):
    myFile = open(filename, "r")
    myText = ""
    for ch in myFile:
        myText = myText + ch
    return myText

def countWords(text):
    vcount = 0
    spaces = [' ']
    for letter in text:
        if (letter in spaces):
            vcount = vcount + 1
    return vcount
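The countWords above counts space characters, so it undercounts words separated only by newlines or tabs and overcounts when there are repeated spaces. A minimal sketch of a split-based alternative for the same Words module, assuming a word is simply a run of non-whitespace characters (punctuation stays attached to the word):

```python
def countWords(text):
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and drops leading/trailing whitespace,
    # so every returned piece is one word.
    return len(text.split())


print(countWords("Hey there! how are you?"))
print(countWords("one line\nanother line\n"))
```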

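I'd like to know how to calculate word length as an imported function. I've tried using some of the other threads here, but they didn't work properly for me.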
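A rough sketch of what such importable functions could look like, assuming a word is a whitespace-separated token and a sentence ends at one or more of ".", "!" or "?". The names averageWordLength and averageSentenceLength are hypothetical, chosen only to match the naming style of the existing modules:

```python
import re


def averageWordLength(text):
    """Average number of characters per word (punctuation included)."""
    words = text.split()
    if not words:
        return 0.0
    # float() keeps the division exact on Python 2.7 as well
    return sum(len(word) for word in words) / float(len(words))


def averageSentenceLength(text):
    """Average number of words per sentence.

    Assumption: a sentence is a run of characters ending in one or
    more of '.', '!' or '?'.
    """
    sentences = re.findall(r"[^.!?]+[.!?]+", text)
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / float(len(sentences))


text = "Hey there! How are you? I hope you are ok."
print(averageWordLength(text))
print(averageSentenceLength(text))
```

In the main file these would be called the same way as the existing functions, e.g. print str(Words.averageWordLength(text)) + " average word length".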

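I'm trying to give you an algorithm.

Read the file, loop over it using enumerate() and split(), and check how each word ends with endswith(). Like:

for ind, word in enumerate(open("test.txt").read().split()):
    if word.endswith("?"):
        .....
    if word.endswith("!"):
        .....

Then put them into a dict using the ind (index) value, and loop over it with while: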
obj = "Hey there! how are you? I hope you are ok."
dict1 = {}
for ind,word in enumerate(obj.split()):
    dict1[ind]=word

x = 0
while x<len(dict1):
    if "?" in dict1[x]:
        print (list(dict1.values())[:x+1])
    x += 1

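You see, I actually sliced the words until reaching "?". So now I have a sentence in a list (you can change it to "!"). I can get the length of each element, and the rest is simple math: find the sum of the element lengths, then divide it by the length of that list. That gives the average word length for the sentence.
Remember, this is an algorithm. You will need to change this code to fit your data; the key points are enumerate(), endswith() and dict.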
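A compact sketch of that algorithm, assuming a sentence ends at any word ending in ".", "?" or "!". It swaps the answer's dict for a plain list, since the words are already in order, and so no longer needs enumerate(). The function names are hypothetical:

```python
def split_into_sentences(text):
    """Group whitespace-separated words into sentences.

    A sentence is closed whenever a word ends with '.', '?' or '!'.
    """
    sentences = []
    current = []
    for word in text.split():
        current.append(word)
        if word.endswith((".", "?", "!")):  # endswith accepts a tuple
            sentences.append(current)
            current = []
    if current:  # trailing words without final punctuation
        sentences.append(current)
    return sentences


def average_word_length(sentence):
    # sum of the element lengths divided by the number of elements
    return sum(len(word) for word in sentence) / float(len(sentence))


obj = "Hey there! how are you? I hope you are ok."
sentences = split_into_sentences(obj)
print(sentences)
print([average_word_length(s) for s in sentences])
# average sentence length, measured in words
print(sum(len(s) for s in sentences) / float(len(sentences)))
```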
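Honestly, when you're matching things like words and sentences, it's much better to learn and use regular expressions than to rely on str.split alone to catch every corner case.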
#text.txt
Here is some text. It is written on more than one line, and will have several sentences.

Some sentences will have their OWN line!

It will also have a question. Is this the question? I think it is.


Demo:
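To answer the question "I'd like to know how to calculate word length as an imported function":
Note: this doesn't account for punctuation such as . and ! at the end of words, etc.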
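This doesn't apply if you want to write the script yourself, but I would use NLTK. It has some very good tools for working with long texts.
There is a cheat sheet for NLTK. You should be able to import your text, get the sentences as one big list, and get lists of n-grams (runs of n consecutive words). Then you can calculate the averages.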
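What constitutes a sentence? Does it end with a period, a question mark or an exclamation mark?
@PadraicCunningham Yes, my sentence function handles all three.
For words, you can split each line and sum the lengths of the returned lists; do the same if you have already extracted the lines. You will pick up punctuation, though, so if you want accurate counts you need to handle those cases. We don't know what your Consonants, Vowels, Sentences, etc. modules look like, so it's hard to help you. But note that fileToString is equivalent to: with open(filename) as myFile: return myFile.read()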
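Following that comment, a minimal sketch of the shorter fileToString and of counting words by summing the lengths of each line's split() list (countWordsInLines is a hypothetical name):

```python
def fileToString(filename):
    # The with-block closes the file automatically, and read() returns
    # the whole contents as one string (no manual concatenation needed).
    with open(filename) as myFile:
        return myFile.read()


def countWordsInLines(lines):
    # Split each line on whitespace and add up the lengths of the
    # resulting lists; punctuation stays attached to the words.
    return sum(len(line.split()) for line in lines)


lines = ["Here is some text.", "", "Some sentences will have their OWN line!"]
print(countWordsInLines(lines))
```

Because iterating over an open file yields one line at a time, the same function can be used directly as: with open("test.txt") as f: print(countWordsInLines(f)).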
#!/usr/bin/python
from __future__ import division  # true division on Python 2.7 as well

import re

with open('test.txt') as infile:
    data = infile.read()

sentence_pat = re.compile(r"""
    \b                # sentences will start with a word boundary
    ([^.!?]+[.!?]+)   # continue with one or more non-sentence-ending
                      #    characters, followed by one or more sentence-
                      #    ending characters.""", re.X)

word_pat = re.compile(r"""
    (\S+)             # Words are just groups of non-whitespace together
    """, re.X)

sentences = sentence_pat.findall(data)
words = word_pat.findall(data)

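# average sentence length in characters (spaces and punctuation included)
average_sentence_length = sum(len(sentence) for sentence in sentences) / len(sentences)
# average word length in characters (trailing punctuation such as "text." included)
average_word_length = sum(len(word) for word in words) / len(words)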

>>> sentences
['Here is some text.',
 'It is written on more than one line, and will have several sentences.',
 'Some sentences will have their OWN line!',
 'It will also have a question.',
 'Is this the question?',
 'I think it is.']

>>> words
['Here',
 'is',
 'some',
 'text.',
 'It',
 'is',
 ... ,
 'I',
 'think',
 'it',
 'is.']

>>> average_sentence_length
31.833333333333332

>>> average_word_length
4.184210526315789
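The averages above are measured in characters: len(sentence) counts every character, spaces and punctuation included. If sentence length in words is wanted instead, here is a sketch reusing the same two patterns (restated compactly so the block runs on its own, with the sample text.txt contents inlined as a string):

```python
import re

sentence_pat = re.compile(r"\b([^.!?]+[.!?]+)")
word_pat = re.compile(r"\S+")

data = ("Here is some text. It is written on more than one line, "
        "and will have several sentences.\n\n"
        "Some sentences will have their OWN line!\n\n"
        "It will also have a question. Is this the question? I think it is.\n")

sentences = sentence_pat.findall(data)
# words per sentence rather than characters per sentence
words_per_sentence = [len(word_pat.findall(s)) for s in sentences]
average_sentence_words = sum(words_per_sentence) / float(len(sentences))

print(words_per_sentence)  # [4, 13, 7, 6, 4, 4]
print(average_sentence_words)
```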

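def avg_word_len(filename):
    word_lengths = []
    # iterating over the file object yields one line at a time
    with open(filename) as infile:
        for line in infile:
            word_lengths.extend([len(word) for word in line.split()])
    # float() avoids integer division on Python 2.7
    return sum(word_lengths) / float(len(word_lengths))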
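As noted above, avg_word_len counts punctuation at the end of a word as part of the word ("is." has length 3). A sketch that strips leading and trailing punctuation with string.punctuation before measuring, and skips tokens that were nothing but punctuation. It takes the text rather than a filename, and the name avg_word_len_stripped is hypothetical:

```python
import string


def avg_word_len_stripped(text):
    lengths = []
    for token in text.split():
        # strip punctuation only from the ends, so "don't" keeps its apostrophe
        word = token.strip(string.punctuation)
        if word:  # skip tokens such as "--" or "..." that were all punctuation
            lengths.append(len(word))
    if not lengths:
        return 0.0
    return sum(lengths) / float(len(lengths))


print(avg_word_len_stripped("Is this the question? I think it is."))
```

To use it on a file, pass in the file's contents, e.g. avg_word_len_stripped(open("test.txt").read()).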