How to calculate average word & sentence length in a text file in Python 2.7
I've been stuck on this for the past two weeks and was wondering if you could help. I'm trying to calculate the average word length and sentence length in a text file, but I just can't get my head around it. I've only just started using functions that are called from the main file.

My main file looks like this:
import Consonants
import Vowels
import Sentences
import Questions
import Words
""" Vowels """
text = Vowels.fileToString("test.txt")
x = Vowels.countVowels(text)
print str(x) + " Vowels"
""" Consonats """
text = Consonants.fileToString("test.txt")
x = Consonants.countConsonants(text)
print str(x) + " Consonants"
""" Sentences """
text = Sentences.fileToString("test.txt")
x = Sentences.countSentences(text)
print str(x) + " Sentences"
""" Questions """
text = Questions.fileToString("test.txt")
x = Questions.countQuestions(text)
print str(x) + " Questions"
""" Words """
text = Words.fileToString("test.txt")
x = Words.countWords(text)
print str(x) + " Words"
One of my function files looks like this:
def fileToString(filename):
    myFile = open(filename, "r")
    myText = ""
    for ch in myFile:
        myText = myText + ch
    return myText

def countWords(text):
    vcount = 0
    spaces = [' ']
    for letter in text:
        if (letter in spaces):
            vcount = vcount + 1
    return vcount
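I'd like to know how to calculate word length as an imported function. I've tried some other threads here, but they didn't work for me.

Here is an algorithm you can adapt: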
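- Read the file, split() it, loop over the words with enumerate(), and check what each one ends with using endswith(). Like this:

    for ind, word in enumerate(text.split()):
        if word.endswith("?"):
            .....
        if word.endswith("!"):
            .....

- Then put them into a dict keyed by ind (the index), and walk through it with a while loop: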
obj = "Hey there! how are you? I hope you are ok."
dict1 = {}
for ind,word in enumerate(obj.split()):
dict1[ind]=word
x = 0
while x<len(dict1):
if "?" in dict1[x]:
print (list(dict1.values())[:x+1])
x += 1
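As you can see, I sliced the words up to the one containing "?", so I now have a sentence as a list (you can change the check to "!"). From there I can get the length of each element, and the rest is simple math: add up the lengths of the elements and divide by the length of the list. In principle, that gives you the average.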
Remember, this is an algorithm, not a finished solution. You will need to adapt the code to fit your data; the key pieces are enumerate(), endswith() and dict.
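Putting the enumerate()/endswith() idea and the "sum the lengths, divide by the count" arithmetic together, a minimal sketch might look like the following. It assumes a sentence ends at any word ending in ".", "!" or "?", keeps punctuation attached to words (as plain split() does), and groups words with a plain list instead of a dict; the function name sentence_stats is hypothetical. It is written to run under both Python 2.7 and Python 3.

```python
def sentence_stats(text):
    """Return (average words per sentence, average word length).

    Assumes a sentence ends at any word ending in '.', '!' or '?'.
    Word lengths include any punctuation attached to the word.
    """
    sentences = []   # list of sentences, each one a list of words
    current = []     # words of the sentence currently being built
    for word in text.split():
        current.append(word)
        # str.endswith accepts a tuple of possible suffixes
        if word.endswith(('.', '!', '?')):
            sentences.append(current)
            current = []
    if current:      # trailing words with no closing punctuation
        sentences.append(current)

    words = [w for sentence in sentences for w in sentence]
    if not words:
        return 0.0, 0.0
    # float() keeps the division from truncating under Python 2.7
    avg_sentence_len = float(len(words)) / len(sentences)
    avg_word_len = float(sum(len(w) for w in words)) / len(words)
    return avg_sentence_len, avg_word_len


print(sentence_stats("Hey there! how are you? I hope you are ok."))
```

On the example string this finds three sentences and ten words, giving about 3.33 words per sentence and 3.3 characters per word.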
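Honestly, when you're matching things like words and sentences, you're much better off learning and using regular expressions than relying on str.split to catch every corner case.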
# test.txt
Here is some text. It is written on more than one line, and will have several sentences.
Some sentences will have their OWN line!
It will also have a question. Is this the question? I think it is.
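A demo session on this file is shown after the code below.

To answer the question:

"I'd like to know how to calculate the word length as a function that I import"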
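Note: this doesn't account for things like "." and "!" at the end of words, "..", etc.

If you want to write the script yourself this doesn't apply, but I would use NLTK. It has some really good tools for working with very long texts.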
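Here's a cheat sheet for NLTK. You should be able to import your text, get the sentences as one big list, and get a list of n-grams (sequences of n words). Then you can compute the averages.

What counts as a sentence? Does it end with ".", "?" or "..."?

@PadraicCunningham Yes, my sentence function handles all three.

For words, you can split each line and sum the lengths of the returned lists, and do the same if you have extracted lines. You will pick up punctuation that way, though, so if you want truly accurate counts you need to handle those cases.

We don't know what your Consonants, Vowels, Sentences, etc. modules contain, so it's hard to help. But note that fileToString is equivalent to: with open(filename) as myFile: return myFile.read()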
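Following those two suggestions, the asker's fileToString and countWords could be rewritten roughly as below. This is a sketch that keeps the original function names: the with block returns the same string as the original loop and also closes the file, and countWords adds up len(line.split()) per line instead of counting space characters, so line breaks and repeated spaces no longer skew the count. Punctuation still stays attached to words, as the comment warns.

```python
def fileToString(filename):
    # Whole file as one string; the with block closes the file afterwards
    with open(filename) as myFile:
        return myFile.read()


def countWords(text):
    # Split each line on whitespace and add up the number of pieces.
    # Unlike counting ' ' characters, this is not thrown off by
    # line breaks or by several spaces in a row.
    total = 0
    for line in text.splitlines():
        total += len(line.split())
    return total
```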
#!/usr/bin/python
import re

with open('test.txt') as infile:
    data = infile.read()

sentence_pat = re.compile(r"""
    \b                # sentences will start with a word boundary
    ([^.!?]+[.!?]+)   # continue with one or more non-sentence-ending
                      # characters, followed by one or more sentence-
                      # ending characters.""", re.X)

word_pat = re.compile(r"""
    (\S+)             # Words are just groups of non-whitespace together
    """, re.X)

sentences = sentence_pat.findall(data)
words = word_pat.findall(data)

# float() so that Python 2.7 does not truncate the averages to integers
average_sentence_length = float(sum(len(sentence) for sentence in sentences)) / len(sentences)
average_word_length = float(sum(len(word) for word in words)) / len(words)
>>> sentences
['Here is some text.',
'It is written on more than one line, and will have several sentences.',
'Some sentences will have their OWN line!',
'It will also have a question.',
'Is this the question?',
'I think it is.']
>>> words
['Here',
'is',
'some',
'text.',
'It',
'is',
... ,
'I',
'think',
'it',
'is.']
>>> average_sentence_length
31.833333333333332
>>> average_word_length
4.184210526315789
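Note that average_sentence_length above is measured in characters, spaces and punctuation included. If words per sentence is what you want, one option is to reuse the same two patterns and count the \S+ chunks inside each sentence. The function name average_sentence_length_in_words is hypothetical, and this is a sketch that inherits the same sentence definition as the pattern above.

```python
import re

# Same patterns as above, written without re.X
sentence_pat = re.compile(r"\b([^.!?]+[.!?]+)")
word_pat = re.compile(r"\S+")


def average_sentence_length_in_words(data):
    sentences = sentence_pat.findall(data)
    if not sentences:
        return 0.0
    # number of whitespace-separated chunks in each sentence
    word_counts = [len(word_pat.findall(s)) for s in sentences]
    # float() avoids integer division under Python 2.7
    return float(sum(word_counts)) / len(word_counts)
```

On the test.txt above this gives 38 words over 6 sentences, about 6.33 words per sentence.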
def avg_word_len(filename):
    word_lengths = []
    with open(filename) as infile:
        for line in infile:
            word_lengths.extend([len(word) for word in line.split()])
    # float() so Python 2.7 does not truncate the average to an integer
    return float(sum(word_lengths)) / len(word_lengths)
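To deal with the punctuation caveat from the note above, one option is to strip leading and trailing punctuation from each word before measuring it. The sketch below assumes it lives in your own module next to the others (the module name TextStats.py is hypothetical) and that it receives the text returned by fileToString rather than a filename, matching how the main file calls Words.countWords(text). Internal punctuation such as the apostrophe in "don't" is kept.

```python
import string


def averageWordLength(text):
    """Average word length, ignoring punctuation at either end of a word.

    A word is any whitespace-separated chunk; chunks made only of
    punctuation (for example a lone '-') are skipped entirely.
    """
    lengths = []
    for word in text.split():
        cleaned = word.strip(string.punctuation)
        if cleaned:
            lengths.append(len(cleaned))
    if not lengths:
        return 0.0
    # float() keeps Python 2.7 from truncating the result
    return float(sum(lengths)) / len(lengths)
```

In the main file it could then be called like the existing functions, e.g. import TextStats and print str(TextStats.averageWordLength(text)) + " Average word length". On the test.txt above it gives 4.0 instead of 4.18, because the seven trailing punctuation marks are no longer counted.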