
Python: computing the average line length in a document


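I am trying to compute the average line length of a document using Beautiful Soup, but I have realized that it is not straightforward.

I tried something like the following, and the results are strange: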

self.average_line_length = np.mean([ len(br.text) for br in self.contents.find_all('br')])
When I inspect the results like this:

for s1 in my_doc.contents.find_all(re.compile('br')) :
    print s1,len(s1) 
I get:

<br> does not contain any document with the entity or if our practition-
<br> er has only selected a verbal descriptor of the compound not used 
<br> within  the  documents.  In  fact,  a  query  on  ‘</br></br></br></br> **252**
whereas it should be:

<br> does not contain any document with the entity or if our practition- **68**
<br> er has only selected a verbal descriptor of the compound not used **66**
<br> within  the  documents.  In  fact,  a  qu

Judging by your results, your HTML is actually being parsed like this:

....
<br> 
    does not contain any document with the entity or if our practition-
    <br> er has only selected a verbal descriptor of the compound not used 
        <br> within  the  documents.  In  fact,  a  query  on  ‘
        </br>
    </br>
</br>
....
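See? The br elements are nested.

BeautifulSoup finds three nested br elements. When you print them, the first item in the find_all result is the outermost br, which contains the two inner br elements. Its text attribute therefore covers all three lines:

does not contain any document with the entity or if our practition- er has only selected a verbal descriptor of the compound not used within  the  documents.  In  fact,  a  query  on  ‘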
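
The effect of the nesting can be reproduced with a small sketch. To keep it dependency-free, it uses the standard library's xml.etree.ElementTree instead of BeautifulSoup, so it illustrates the idea rather than the asker's exact setup. The SNIPPET string is a shortened, hypothetical sample that mirrors the nested structure shown above. The sketch compares the full text of each br (which swallows every nested line) with the text that sits directly after each opening tag (one line per br), and averages the latter.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical sample mirroring the nested <br> structure described above.
SNIPPET = (
    "<br> does not contain any document with the entity"
    "<br> er has only selected a verbal descriptor"
    "<br> within the documents."
    "</br></br></br>"
)

root = ET.fromstring(SNIPPET)

# iter() yields the outermost <br> first, then its nested descendants.
brs = list(root.iter("br"))

# Full text of each element, including everything nested inside it.
# This is the analogue of the oversized length seen for the outermost <br>.
nested_lengths = [len("".join(br.itertext())) for br in brs]

# Only the text directly after each opening tag, i.e. a single "line".
own_lines = [(br.text or "").strip() for br in brs]
line_lengths = [len(line) for line in own_lines]

average_line_length = mean(line_lengths)

print("nested lengths:", nested_lengths)
print("per-line lengths:", line_lengths)
print("average line length:", average_line_length)
```

With BeautifulSoup itself, the analogous idea would be to measure the individual text nodes (for example by iterating over stripped_strings) rather than br.text, since each text node holds one line regardless of how the br tags end up nested.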