Python: calculating the average line length of a document
I tried to use Beautiful Soup to calculate the average line length of a document, but I realized it is not that simple. I tried something like this, and the results are strange:
self.average_line_length = np.mean([ len(br.text) for br in self.contents.find_all('br')])
When I check the results like this:
for s1 in my_doc.contents.find_all('br'):
    print(s1, len(s1.text))
Result:
<br> does not contain any document with the entity or if our practition-
<br> er has only selected a verbal descriptor of the compound not used
<br> within the documents. In fact, a query on ‘</br></br></br></br> **252**
What it should normally be:
<br> does not contain any document with the entity or if our practition- **68**
<br> er has only selected a verbal descriptor of the compound not used **66**
<br> within the documents. In fact, a qu
Answer: judging by your results, your HTML actually looks like this:
....
<br>
does not contain any document with the entity or if our practition-
<br> er has only selected a verbal descriptor of the compound not used
<br> within the documents. In fact, a query on ‘
</br>
</br>
</br>
....
See? The br elements are nested.
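BeautifulSoup finds three nested br elements. When you print them, the first item in the find_all result is the outermost `br`, which contains the two inner br elements. Its text attribute is: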
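does not contain any document with the entity or if our practition-
er has only selected a verbal descriptor of the compound not used
within the documents. In fact, a query on ‘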
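Because the line text ends up *inside* the outer `br` rather than *between* `br` tags, `len(br.text)` measures the wrong thing. Below is a minimal sketch, not taken from the answer, of one way to measure lines that works whether the parser makes `<br>` void or nested: each `br` gets a newline inserted before it and is then unwrapped (keeping any text nested inside it), and the remaining text is split on newlines. The function names `nested_br_counts` and `average_line_length` are hypothetical, it assumes the input is an HTML string, and it assumes lines are delimited only by `br` tags.

```python
from bs4 import BeautifulSoup


def nested_br_counts(soup):
    """For each <br> in document order, count the <br> tags nested inside it.

    Non-zero counts mean the parser treated <br> as a container, which is
    what the answer diagnoses. (Hypothetical helper for inspection.)
    """
    return [len(br.find_all("br")) for br in soup.find_all("br")]


def average_line_length(html, parser="html.parser"):
    """Average character length of the lines delimited by <br> tags.

    Hypothetical helper: each <br> is replaced by a newline and unwrapped,
    so text nested inside a container-style <br> is kept, not dropped.
    """
    soup = BeautifulSoup(html, parser)
    for br in soup.find_all("br"):
        br.insert_before("\n")  # mark the line break
        br.unwrap()             # remove the tag, keep any nested content
    lines = [line.strip() for line in soup.get_text().split("\n")]
    lengths = [len(line) for line in lines if line]  # skip empty lines
    return sum(lengths) / len(lengths) if lengths else 0.0
```

For example, `average_line_length("<p>line one<br>line two two<br>three</p>")` averages the lengths 8, 12 and 5. Stripping whitespace and skipping empty lines are design choices of this sketch; drop the `strip()` if leading spaces should count toward the length. Since the function mutates the tree, it parses a fresh soup instead of reusing an existing one such as `self.contents`.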