Text inside BR tags cannot be retrieved with Python BeautifulSoup

python web-scraping

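I want to get all of the data inside the div, including the text that follows each br tag. However, I only get the text up to the first br.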

<div itemprop="description">

<p>Chars :
</br>- test1 
</br>- test2 
</br>- test3
</p>

</div>

Output:

Chars
-test1

I want to get all of the text after the br tags.

I have no problem with lxml and select:

from bs4 import BeautifulSoup as bs
html = '''
<div itemprop="description">

<p>Chars :
</br>- test1 
</br>- test2 
</br>- test3
</p>

</div>
'''
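# Parse with the lxml parser (requires the lxml package to be installed)
soup = bs(html, 'lxml')
# Collect the text of every matching div, turning newlines into spaces
data = [item.text.strip().replace('\n', ' ') for item in soup.select('div[itemprop=description]')]
print(data)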

As discussed in a related question, BeautifulSoup is known to interact oddly with br tags. Your options are either 1) remove the br tags, e.g.

str(soup).replace("<br>", "")

or 2) use a different parser:

soup = BeautifulSoup(page, 'lxml')

(the second option worked for me).

Thanks, it does resolve my issue.
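For the first option, a minimal sketch might look like the following. It assumes the page contains the malformed closing form `</br>` exactly as in the sample from the question, and it removes those tags from the raw markup before parsing rather than from `str(soup)`. The helper name `description_text` is made up for illustration, and `html.parser` is used only so that the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup


def description_text(html: str) -> str:
    """Return the text of the description div collapsed onto one line."""
    # Assumption: the stray tags are written exactly as "</br>", as in the
    # sample markup. Removing them first means the parser never sees them.
    cleaned = html.replace("</br>", "")
    soup = BeautifulSoup(cleaned, "html.parser")
    div = soup.find("div", attrs={"itemprop": "description"})
    if div is None:
        return ""
    # split() + join() collapses newlines and repeated spaces into one space.
    return " ".join(div.get_text().split())


sample = """
<div itemprop="description">

<p>Chars :
</br>- test1 
</br>- test2 
</br>- test3
</p>

</div>
"""

print(description_text(sample))
```

If the real page uses `<br>` or `<br/>` instead, the string passed to `replace` would need to change accordingly; switching the parser to `'lxml'`, as in the second option, avoids having to guess the exact spelling.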