How to remove `<br/>` when crawling with Python


python web-crawler


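This is my first time using Python for web crawling, but I'm not happy with the result.

Here is my simple code, its output, and the result I want.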

**my code**

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

url = 'url'

webpage = urlopen(url)

source = BeautifulSoup(webpage, 'html5lib')

reviews = source.find_all('p', {'class':'desc_review'})

print(reviews)

for review in reviews :
    print(review.get_text().strip())

The output of this code is as follows:

[<p class="desc_review"> a1 </p>, 
<p class="desc_review">  </p>, 
<p class="desc_review"> b1
<br/>b2
<br/>b3 </p>, 
<p class="desc_review">  c1
<br/>c2 </p>, 
<p class="desc_review"> d1 </p>, 
<p class="desc_review"> e1 </p>]

a1
b1
b2
b3
c1
c2
d1
e1


But the result I want looks like this:

**desired result**
[<p class="desc_review"> a1 </p>, 
<p class="desc_review">  </p>, 
<p class="desc_review"> b1 b2 b3 </p>, 
<p class="desc_review">  c1 c2 </p>, 
<p class="desc_review"> d1 </p>, 
<p class="desc_review"> e1 </p>]

a1
b1 b2 b3
c1 c2
d1
e1


So I want to get rid of the `<br/>` tags.

How can I do that?
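A minimal sketch of one way to get the desired text output. It assumes BeautifulSoup 4, the markup shown in the output above pasted in as a string instead of fetched with `urlopen(url)`, and the built-in `"html.parser"` instead of `html5lib`. `get_text()` accepts a separator and a `strip` flag: each text fragment between the `<br/>` tags is trimmed, empty fragments are dropped, and the rest are joined with a single space instead of the newline that follows them in the source. The name `texts` is only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the output shown in the question (an assumption:
# the real page would come from urlopen(url) as in the original code).
html = """
<p class="desc_review"> a1 </p>
<p class="desc_review">  </p>
<p class="desc_review"> b1
<br/>b2
<br/>b3 </p>
<p class="desc_review">  c1
<br/>c2 </p>
<p class="desc_review"> d1 </p>
<p class="desc_review"> e1 </p>
"""

source = BeautifulSoup(html, "html.parser")
reviews = source.find_all("p", {"class": "desc_review"})

texts = []
for review in reviews:
    # strip=True trims every text fragment and skips empty ones;
    # the remaining fragments are joined with " " instead of "\n".
    text = review.get_text(" ", strip=True)
    if text:  # drop reviews that contain only whitespace
        texts.append(text)

for text in texts:
    print(text)
```

The `if text:` check mirrors the desired output, which has no blank line for the empty review; remove it if you want to keep an empty entry in that position.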

Have you tried switching to a simpler parser such as "html.parser" and replacing the br tags with spaces?
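The comment's suggestion could look roughly like this. It is a sketch under the same assumptions: BeautifulSoup 4, the sample markup from the question, and `"html.parser"`. Each `<br>` element is swapped for a plain space with `replace_with`, then whitespace runs are collapsed and written back through the `.string` setter, so `print(reviews)` no longer shows `<br/>` and `get_text()` returns single-line text.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the output shown in the question (assumption).
html = """
<p class="desc_review"> a1 </p>
<p class="desc_review">  </p>
<p class="desc_review"> b1
<br/>b2
<br/>b3 </p>
<p class="desc_review">  c1
<br/>c2 </p>
<p class="desc_review"> d1 </p>
<p class="desc_review"> e1 </p>
"""

# "html.parser" is Python's built-in parser, as suggested in the comment.
source = BeautifulSoup(html, "html.parser")
reviews = source.find_all("p", {"class": "desc_review"})

for review in reviews:
    # Replace each <br> element with a plain space string.
    for br in review.find_all("br"):
        br.replace_with(" ")
    # The newlines next to each former <br> are still there, so collapse
    # every whitespace run to one space and store the result in the tag.
    review.string = " ".join(review.get_text().split())

print(reviews)
for review in reviews:
    print(review.get_text())
```

Replacing `<br>` with a space rather than simply deleting it matters when there is no newline beside the tag: for `b1<br/>b2`, deleting the tag and calling plain `get_text()` would give `b1b2`. Note that the rewritten tags lose the leading and trailing spaces shown in the desired list, e.g. `<p class="desc_review">b1 b2 b3</p>`.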