Python: How to scrape P tags with Beautiful Soup


python web-scraping tags


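I have successfully scraped a website using the findAll function in BeautifulSoup with H2/class/div tags, e.g. findAll('div', {'class': 'price'}). However, part of the site uses P tags, and I don't know how to scrape those. It has the following code:

Listing history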

<p class="top">
    <strong>First listed</strong><br>
            800 on

I want the 800, but the div class "sidebar sbt" and p class="top" both have several entries on the site. Any help would be appreciated.

Thanks

You can find p tags the same way you find any other tag with BeautifulSoup:

>>> from bs4 import BeautifulSoup as BS
>>> with open('html', 'r') as f:
...     soup = BS(f, "lxml")
... 
>>> soup.find_all('p', attrs={'class':'top'})
[<p class="top">
<strong>First listed</strong><br/>
            800 on
</p>]
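Since the question mentions that p class="top" occurs several times on the page, one way to pick the right entry is to check the label inside its strong tag. The following is only a sketch: the second entry ("Last sold", 750) and the helper name value_for_label are invented for illustration and do not come from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two p.top entries with different labels.
# Only the "First listed" entry is taken from the question.
html = """
<p class="top"><strong>First listed</strong><br> 800 on</p>
<p class="top"><strong>Last sold</strong><br> 750 on</p>
"""

soup = BeautifulSoup(html, "html.parser")


def value_for_label(soup, label):
    """Return the first token after the given <strong> label, or None."""
    for p_tag in soup.find_all("p", attrs={"class": "top"}):
        strong = p_tag.find("strong")
        if strong is None or strong.get_text(strip=True) != label:
            continue
        # Whole text of the tag, e.g. "First listed 800 on"
        tokens = p_tag.get_text(" ", strip=True).split()
        skip = len(label.split())  # number of words in the label
        if len(tokens) > skip:
            return tokens[skip]
    return None


first_listed = value_for_label(soup, "First listed")
print(first_listed)
```

The helper returns None when no entry carries the requested label, so the caller can tell a missing entry apart from an empty value.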

If the real page matches the example, try something like this:

>>> from bs4 import BeautifulSoup
>>> html = """<div class="price">

 <p class="top">
     <strong>First listed</strong><br>
             800 on
 </p>
 <p class="top">
     <strong>First listed</strong><br>
             900 on
 </p>
 <p class="top">
     <strong>First listed</strong><br>
             1000 on
 </p>

 </div>"""
>>> soup = BeautifulSoup(html, "html.parser")
>>> # find_all returns every matching <div class="price">
>>> divs = soup.find_all('div', class_='price')
>>> for div in divs:
...     # loop over every <p class="top"> inside the div
...     for p_tag in div.find_all('p', class_='top'):
...         # the text splits into ['First', 'listed', '800', 'on'];
...         # index [-2] picks the number
...         value = p_tag.text.split()[-2]
...         print(value)
...
800
900
1000
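The nested lookup can also be written as a single CSS selector with select(). This is a minimal sketch on a compacted copy of the sample markup; it assumes a Beautiful Soup version that ships CSS selector support and is not necessarily how the answer's author would write it.

```python
from bs4 import BeautifulSoup

# Compacted copy of the sample markup used in the answer.
html = """<div class="price">
 <p class="top"><strong>First listed</strong><br> 800 on</p>
 <p class="top"><strong>First listed</strong><br> 900 on</p>
 <p class="top"><strong>First listed</strong><br> 1000 on</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# "div.price p.top" matches every <p class="top"> nested in <div class="price">.
# Each tag's text splits into ['First', 'listed', <number>, 'on'].
values = [p.get_text().split()[-2] for p in soup.select("div.price p.top")]
print(values)
```

Collecting the values in a list, rather than printing inside the loop, keeps them available for later processing.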

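Comments:

Maybe try a print/join chain built on soup.p.find('br'), followed by strip() and split()[0], to find it?

But that doesn't work. I don't want to print the item. I want to define it, e.g. item_a = soup.findAll... and then do plant = item_a.get_text(). How would I do that for the P tag above?

OK, sorry, I didn't understand exactly what you wanted. Shall I try again? Do you want to assign the result of soup.find_all to a variable?

Maybe item_a = soup.p.find('br') and then print item_a.get_text()? Does that return the text u'\n 800 on\n'?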
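To keep the value in a variable rather than print it, one option is to read the text node that follows the br tag. The sketch below assumes the text is a direct sibling of the br, which is how html.parser builds the tree for this snippet; other parsers or the real page may behave differently. The names item_a and plant are borrowed from the comment above.

```python
from bs4 import BeautifulSoup

# Snippet from the question, with the closing </p> added.
html = """<p class="top">
    <strong>First listed</strong><br>
            800 on
</p>"""

soup = BeautifulSoup(html, "html.parser")

br = soup.p.find("br")
# The node right after <br> is the raw text, including
# its surrounding newlines and indentation.
raw = br.next_sibling

item_a = str(raw).strip()   # text with the whitespace trimmed
plant = item_a.split()[0]   # first token, i.e. the number
print(plant)
```

Both item_a and plant are plain strings here, so they can be converted with int(plant) or stored without any further Beautiful Soup calls.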
