Extracting only the text from div tags on a page using Python and Beautiful Soup


python html css web-scraping


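I am trying to scrape a static news website as a project. I'm using Beautiful Soup, but I'm stuck on a page whose text (here, the text of a news article) is contained in a div tag.

The link to the site is

The news text is contained in the following format: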

<html>
<body>
<div class="normal" id="foo">
      " Many "
 <a href ='/some link' target = 'blank'>Bollywood</a>
 " stars today  are avowed foodies "
 <a href = 'link2'>Ranbir Kapoor</a>
 " Alia Bhat "
</div>
</body>
</html>


"Many"
"stars today are avowed foodies"
"Alia Bhat"
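The quoted fragments listed above are the div's own (direct) text nodes. The link texts "Bollywood" and "Ranbir Kapoor" sit one level deeper, inside the <a> children. The sketch below shows that difference. It uses a simplified copy of the markup and assumes the quotes in the question are just how the browser inspector displays text nodes, so they are omitted.

```python
from bs4 import BeautifulSoup

# Simplified copy of the question's markup (inspector quotes omitted, an assumption).
html = ('<div class="normal" id="foo"> Many <a href="/some link" target="blank">Bollywood</a>'
        ' stars today are avowed foodies <a href="link2">Ranbir Kapoor</a> Alia Bhat </div>')

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="normal")

# Only the strings that are direct children of the div; link text is not included.
direct = [s.strip() for s in div.find_all(string=True, recursive=False)]
print(direct)

# The link text lives inside the <a> child tags.
links = [a.get_text(strip=True) for a in div.find_all("a")]
print(links)
```

This is why collecting only the div's direct strings misses part of the sentence. Any approach that walks all descendants, as .text and get_text() do, picks up both.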

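The text I want is "Many Bollywood stars today are avowed foodies. Alia Bhat"

That is, I want all of the text, wherever it appears.

I was able to reach the div using find_all('div', 'normal'), but after that I'm stuck on how to retrieve all the text elements from it.

Please let me know if more information is needed.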
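find_all() returns a list-like ResultSet of Tag objects rather than a single tag. One way forward is therefore to loop over the matches and call get_text() on each one. Below is a minimal sketch with a simplified, single-spaced copy of the page (an assumption). Passing the string "normal" as the second argument is Beautiful Soup's shorthand for matching on the CSS class.

```python
from bs4 import BeautifulSoup

html = """<html><body>
<div class="normal" id="foo"> Many <a href="/some link" target="blank">Bollywood</a> stars today are avowed foodies <a href="link2">Ranbir Kapoor</a> Alia Bhat </div>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

texts = []
# find_all() gives every matching div, so pull the text out of each one in turn.
for div in soup.find_all("div", "normal"):
    # get_text() collects all descendant strings, including the link text;
    # " " is placed between fragments and strip=True trims each fragment.
    texts.append(div.get_text(" ", strip=True))

print(texts)
```

If the page is known to contain exactly one such div, soup.find("div", "normal") returns that Tag directly and the loop is unnecessary.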

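To extract the text from an element in BeautifulSoup, you can use the .text attribute. It concatenates all the strings inside the element, including those inside child tags such as <a>: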

>>> from bs4 import BeautifulSoup
>>> t  = """<div class="normal" id="foo">  Many  <a href ='/some link' target = 'blank'>Bollywood</a>  stars today  are avowed foodies  <a href = 'link2'>Ranbir Kapoor</a>  Alia Bhat  </div>"""
>>> bs = BeautifulSoup(t, 'html.parser')
>>> print(bs.find('div').text)
  Many  Bollywood  stars today  are avowed foodies  Ranbir Kapoor  Alia Bhat
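The output above keeps the doubled spaces from the markup. That happens because .text (the same as get_text() with no arguments) joins the strings exactly as they appear. The sketch below reuses the same t string and shows two ways to tidy the result. stripped_strings yields each fragment with its surrounding whitespace removed. Splitting and rejoining the full text collapses every run of whitespace, including the double space inside "stars today  are".

```python
from bs4 import BeautifulSoup

t = """<div class="normal" id="foo">  Many  <a href ='/some link' target = 'blank'>Bollywood</a>  stars today  are avowed foodies  <a href = 'link2'>Ranbir Kapoor</a>  Alia Bhat  </div>"""
div = BeautifulSoup(t, "html.parser").find("div")

# .text is every descendant string concatenated as-is, whitespace included.
raw = div.text
assert raw == div.get_text()

# stripped_strings trims each fragment's ends but leaves inner spacing alone.
pieces = list(div.stripped_strings)
print(pieces)

# str.split() with no argument splits on any whitespace run, so rejoining
# with single spaces normalises the whole sentence.
clean = " ".join(raw.split())
print(clean)
```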
