Extracting the ID names of DIVs found inside a specific containing DIV using Python

python html python-2.7 xpath web-scraping


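I've been using lxml to extract data from pages via XPath. So far, so good. But I have a new challenge:

I have to extract the IDs of all the DIVs contained inside a particular DIV and put those ID names into a list. I'm guessing I could do this with Beautiful Soup (or possibly lxml as well), I just don't know how:

For example, in this case I would have to extract "beacon" and "lentil":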


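<div id="live-events">

   <div class="events" id="beacon">
       ....other things...
   </div>

   <div class="events" id="lentil">
      ....other things...
   </div>

</div>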

Any suggestions?

Thanks

This is pretty simple:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
...     <div id="live-events">
... 
...        <div class ="events" id="beacon"> 
...            ....other things...
...        </div>
... 
...        <div class="events" id ="lentil">
...           ....other things...
...        </div>
... 
...     </div>
... """, "html.parser")
>>> live_events = soup.find(id="live-events")
>>> ids = [div["id"] for div in live_events.find_all("div")]
>>> ids
[u'beacon', u'lentil']
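
Since the question mentions lxml and XPath, here is a rough equivalent with lxml, in case you would rather stay with the library you already use. This is only a sketch: the `PAGE` string is a stand-in for your real document, and the XPath assumes you want the divs that are direct children of `live-events`.

```python
from lxml import html

# Stand-in markup mirroring the example from the question.
PAGE = """
<div id="live-events">
   <div class="events" id="beacon">....other things...</div>
   <div class="events" id="lentil">....other things...</div>
</div>
"""

tree = html.fromstring(PAGE)

# Select the id attribute of every div that is a direct child of the
# div whose id is "live-events". Divs without an id are simply skipped.
ids = list(tree.xpath('//div[@id="live-events"]/div/@id'))
print(ids)
```

If you need nested divs at any depth rather than direct children only, changing `/div/@id` to `//div/@id` should do it.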


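Sorry, one last thing. How would I use this with requests, rather than with the raw HTML in a variable? I'm currently following this as a guide:

Isn't that obvious? The page you linked to shows how to use requests to get the content of the document, and the code above shows how to turn that content into a BS object. I don't see what problem you would have...

Oh, sure, got it. Thanks again.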
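
To make the requests step from the comments concrete, here is a minimal sketch that combines the two pieces. The function names, the timeout value and the example URL are my own placeholders, not something from the question. Note that it differs slightly from the answer above: `id=True` skips divs that have no id (which would otherwise raise a `KeyError`), and `recursive=False` limits the search to direct children.

```python
from bs4 import BeautifulSoup


def child_div_ids(html, container_id):
    """Return the ids of the divs directly inside the element with container_id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        return []
    # id=True keeps only divs that carry an id attribute;
    # recursive=False restricts the search to direct children.
    return [div["id"] for div in container.find_all("div", id=True, recursive=False)]


def child_div_ids_from_url(url, container_id):
    """Download a page with requests, then reuse the parsing helper above."""
    import requests  # imported here so child_div_ids works without requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return child_div_ids(response.text, container_id)
```

You would then call something like `child_div_ids_from_url("https://example.com/events", "live-events")`, with your real URL in place of the hypothetical one. Drop `recursive=False` if you want divs at any nesting depth, as in the original answer.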
