Python BeautifulSoup iframe text extraction


I am new to BeautifulSoup and I am trying to extract some raw data from this site. I parsed the page like this:

from urllib.request import urlopen
from bs4 import BeautifulSoup
path='https://www.esquire.com/entertainment/tv/g28380481/best-anime-2019/'
f = urlopen(path)
html = str(f.read())
soup = BeautifulSoup(html, 'html.parser')
txt = soup.find_all('iframe')
I got this bs4 object:

[<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/6M7f41OJfcM?enablejsapi=1" frameborder="0"></iframe>,
 <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/0glqBjvku84?enablejsapi=1" frameborder="0"></iframe>,
 <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/YKJf876thxw?enablejsapi=1" frameborder="0"></iframe>,
 <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/SdFgPGSmy0Y?enablejsapi=1" frameborder="0"></iframe>,
 <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/Ie-bo3IulmY?enablejsapi=1" frameborder="0"></iframe>,
 <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/ApLudqucq-s?enablejsapi=1" frameborder="0"></iframe>,
 <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/FpRk3m3Y-Zg?enablejsapi=1" frameborder="0"></iframe>,
 <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/J9tu253SOas?enablejsapi=1" frameborder="0"></iframe>,
 <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/lCPf9SA4mgU?enablejsapi=1" frameborder="0"></iframe>,
 <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/neqxQdpTyXE?enablejsapi=1" frameborder="0"></iframe>]
How can I extract the names here using the attributes?

<span class="listicle-slide-hed-text">Fruits Basket (Funimation)</span>,
<span class="listicle-slide-hed-text">One Punch Man (Hulu)</span>,
<span class="listicle-slide-hed-text">Rilakkuma and Kaoru (Netflix)</span>,
<span class="listicle-slide-hed-text">Mob Psycho 100 II (Crunchyroll)</span>,
<span class="listicle-slide-hed-text">Ride Your Wave (July release at Fantasia Fest)</span>,
<span class="listicle-slide-hed-text">The Promised Neverland (Hulu)</span>,
<span class="listicle-slide-hed-text">Vinland Saga (Amazon Prime)</span>,
<span class="listicle-slide-hed-text">Boogiepop Never Laughs (Crunchyroll)</span>,
<span class="listicle-slide-hed-text">Saga of Tanya the Evil (Crunchyroll)</span>,
<span class="listicle-slide-hed-text">Dororo (Amazon Prime)</span>
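Fruits Basket (Funimation),
One Punch Man (Hulu),
Rilakkuma and Kaoru (Netflix),
Mob Psycho 100 II (Crunchyroll),
Ride Your Wave (July release at Fantasia Fest),
The Promised Neverland (Hulu),
Vinland Saga (Amazon Prime),
Boogiepop Never Laughs (Crunchyroll),
Saga of Tanya the Evil (Crunchyroll),
Dororo (Amazon Prime)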

There is no need to use a regular expression here.

A simpler way is to use the attrs property of the BeautifulSoup element, like this:

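from urllib.request import urlopen
from bs4 import BeautifulSoup

path = 'https://www.esquire.com/entertainment/tv/g28380481/best-anime-2019/'
f = urlopen(path)
# Pass the raw bytes; str(f.read()) would yield the "b'...'" repr instead of the markup
html = f.read()
soup = BeautifulSoup(html, 'html.parser')
txt = soup.find_all('iframe')
for element in txt:
    # attrs is a dict of the tag's attributes; [2:] strips the leading "//"
    print(element.attrs["data-src"][2:])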
This produces the same result:

www.youtube.com/embed/6M7f41OJfcM?enablejsapi=1
www.youtube.com/embed/0glqBjvku84?enablejsapi=1
www.youtube.com/embed/YKJf876thxw?enablejsapi=1
www.youtube.com/embed/SdFgPGSmy0Y?enablejsapi=1
www.youtube.com/embed/Ie-bo3IulmY?enablejsapi=1
www.youtube.com/embed/ApLudqucq-s?enablejsapi=1
www.youtube.com/embed/FpRk3m3Y-Zg?enablejsapi=1
www.youtube.com/embed/J9tu253SOas?enablejsapi=1
www.youtube.com/embed/lCPf9SA4mgU?enablejsapi=1
www.youtube.com/embed/neqxQdpTyXE?enablejsapi=1
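
The data-src values above are protocol-relative (they start with "//"), so slicing off the first two characters leaves a URL without a scheme. As a minimal sketch, assuming you want full URLs or just the video IDs, the standard library's urljoin can resolve each value against the page URL. The helper names to_absolute and video_id are hypothetical, introduced only for illustration, and the two sample values are copied from the output above.

```python
from urllib.parse import urljoin, urlparse

# Page the iframes were scraped from (taken from the question).
PAGE_URL = "https://www.esquire.com/entertainment/tv/g28380481/best-anime-2019/"

# Two of the data-src values shown in the question's output.
data_src_values = [
    "//www.youtube.com/embed/6M7f41OJfcM?enablejsapi=1",
    "//www.youtube.com/embed/0glqBjvku84?enablejsapi=1",
]


def to_absolute(src, base=PAGE_URL):
    """Resolve a protocol-relative URL against the page URL (hypothetical helper)."""
    return urljoin(base, src)


def video_id(src):
    """Return the last path segment of the embed URL (hypothetical helper)."""
    return urlparse(to_absolute(src)).path.rsplit("/", 1)[-1]


absolute_urls = [to_absolute(src) for src in data_src_values]
video_ids = [video_id(src) for src in data_src_values]

for url, vid in zip(absolute_urls, video_ids):
    print(vid, url)
```

Because urljoin copies the scheme from the base URL, the result stays correct whether the page was fetched over http or https.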

You can read more about how to work with attributes here:

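Please provide what you have tried and the output you expect.
Sabri-kunduK is right: we need to see what you have done so far, your efforts and attempts, your approach to the task, and the code you have, even if it is only a snippet. You should give us more than just the wish to work with bs4. We look forward to hearing from you again.
You can access an element's attributes like a dictionary, e.g. for i in soup.find_all('iframe'): print(i['data-src'])
What code should I use here to extract "Fruits Basket (Funimation)" from <span class="listicle-slide-hed-text">Fruits Basket (Funimation)</span>?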
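
The last two comments can be illustrated together. What follows is a minimal sketch, not the answerer's own code: it assumes the page markup matches the snippets quoted in the question and runs against a small inline sample (SAMPLE_HTML, made up from those snippets) rather than the live site. The function names extract_titles and extract_iframe_sources are hypothetical. Titles are read with get_text() after filtering spans by class, and iframe sources are read with dictionary-style access.

```python
from bs4 import BeautifulSoup

# Inline sample assembled from the markup quoted in the question (an assumption
# about how the real page is structured).
SAMPLE_HTML = """
<div>
  <span class="listicle-slide-hed-text">Fruits Basket (Funimation)</span>
  <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/6M7f41OJfcM?enablejsapi=1" frameborder="0"></iframe>
  <span class="listicle-slide-hed-text">One Punch Man (Hulu)</span>
  <iframe allowfullscreen="true" data-src="//www.youtube.com/embed/0glqBjvku84?enablejsapi=1" frameborder="0"></iframe>
</div>
"""


def extract_titles(html):
    """Return the text of every span with class listicle-slide-hed-text."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", class_="listicle-slide-hed-text")
    return [span.get_text(strip=True) for span in spans]


def extract_iframe_sources(html):
    """Return the data-src value of every iframe that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # .get() returns None for a missing attribute, whereas iframe["data-src"]
    # would raise KeyError.
    return [
        iframe.get("data-src")
        for iframe in soup.find_all("iframe")
        if iframe.get("data-src") is not None
    ]


titles = extract_titles(SAMPLE_HTML)
sources = extract_iframe_sources(SAMPLE_HTML)

for title, src in zip(titles, sources):
    print(title, src)
```

On the real page the same two functions could be given the bytes returned by urlopen(path).read(), provided the class name and attribute are still present in the served HTML.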