Python: Scraping data using Beautiful Soup

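I am scraping data with BeautifulSoup. The HTML below contains two h2 tags, but I want to extract data from the second one. How can I do that? More generally, if there are several identical tags and I want to extract data from any one of them, how do I do it?

Code: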

<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
</div>

<h2>Recipe For Making Soft Idlis</h2>
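I have thought about extracting the data by keyword rather than by tag alone. For example, could I use the h2 tag together with the keyword Recipe to find the second tag's data?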

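If you know which h2 you need based on its position, just index into the list returned by the findAll method, for example soup.findAll("h2")[1] for the second h2.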

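"For example, could I use the tag and the keyword Recipe to find the second tag's data?"

Yes, you absolutely can. You can use Python's re module to match part of the text inside the tag.

From the Beautiful Soup documentation:

"If you pass in a regular expression object, Beautiful Soup will filter against that regular expression using its search() method."

Demo (the interpreter session using re.compile('Recipe'), reproduced at the end of this page):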


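If I don't know the order of the tags, how do I extract the data?

You need to know something specific about the tag you want to extract text from; otherwise there is no way to tell your script which exact tag you need. It can be a class name, part of the text, or, as your question said, the order.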
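
The three kinds of criteria mentioned in that reply (order, part of the text, class name) can each be turned into a lookup. The following is a minimal sketch against the HTML from the question, not the answerer's own code. It assumes a reasonably recent Beautiful Soup 4, where find_all and string= are the current spellings of findAll and text=; the variable names are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

HTML = """<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
</div>

<h2>Recipe For Making Soft Idlis</h2>"""

soup = BeautifulSoup(HTML, "html.parser")

# 1. By order: the second h2 in document order (index 1).
by_order = soup.find_all("h2")[1]

# 2. By part of the text: the first h2 whose text matches the regex.
by_text = soup.find("h2", string=re.compile("Recipe"))

# 3. By class name: the iframe carrying the embed-responsive-item class.
by_class = soup.find("iframe", class_="embed-responsive-item")

print(by_order.get_text())
print(by_text.get_text())
print(by_class["src"])
```

Which criterion to prefer depends on what is stable in the page: order breaks when headings are added, while text or class matches survive reordering.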
Selecting the second h2 by index:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''<h2>Video Instructions For Making Soft Idlis</h2>
... <div class="embed-responsive embed-responsive-16by9">
... <iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
... </div>
...
... <h2>Recipe For Making Soft Idlis</h2>''', "html.parser")
>>> soup.findAll("h2")[1]
<h2>Recipe For Making Soft Idlis</h2>
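
Indexing directly raises IndexError when the page has fewer matching tags than expected. One way to guard against that is a small wrapper; nth_tag below is a hypothetical helper written for this illustration, not part of Beautiful Soup, and the sample markup is invented.

```python
from bs4 import BeautifulSoup


def nth_tag(soup, name, index):
    """Return the tag at 0-based position `index` among all `name` tags, or None."""
    matches = soup.find_all(name)
    if 0 <= index < len(matches):
        return matches[index]
    return None  # fewer matching tags than requested


sample = BeautifulSoup("<h2>First</h2><p>text</p><h2>Second</h2>", "html.parser")

second = nth_tag(sample, "h2", 1)   # the second h2
missing = nth_tag(sample, "h2", 5)  # out of range, so None

print(second)
print(missing)
```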
Matching the h2 by a keyword in its text:

>>> import re
>>> from bs4 import BeautifulSoup
>>> html = '''<h2>Video Instructions For Making Soft Idlis</h2>
... <div class="embed-responsive embed-responsive-16by9">
... <iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
... </div>
...
... <h2>Recipe For Making Soft Idlis</h2>'''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> tag = soup.find('h2', text=re.compile('Recipe'))
>>> tag
<h2>Recipe For Making Soft Idlis</h2>
>>> tag.text
'Recipe For Making Soft Idlis'
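
The regex filter above is applied to the tag's own string, so it may not match when the heading wraps part of its text in another tag (for example a bold word), because such a tag has no single string. A hedged alternative is to pass a function to find and test the full text yourself. The helper name heading_with_keyword and the sample markup are assumptions made for this sketch.

```python
from bs4 import BeautifulSoup


def heading_with_keyword(soup, keyword, name="h2"):
    """Return the first `name` tag whose full text contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()

    def matches(tag):
        # get_text() joins the text of all descendants, so nested markup is fine.
        return tag.name == name and keyword in tag.get_text().lower()

    return soup.find(matches)


nested = BeautifulSoup(
    "<h2>Video Instructions</h2><h2><b>Recipe</b> For Making Soft Idlis</h2>",
    "html.parser",
)

found = heading_with_keyword(nested, "recipe")
absent = heading_with_keyword(nested, "dosa")

print(found.get_text())
print(absent)
```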