Tag position in an HTML page using Python 2.7 and BeautifulSoup

I am trying to parse an HTML page with the following format:

<img class="outer" id="first" />
<div class="content" .../>
<div class="content" .../>
<div class="content" />
<img class="outer" id="second" />
<div class="content" .../>
<div class="content" .../>
<img class="outer" id="third" />
<div class="content" .../>
<div class="content" .../>
Use find_previous_sibling():

>>> for divtag in div_Blocks:
...     print divtag.find_previous_sibling('img')
... 

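Not from where you currently start. You need to iterate over all the tags, or at least over both tag types. If the tag is an img, store its id; if it is a div with the content class, the currently stored id tells you which container you are in. Note: you can use re in BeautifulSoup to filter for these two tag types.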


At the moment you are throwing away the context by extracting only the tags.
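
The single-pass idea described above could look roughly like the sketch below. It is only an illustration under a few assumptions: the sample markup is a stand-in for the page in the question (the div text and the explicit closing tags are invented), `group_divs_by_img` is a hypothetical helper name, and the built-in "html.parser" backend is used.

```python
import re

from bs4 import BeautifulSoup

# Stand-in for the page in the question; the div text is invented for the demo.
HTML = """
<img class="outer" id="first" />
<div class="content">a</div>
<div class="content">b</div>
<div class="content">c</div>
<img class="outer" id="second" />
<div class="content">d</div>
<div class="content">e</div>
<img class="outer" id="third" />
<div class="content">f</div>
<div class="content">g</div>
"""


def group_divs_by_img(html):
    """Map each outer img id to the text of the content divs that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    groups = {}
    current_id = None
    # A compiled regex as the name filter keeps both tag types, in document order.
    for tag in soup.find_all(re.compile(r"^(img|div)$")):
        classes = tag.get("class", [])
        if tag.name == "img" and "outer" in classes:
            # A new container starts here: remember its id.
            current_id = tag.get("id")
            groups.setdefault(current_id, [])
        elif tag.name == "div" and "content" in classes and current_id is not None:
            # This div belongs to the most recently seen img.
            groups[current_id].append(tag.get_text())
    return groups


groups = group_divs_by_img(HTML)
```

Because the img id is carried along while walking the tags, each div is assigned to its container without a second lookup.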

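The 200,000th python question! Yay! 200000! Congratulations.
Oops, I just retagged a bunch... ;-)
@MartijnPieters You got it perfectly. Exactly what I was looking for :). Thanks.
@Ranjan You're welcome! Don't forget to accept the answer!
Once again, the 200,000th question tagged python :D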
img_blocks = soup.find_all('img', attrs={'class':'outer'})
div_Blocks = soup.find_all('div', attrs={'class':'content'})
>>> for divtag in div_Blocks:
...     print divtag.find_previous_sibling('img')
... 
<img class="outer" id="first"/>
<img class="outer" id="first"/>
<img class="outer" id="first"/>
<img class="outer" id="second"/>
<img class="outer" id="second"/>
<img class="outer" id="third"/>
<img class="outer" id="third"/>
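
For reference, the same lookup can be written as a self-contained script that collects the img ids instead of printing the tags. This is a sketch, not the original poster's code: the sample markup (including the div text and the explicit closing tags) and the names `div_blocks` and `owners` are assumptions, and the built-in "html.parser" backend is used.

```python
from bs4 import BeautifulSoup

# Stand-in for the page in the question; the div text is invented for the demo.
HTML = """
<img class="outer" id="first" />
<div class="content">a</div>
<div class="content">b</div>
<div class="content">c</div>
<img class="outer" id="second" />
<div class="content">d</div>
<div class="content">e</div>
<img class="outer" id="third" />
<div class="content">f</div>
<div class="content">g</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
div_blocks = soup.find_all("div", attrs={"class": "content"})

# For each div, walk backwards through its siblings to the nearest img
# and record that img's id.
owners = [div.find_previous_sibling("img")["id"] for div in div_blocks]
print(owners)
```

This only works when the img and div tags are siblings under the same parent, as they are in the sample markup.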