
Python: how to get the description text above an img href tag

Tags: python, url, beautifulsoup, web-crawler

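I want to get the description (or title) of each image. I want to process the HTML in batch rather than using the Google Chrome inspect tool to look up the XPath of each text one by one. There is no common rule for the titles or descriptions (some images have neither), so the only approach seems to be to locate the image and take the nearest text around it, which is most likely my target.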

data=<p style="margin-top:6pt;margin-bottom:0pt;text-indent:4.54%;font-family:Times New Roman;font-size:10pt;font-weight:normal;font-style:normal;text-transform:none;font-variant: normal;">
   The following graph sets forth the cumulative total return to CECO’s shareholders during the five years ended December&nbsp;31, 2018, as well as the following indices: Russell 2000 Index, Standard and Poor’s (“S&amp;P”) 600 Small Cap Industrial Machinery Index, and S&amp;P 500 Index. Assumes $100 was invested on December&nbsp;31, 2013, including the reinvestment of dividends, in each category.
</p>
<p style="margin-top:6pt;margin-bottom:0pt;text-indent:4.54%;font-family:Times New Roman;font-size:10pt;font-weight:normal;font-style:normal;text-transform:none;font-variant: normal;">
  <img src="gfsqvgqkrgf1000002.jpg" title="" alt="" style="width:649px;height:254px;">
</p>

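But this is not what I want.

You can get it by first searching for the <p> that contains the <img>, and then taking the <p> before it:

from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
print(soup.select_one('p:has(img)').find_previous('p').text.strip())

Prints:

The following graph sets forth the cumulative total return to CECO’s shareholders during the five years ended December 31, 2018, as well as the following indices: Russell 2000 Index, Standard and Poor’s (“S&P”) 600 Small Cap Industrial Machinery Index, and S&P 500 Index. Assumes $100 was invested on December 31, 2013, including the reinvestment of dividends, in each category.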
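Since the question asks for batch processing and notes that some images have no description, the same idea can be extended to every image in a page. The following is only a sketch under stated assumptions: the helper name `nearest_preceding_text`, the sample HTML, and the rule "stop when a different image is reached first" are all hypothetical choices, not part of the original answer. It uses the built-in `html.parser` instead of `lxml` so it has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the first image has a caption paragraph, the second has none.
html = """
<p>Figure one shows revenue by year.</p>
<p><img src="a.jpg"></p>
<p><img src="b.jpg"></p>
"""


def nearest_preceding_text(img, tag="p"):
    """Return the text of the closest earlier <tag> with visible text, or None.

    Assumed heuristic: if a block holding a different image is reached
    before any text, the image is treated as having no description.
    """
    # find_all_previous walks backwards in document order, nearest first;
    # the first hit is usually the block that wraps the image itself.
    for block in img.find_all_previous(tag):
        other = block.find("img")
        if other is not None and other is not img:
            return None  # another image comes first, so give up
        text = block.get_text(" ", strip=True)
        if text:
            return text
    return None


soup = BeautifulSoup(html, "html.parser")
captions = {img["src"]: nearest_preceding_text(img) for img in soup.find_all("img")}
for src, caption in captions.items():
    print(src, "->", caption)
```

The stop rule keeps a caption from being assigned to a later, uncaptioned image; whether that is the right rule depends on how the real documents are laid out.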


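Please add the code you have tried, along with the output you got and how it differs from what you expected.

As I said, I want to find the node that precedes the <img> tag, but the output is always its parent node.

Good idea, but it does not work! NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.

@李文举, you need to upgrade to Beautiful Soup 4.7+, or install soupsieve and then import it and use its API directly:
soupsieve.select_one('p:has(img)', soup).find_previous('p').text.strip()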
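As a minimal sketch of the direct-API suggestion in the comment above, the snippet below calls `soupsieve.select_one` on a parsed document. The sample HTML and variable names are hypothetical, and `html.parser` is used here in place of `lxml` only to avoid an extra dependency.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Hypothetical sample: a caption paragraph followed by a paragraph that wraps an image.
html = """<p>Caption text.</p>
<p><img src="chart.jpg"></p>"""

soup = BeautifulSoup(html, "html.parser")

# Select the <p> that contains an <img>, then step back to the <p> before it.
holder = sv.select_one("p:has(img)", soup)
caption = holder.find_previous("p").get_text(strip=True)
print(caption)  # prints: Caption text.
```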
@李文举 Another way (without upgrading) is to use
soup.select_one("p > img").find_previous("p").find_previous("p").text
It is a bit silly, but it works.

@李文举 I am using version
beautifulsoup4==4.7.1
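The chained `find_previous` workaround from the comments can look odd at first, so here is a small sketch of why two calls are needed: starting from the `<img>`, the first `find_previous("p")` lands on the enclosing `<p>`, because a parent comes earlier in document order than its children. The sample HTML and variable names are hypothetical, and `html.parser` stands in for `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the structure in the question.
html = """<p>Caption text.</p>
<p><img src="chart.jpg"></p>"""

soup = BeautifulSoup(html, "html.parser")
img = soup.select_one("p > img")

# First step back: the <p> that wraps the image (its parent).
enclosing = img.find_previous("p")
print(enclosing is img.parent)  # prints: True

# Second step back: the paragraph before the wrapper, which holds the text.
previous = enclosing.find_previous("p")
print(previous.get_text(strip=True))  # prints: Caption text.
```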
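from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'lxml')
# Starting from the <img>, find_previous('p') returns the <p> that encloses it
# (its parent), not the paragraph before it.
print(soup.select_one("p > img").find_previous('p'))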
data = '''<p style="margin-top:6pt;margin-bottom:0pt;text-indent:4.54%;font-family:Times New Roman;font-size:10pt;font-weight:normal;font-style:normal;text-transform:none;font-variant: normal;">
   The following graph sets forth the cumulative total return to CECO’s shareholders during the five years ended December&nbsp;31, 2018, as well as the following indices: Russell 2000 Index, Standard and Poor’s (“S&amp;P”) 600 Small Cap Industrial Machinery Index, and S&amp;P 500 Index. Assumes $100 was invested on December&nbsp;31, 2013, including the reinvestment of dividends, in each category.
</p>
<p style="margin-top:6pt;margin-bottom:0pt;text-indent:4.54%;font-family:Times New Roman;font-size:10pt;font-weight:normal;font-style:normal;text-transform:none;font-variant: normal;">
  <img src="gfsqvgqkrgf1000002.jpg" title="" alt="" style="width:649px;height:254px;">
</p>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

print(soup.select_one('p:has(img)').find_previous('p').text.strip())