Python: How to get the content of the 'src' attribute of an img inside a class? (Python, CSS, Python 3.x, BeautifulSoup, CSS Selectors) - Fatal编程技术网


I want to extract the content of the src attribute from elements like this one:

<div class="img-placeholder" style="padding-bottom:57.9%;">
<img data-srcset="abc.png" src="abc.png" data-placeholder="blurry" alt="Kijun Line" class=" lazyloaded" data-click-tracked="true" data-img-lightbox="true" data-owner="" data-caption="TradingView" data-expand="300" id="mntl-sc-block-image_1-0-5" data-tracking-container="true" srcset="abc.png 1541w">
</div>
Update: I changed my code to

[tag.attrs['src'] for tag in soup.select('div.img-placeholder img') if 'src' in tag.attrs]
but the result is not what I expected. I get

['data:image/gif;charset=utf-8;base64,R0lGODlhCwAGAPIAAHNzdWulg+Xy5f/l5f///3NzdXNzdXNzdSwAAAAACwAGAEIIGwAJCBxIsKDBgwIGBiAI4CABAAMGJAwQ0SHBgAA7',
 'data:image/gif;charset=utf-8;base64,R0lGODlhCgAGAPIAABisoH+O0f2zcoGRz7a/4LfA4ejp7fDw8iwAAAAACgAGAEIIIQANHBhIcKCBAQULGhCYkMCBhQkHFhDAEMCBAAQhRnwYEAA7']
For example, one of the elements that satisfies the condition is the div with class img-placeholder shown above.

Update: I switched to Selenium. Strangely, I only get the expected link for one of the two matching elements. My code is

import time

from bs4 import BeautifulSoup
from selenium import webdriver

# Passing the chromedriver path positionally works in Selenium 3;
# Selenium 4 expects a Service object instead.
driver = webdriver.Chrome('C:\\Users\\Akira\\Downloads\\Compressed\\chromedriver.exe')
url = 'https://www.investopedia.com/terms/k/kijun-line.asp'
driver.get(url)
time.sleep(10)  # crude fixed wait so the page's JavaScript can run
text = driver.page_source
soup = BeautifulSoup(text, 'html.parser')
# Collect src from every img inside div.img-placeholder that has one.
temp4 = [tag.attrs['src'] for tag in soup.select('div.img-placeholder img') if 'src' in tag.attrs]
temp4
The result is

['https://www.investopedia.com/thmb/LbG-nFJad_8ednnDsr-fD7Uvcb8=/1541x893/filters:no_upscale():max_bytes(150000):strip_icc():format(webp)/Kijun-Sen-3b696ff097264a429b780a98afeb5cbe.png',
 'data:image/gif;charset=utf-8;base64,R0lGODlhCgAGAPIAABisoH+O0f2zcoGRz7a/4LfA4ejp7fDw8iwAAAAACgAGAEIIIQANHBhIcKCBAQULGhCYkMCBhQkHFhDAEMCBAAQhRnwYEAA7']

What you want is the data-src attribute, not src.

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.investopedia.com/terms/k/kijun-line.asp')
soup = bs(r.content, 'lxml')
print([i['data-src'] for i in soup.select('.img-placeholder img')])
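Indexing with i['data-src'] raises KeyError for any img that lacks the attribute. A slightly more defensive variant is sketched below. It assumes that the lazy-loading markup keeps the real URL in data-src and leaves a data: placeholder in src until JavaScript runs. The helper name image_urls and the sample HTML are made up for illustration and are not taken from the page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the element in the question.
SAMPLE_HTML = """
<div class="img-placeholder">
  <img data-src="https://example.com/a.png" src="data:image/gif;base64,R0lGODlh">
</div>
<div class="img-placeholder">
  <img src="https://example.com/b.png">
</div>
<div class="img-placeholder">
  <img alt="no source at all">
</div>
"""


def image_urls(html):
    """Return one URL per image, preferring data-src over a non-placeholder src."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.select(".img-placeholder img"):
        # Tag.get returns None instead of raising KeyError for a missing attribute.
        url = img.get("data-src") or img.get("src")
        # Skip images with no usable attribute and inline data: placeholders.
        if url and not url.startswith("data:"):
            urls.append(url)
    return urls


print(image_urls(SAMPLE_HTML))
```

On the live page the same function could be given r.content or driver.page_source instead of the sample string.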

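@RishabhKumar I am also asking for clarification on why my code does not work.

It seems that not all img tags have a src attribute: [tag.attrs['src'] for tag in soup.select('img') if 'src' in tag.attrs]. Then, if you really do not want the images with src=data:..., you can filter them out. However, if the whole point of collecting the image tags is to get the actual image files saved on your computer, you can decode those images with Python.

It takes some work, but you can try replicating all the request headers and cookies that the browser sends when you browse manually.

My attempt with requests returns two images with data:.... My attempt with selenium returns only one image with data:....

Are you expecting two URLs? What are they?

Works great. Is there still a way to capture the link in the src attribute? The links in the response are not exactly the same as the ones in the page.

You can filter/regex the same resolved URL out of data-srcset.

I am just curious whether there is some technique to extract the content of the src attribute. I mean what I see when I right-click the image and inspect it in Google Chrome.

I think it is generated by JavaScript, so you could use selenium?

I think it is generated in this script: https://www.investopedia.com/static/1.139.0/cache/eNqNU0FywzAI_FA1ekSvvfUFSMYONZZShOo4ry9xkjbjKGlnPAbWuxgE8kVBKXqmUPyHPZ8VZXnxF9iQqsSkhMWFSqxuJt25De2qZjguhY5YbhNMSfk9Smb-QWOe9jlh0uIpKUoC9h32KCf-6gh2LTb0vVUDio4pjU5wFtKz7MGnVhY8nP9pla7ktfY7sKUcLf_h_G6W162pTr3CDMsTSs94oMDooNlngi8aDMzJ7xA6FGfIjfuHJogxo9QplFW4iZt1iQGMV-smVPBhsZPAi2mpZuoGNKs5s9L-alvUolKjVhusi9mGnnTdmBxHF4g5ZJCu_FNIEwzoo3Wl-EbDTkM-VILKWPVLRJYMC5UJ0YJ9PMPPJF9MRKRHKXBONxWODU7BPR3T68F8YOS3JZFCO5NF7BQCOATY.min.js (after formatting, look for the function he,t).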
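Two ideas from the comments, decoding the data: placeholders and pulling the URL out of data-srcset, can be sketched with the standard library alone. The function names first_srcset_url and decode_data_uri are hypothetical, and the sample data: URI is shortened to the first few bytes of a GIF header for illustration. It is not one of the values from the question.

```python
import base64


def first_srcset_url(srcset):
    """Return the URL of the first candidate in a srcset-style value, or None."""
    # Each candidate is "<url> <descriptor>", so the first
    # whitespace-delimited token is the first URL.
    tokens = srcset.split()
    # A candidate without a descriptor may carry the separating comma.
    return tokens[0].rstrip(",") if tokens else None


def decode_data_uri(uri):
    """Split a base64 data: URI into (media_type, raw_bytes)."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URI")
    # Drop the "data:" prefix and any parameters such as charset.
    media_type = header[len("data:"):].split(";")[0]
    return media_type, base64.b64decode(payload)


# Illustrative inputs (assumptions, not values taken from the live page).
print(first_srcset_url("abc.png 1541w"))
media_type, raw = decode_data_uri("data:image/gif;charset=utf-8;base64,R0lGODlh")
print(media_type, raw)
```

The decoded bytes of a real placeholder could be written to a file opened in 'wb' mode. For this page they would only be the tiny blurred GIFs, not the chart images themselves.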