Loading all third-party scripts in Python with requests or mechanize

I am loading a web page into an iframe, and I want to make sure all of the related media is available. I am currently downloading the page with requests and then doing some find/replace, but that does not cover everything. Is there a way in Python to get a list of all of the script, CSS, and image requests the page would make when loaded in a browser?
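For context, here is a minimal sketch of the find/replace step described in the question, assuming Python 3 and a hypothetical base_url; it rewrites relative src/href attributes to absolute URLs so the iframe can resolve them, but it will miss anything added later by JavaScript:

import re
import requests
from urllib.parse import urljoin

base_url = "http://example.com/page.html"   # hypothetical page to embed in the iframe
html = requests.get(base_url).text

# Rewrite relative src/href attributes to absolute URLs; resources injected by
# JavaScript at runtime never appear in the static HTML at all.
def absolutize(match):
    attr, quote, url = match.group(1), match.group(2), match.group(3)
    return '%s=%s%s%s' % (attr, quote, urljoin(base_url, url), quote)

rewritten = re.sub(r'(src|href)=(["\'])(.*?)\2', absolutize, html)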

Use BeautifulSoup to grab all of the img, script, and link tags, then extract the corresponding src/href attributes.

For example:

from bs4 import BeautifulSoup
import requests

resp = requests.get("http://www.yahoo.com")

soup = BeautifulSoup(resp.text, "html.parser")

# Pull the linked images (note: will also grab base64-encoded data URIs)
images = [img['src'] for img in soup.find_all('img') if img.has_attr('src')]

# Checking for src ensures that we don't grab the embedded (inline) scripts
scripts = [script['src'] for script in soup.find_all('script') if script.has_attr('src')]

# favicon.ico and CSS stylesheets
links = [link['href'] for link in soup.find_all('link') if link.has_attr('href')]
In [30]: images = [img['src'] for img in soup.find_all('img') if img.has_attr('src')]

In [31]: images[:5]
Out[31]:
['http://l.yimg.com/dh/ap/default/130925/My_Yahoo_Defatul_HP_ad_300x250.jpeg',
 'http://l.yimg.com/os/mit/media/m/base/images/transparent-95031.png',
 'http://l.yimg.com/os/mit/media/m/base/images/transparent-95031.png',
 'http://l.yimg.com/os/mit/media/m/base/images/transparent-95031.png',
 'http://l.yimg.com/os/mit/media/m/base/images/transparent-95031.png']
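Once those lists are collected, relative paths still need to be resolved against the page URL before they can be fetched. A short follow-up sketch, assuming Python 3 and reusing the images, scripts, and links lists from the answer above (the HEAD-request availability check is an added suggestion, not part of the original answer):

from urllib.parse import urljoin
import requests

base_url = "http://www.yahoo.com"

# Resolve relative src/href values against the page URL; skip inline data: URIs.
resources = [urljoin(base_url, u) for u in images + scripts + links
             if not u.startswith('data:')]

# Quick availability check so missing media can be mirrored or rewritten.
for url in resources:
    try:
        ok = requests.head(url, allow_redirects=True, timeout=5).status_code < 400
    except requests.RequestException:
        ok = False
    if not ok:
        print("missing:", url)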