How can I take a screenshot/image of a website using Python?


What I want to achieve is to get a screenshot of any website from Python.

Env: Linux

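You don't mention what environment you're running in, which makes a big difference, because there is no pure-Python web browser capable of rendering HTML.

But if you're on a Mac, I've used [link] with great success. If not, as others have pointed out, there are plenty of options.

On Mac there is [link], and on Linux+KDE you can use [link]. I've tried the former and it works quite well, and I've heard of the latter being put to use.

I recently came across [link], which claims to be cross-platform (Qt rolled WebKit into their library, I guess). But I've never tried it, so I can't tell you much more.

The QtWebKit link shows how to access it from Python. For the other tools, you should at least be able to use subprocess to the same effect.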
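As a rough illustration of the subprocess route mentioned above, here is a minimal sketch. The helper name `run_capture_tool` is made up for this example, and the command line you pass in depends entirely on the tool you choose; nothing here is taken from a specific tool's documentation.

```python
import subprocess
import sys


def run_capture_tool(argv, timeout=60):
    """Run an external screenshot command and report whether it exited cleanly.

    argv is the full command line as a list, for example the name of a
    command-line screenshot tool followed by the URL and the output file.
    """
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return completed.returncode == 0


# Stand-in command so the sketch runs anywhere: the Python interpreter itself.
# Replace it with the real tool's command line.
ok = run_capture_tool([sys.executable, "-c", "pass"])
print("tool succeeded" if ok else "tool failed")
```

Passing the command as a list avoids shell quoting problems with URLs that contain `&` or `?`.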

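I can't comment on ars's answer, but I actually got it running using QtWebKit, and it works very well.

I just wanted to confirm that what Roland posted on his blog works great on Ubuntu. Our production version ended up not using any of what he wrote, but we have had a lot of success with the PyQt/QtWebKit bindings.

Note: the URL used to be [link]; I've updated it with a working copy.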

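Here is a simple solution using webkit: [link]

Here is my solution, put together with help from various sources. It captures a screenshot of the full web page, crops it (optional), and generates a thumbnail from the cropped image.

Requirements:

Install NodeJS

Install phantomjs using Node's package manager:

npm -g install phantomjs

Install selenium (in your virtualenv, if you are using one)

Install imageMagick

Add phantomjs to the system path (on Windows)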
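The crop and thumbnail steps described above could be sketched roughly as follows. This is not the answer's original code; it assumes ImageMagick's `convert` command is on the PATH (ImageMagick 7 names it `magick`), and the helper names and file names are invented for the example.

```python
import subprocess


def crop_command(src, dst, width, height, x=0, y=0):
    """Build an ImageMagick command that crops src to width x height at offset (x, y)."""
    geometry = f"{width}x{height}+{x}+{y}"
    # +repage resets the virtual canvas so the result has no leftover offset
    return ["convert", src, "-crop", geometry, "+repage", dst]


def thumbnail_command(src, dst, width, height):
    """Build an ImageMagick command that shrinks src to fit within width x height."""
    return ["convert", src, "-thumbnail", f"{width}x{height}", dst]


def run(command):
    """Execute a prepared command and raise if ImageMagick reports an error."""
    subprocess.run(command, check=True)


# Example command lines (hypothetical file names); pass them to run() to execute.
print(crop_command("screen.png", "cropped.png", 1024, 768))
print(thumbnail_command("cropped.png", "thumb.png", 400, 300))
```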

Here are the generated images:

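#!/usr/bin/env python
# Python 2 / PyGTK script: saves a screenshot of the whole desktop
# (the root window, not a single web page) at random intervals.

import gtk.gdk
import random
import time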

while 1 :
    # generate a random time between 120 and 300 sec
    random_time = random.randrange(120,300)

    # wait between 120 and 300 seconds (or between 2 and 5 minutes)
    print "Next picture in: %.2f minutes" % (float(random_time) / 60)

    time.sleep(random_time)

    w = gtk.gdk.get_default_root_window()
    sz = w.get_size()

    print "The size of the window is %d x %d" % sz

    pb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB,False,8,sz[0],sz[1])
    pb = pb.get_from_drawable(w,w.get_colormap(),0,0,0,0,sz[0],sz[1])

    ts = time.time()
    filename = "screenshot"
    filename += str(ts)
    filename += ".png"

    if pb is not None:
        pb.save(filename,"png")
        print "Screenshot saved to "+filename
    else:
        print "Unable to get the screenshot."

from selenium import webdriver

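# Selenium 4.6+ locates a matching chromedriver automatically (Selenium Manager).
# Older Selenium versions took the driver path as the first argument instead.
driver = webdriver.Chrome()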
driver.get('https://www.spotify.com')
screenshot = driver.save_screenshot('my_screenshot.png')
driver.quit()

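[route]: access this route with a request if you are interested in the DOM.
/screenshot/:url: access this route if you are interested in a screenshot.

You would install rendertron with npm, run rendertron in one terminal, visit http://localhost:3000/screenshot/:url and save the file. A demo is available, though, so this Python 3 snippet can be run locally without installing the npm package: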
import requests

BASE = 'https://render-tron.appspot.com/screenshot/'
url = 'https://google.com'
path = 'target.jpg'
response = requests.get(BASE + url, stream=True)
# save file, see https://stackoverflow.com/a/13137873/7665691
if response.status_code == 200:
    with open(path, 'wb') as file:
        for chunk in response:
            file.write(chunk)
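For the locally installed variant described above, a minimal sketch might look like this. It assumes a rendertron instance is already listening on http://localhost:3000 (the address mentioned in the answer); the helper names are made up, and only the standard library is used.

```python
import urllib.request


def rendertron_screenshot_url(target, base="http://localhost:3000"):
    """Return the screenshot route of a rendertron instance for the target page."""
    return f"{base.rstrip('/')}/screenshot/{target}"


def save_screenshot(target, path, base="http://localhost:3000"):
    """Fetch the screenshot from a running rendertron instance and write it to path."""
    endpoint = rendertron_screenshot_url(target, base)
    with urllib.request.urlopen(endpoint) as response, open(path, "wb") as out:
        out.write(response.read())


print(rendertron_screenshot_url("https://google.com"))
```

Calling `save_screenshot("https://google.com", "target.jpg")` only works while the local rendertron process is running.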

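You can do this easily with the Google PageSpeed API. In my current project, I used a Google PageSpeed API query written in Python to capture a screenshot of any supplied web URL and save it to a given location. Have a look: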
import urllib2
import json
import base64
import sys
import requests
import os
import errno

#   The website's URL as an Input
site = sys.argv[1]
imagePath = sys.argv[2]

#   The Google API.  Remove "&strategy=mobile" for a desktop screenshot
api = "https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=" + urllib2.quote(site)

#   Get the results from Google
try:
    site_data = json.load(urllib2.urlopen(api))
except urllib2.URLError:
    print "Unable to retrieve data"
    sys.exit()

try:
    screenshot_encoded = site_data['screenshot']['data']
except KeyError:
    # A missing key raises KeyError, not ValueError
    print "No screenshot data in the response."
    sys.exit()

#   Google has a weird way of encoding the Base64 data
screenshot_encoded = screenshot_encoded.replace("_", "/")
screenshot_encoded = screenshot_encoded.replace("-", "+")

#   Decode the Base64 data
screenshot_decoded = base64.b64decode(screenshot_encoded)

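#   Create the target directory if the path has one and it does not exist yet
image_dir = os.path.dirname(imagePath)
if image_dir and not os.path.exists(image_dir):
    try:
        os.makedirs(image_dir)
    except OSError as exc:
        # Ignore the race where the directory was created in the meantime
        if exc.errno != errno.EEXIST:
            raise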

#   Save the file
with open(imagePath, 'wb') as file_:
    file_.write(screenshot_decoded)
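The two `replace` calls in the script above map the URL-safe Base64 alphabet (`-` and `_`) back to the standard one (`+` and `/`). As a small Python 3 sketch, the standard library's `base64.urlsafe_b64decode` does the same thing in one step; the helper name below is made up, and the padding line is a precaution in case the padding was stripped.

```python
import base64


def decode_urlsafe_base64(data):
    """Decode URL-safe Base64 text, restoring any padding that was stripped."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded)


sample = "-__-"  # URL-safe spelling of the standard Base64 text "+//+"
manual = base64.b64decode(sample.replace("_", "/").replace("-", "+"))
assert decode_urlsafe_base64(sample) == manual
print(decode_urlsafe_base64(sample))
```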

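Unfortunately, there are the following drawbacks. If they don't matter to you, you can go ahead with the Google PageSpeed API. It works well.

The maximum width is 320px
According to the Google API quota, there is a limit of 25,000 requests per day

Use the web service s-shot.ru (so it is not very fast), but it is quite easy to set up what you need through the link configuration.
You can also easily capture full-page screenshots: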
import requests
import urllib.parse

BASE = 'https://mini.s-shot.ru/1024x0/JPEG/1024/Z100/?' # you can modify size, format, zoom
url = 'https://stackoverflow.com/'#or whatever link you need
url = urllib.parse.quote_plus(url) #service needs link to be joined in encoded format
print(url)

path = 'target1.jpg'
response = requests.get(BASE + url, stream=True)

if response.status_code == 200:
    with open(path, 'wb') as file:
        for chunk in response:
            file.write(chunk)
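The "link configuration" amounts to changing path segments of the base URL. A small sketch of a URL builder follows; the meaning assigned to each segment (capture size, image format, scaled width, zoom) is an assumption read off the comment in the snippet above, not taken from the service's documentation, and the function name is invented.

```python
import urllib.parse


def s_shot_url(target, width=1024, height=0, fmt="JPEG", scaled_width=1024, zoom=100):
    """Build an s-shot.ru request URL; segment meanings are assumed, see the note above."""
    base = f"https://mini.s-shot.ru/{width}x{height}/{fmt}/{scaled_width}/Z{zoom}/?"
    # The service expects the target link in URL-encoded form
    return base + urllib.parse.quote_plus(target)


print(s_shot_url("https://stackoverflow.com/"))
```

With the default arguments this reproduces the URL used in the snippet above.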

I created a library called pywebcapture that wraps selenium and does exactly that:
pip install pywebcapture

Once it is installed with pip, you can do the following to easily get full-size screenshots:
# import modules
from pywebcapture import loader, driver

# load csv with urls
csv_file = loader.CSVLoader("csv_file_with_urls.csv", has_header_bool, url_column, optional_filename_column)
uri_dict = csv_file.get_uri_dict()

# create instance of the driver and run
d = driver.Driver("path/to/webdriver/", output_filepath, delay, uri_dict)
d.run()

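Enjoy!

This is an old question and most answers are a bit dated.
Currently, I would do one of two things.
1. Create a program that takes the screenshots
I would use pyppeteer to take screenshots of websites. It runs on top of the Puppeteer package. Puppeteer spins up a headless Chrome browser, so the screenshots look exactly as they would in a normal browser.
This is taken from the pyppeteer documentation: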
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
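Building on the documentation example, a full-page capture with a fixed viewport might look like the sketch below. The function name and the default viewport size are choices made for this example; `setViewport` and the `fullPage` screenshot option are part of pyppeteer's page API. The import sits inside the function so the module still loads where pyppeteer is not installed.

```python
import asyncio


async def capture_full_page(url, path, width=1280, height=800):
    """Open url in headless Chromium and save a full-page screenshot to path."""
    # Imported lazily so this module can be loaded without pyppeteer installed
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        # The viewport width controls how the page lays out before capture
        await page.setViewport({"width": width, "height": height})
        await page.goto(url)
        # fullPage captures the whole scrollable page, not just the viewport
        await page.screenshot({"path": path, "fullPage": True})
    finally:
        await browser.close()
```

On Python 3.7 or later it could be run with `asyncio.run(capture_full_page("https://example.com", "example.png"))`.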

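2. Use a screenshot API
You can also use a screenshot API, for example screenshotapi.net.
The nice thing is that you don't have to set everything up yourself; you just call an API endpoint.
This is taken from the screenshot API's documentation: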
import urllib.parse
import urllib.request
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

# Parameters.
token = "YOUR_API_TOKEN"
url = urllib.parse.quote_plus("https://example.com")
width = 1920
height = 1080
output = "image"

# Create the query URL.
query = "https://screenshotapi.net/api/v1/screenshot"
query += "?token=%s&url=%s&width=%d&height=%d&output=%s" % (token, url, width, height, output)

# Call the API.
urllib.request.urlretrieve(query, "./example.png")
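As an alternative to assembling the query string by hand, `urllib.parse.urlencode` can build and escape it in one step. This is only a sketch: the endpoint and parameter names are the ones from the snippet above, and the function name is made up.

```python
import urllib.parse

API_ENDPOINT = "https://screenshotapi.net/api/v1/screenshot"


def build_query(token, url, width=1920, height=1080, output="image"):
    """Return the full request URL with all parameters URL-encoded."""
    params = {
        "token": token,
        "url": url,  # pass the plain URL; urlencode escapes it
        "width": width,
        "height": height,
        "output": output,
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


print(build_query("YOUR_API_TOKEN", "https://example.com"))
```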
11 years later...

Taking a website screenshot using Python 3.6 and the Google PageSpeed Insights API v5:
import base64
import requests
import traceback
import urllib.parse as ul

# It's possible to make requests without the api key, but the number of requests is very limited  

url = "https://duckgo.com"
urle = ul.quote_plus(url)
image_path = "duckgo.jpg"

key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
strategy = "desktop" # "mobile"
u = f"https://www.googleapis.com/pagespeedonline/v5/runPagespeed?key={key}&strategy={strategy}&url={urle}"

try:
    j = requests.get(u).json()
    ss_encoded = j['lighthouseResult']['audits']['final-screenshot']['details']['data'].replace("data:image/jpeg;base64,", "")
    ss_decoded = base64.b64decode(ss_encoded)
    with open(image_path, 'wb+') as f:
        f.write(ss_decoded) 
except Exception:
    print(traceback.format_exc())
    exit(1)
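The script above strips a hard-coded `data:image/jpeg;base64,` prefix. A slightly more general sketch splits any Base64 data URI into its MIME type and bytes; the helper name is made up, and it assumes the value has the form `data:<mime>;base64,<payload>`.

```python
import base64


def decode_data_uri(data_uri):
    """Split a Base64 data URI into (mime_type, decoded_bytes)."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


mime, raw = decode_data_uri("data:image/jpeg;base64,aGVsbG8=")
print(mime, len(raw))
```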


Notes:

Pros: free
Cons: low resolution

Limits:

Queries per day = 25,000
Queries per 100 seconds = 400
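To stay under the limit of 400 queries per 100 seconds when capturing many pages, requests can be spaced out. The sketch below is one simple way to do that; the `Throttle` class is invented for this example, and spacing calls evenly (one every 0.25 s) is stricter than the quota actually requires.

```python
import time


class Throttle:
    """Space calls evenly so that at most max_calls happen per period seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
        return delay


# 400 queries per 100 seconds works out to one query every 0.25 seconds
pagespeed_throttle = Throttle(max_calls=400, period=100)
```

Calling `pagespeed_throttle.wait()` before each request keeps a loop over many URLs within the per-100-seconds limit.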


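A quick search of this site turns up plenty of similar questions. Here is a good start: [link]
Shog9: Thanks!! Your link has some... I'll check it out.
Shog9: Why don't you add it as an answer? That way it can earn you points.
@Esteban: It's not my work - someone else took the time to dig into this and find the resources; I'm just posting links. :-)
Based on the explanation here, I'd suggest moving to phantomjs now, as it provides a very clean and robust solution: [link]
Cool. I guess that's the library I'll try the next time I need something like this.
We ended up putting a RabbitMQ server on top of it and building some code to control it.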