How can I take a screenshot/image of a website using Python?
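I want to get a screenshot of any website from Python.
Env: Linux
You didn't mention what environment you're running in, which makes a big difference, because there is no pure-Python web browser capable of rendering HTML. If you're on a Mac, I've used [link missing] with great success. If not, as others have pointed out, there are plenty of options. On the Mac there is [link missing], and on Linux+KDE you can use [link missing]. I've tried the former and it works quite well, and I've heard of the latter being put to use. I recently came across [link missing], which claims to be cross-platform (I guess Qt rolled WebKit into their library), but I've never tried it, so I can't tell you much more.
The QtWebKit link shows how to access it from Python. You should at least be able to use subprocess to do the same with the other tools.
I can't comment on ars's answer, but I actually got it running using QtWebKit, and it works quite well.
I just wanted to confirm that what Roland posted on his blog works great on Ubuntu. Our production version ended up not using any of what he wrote, but we have had much success with the PyQt/QtWebKit bindings.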
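Note: the URL used to be: [link missing]. I've updated it with a working copy.
Here is a simple solution using webkit: [link missing]
Here is my solution, put together with help from various sources. It takes a screenshot of the full web page, crops it (optional), and generates a thumbnail from the cropped image. Requirements:
npm -g install phantomjs
Here is the generated image: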
- Try this
#!/usr/bin/env python
# Requires PyGTK (Python 2); grabs the root window, i.e. the whole desktop.
import gtk.gdk
import time
import random

while True:
    # generate a random time between 120 and 300 sec
    random_time = random.randrange(120, 300)

    # wait between 120 and 300 seconds (or between 2 and 5 minutes)
    print "Next picture in: %.2f minutes" % (float(random_time) / 60)
    time.sleep(random_time)

    w = gtk.gdk.get_default_root_window()
    sz = w.get_size()
    print "The size of the window is %d x %d" % sz

    pb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, sz[0], sz[1])
    pb = pb.get_from_drawable(w, w.get_colormap(), 0, 0, 0, 0, sz[0], sz[1])

    # build a unique file name from the current timestamp
    ts = time.time()
    filename = "screenshot" + str(ts) + ".png"

    if pb is not None:
        pb.save(filename, "png")
        print "Screenshot saved to " + filename
    else:
        print "Unable to get the screenshot."
Can you use Selenium?
from selenium import webdriver
DRIVER = 'chromedriver'
driver = webdriver.Chrome(DRIVER)
driver.get('https://www.spotify.com')
screenshot = driver.save_screenshot('my_screenshot.png')
driver.quit()
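Using Rendertron is an option. Under the hood, it is a headless Chrome exposing the following endpoints:
- a render route: request it (for example with requests.get) if you are interested in the DOM
- /screenshot/:url: access this route if you are interested in a screenshot
You would install and run rendertron, access http://localhost:3000/screenshot/:url and save the file, but a hosted demo makes it possible to run this Python 3 snippet locally without installing the npm package: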
import requests

BASE = 'https://render-tron.appspot.com/screenshot/'
url = 'https://google.com'
path = 'target.jpg'
response = requests.get(BASE + url, stream=True)
# save file, see https://stackoverflow.com/a/13137873/7665691
if response.status_code == 200:
    with open(path, 'wb') as file:
        for chunk in response:
            file.write(chunk)
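You can do this easily with the Google Page Speed API. In my current project, I use a Google Page Speed API query written in Python to capture a screenshot of any web URL provided and save it to a given location. Have a look: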
# Python 2 script (uses urllib2)
import urllib2
import json
import base64
import sys
import os
import errno

# The website's URL as an Input
site = sys.argv[1]
imagePath = sys.argv[2]

# The Google API. Remove "&strategy=mobile" for a desktop screenshot
api = "https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=" + urllib2.quote(site)

# Get the results from Google
try:
    site_data = json.load(urllib2.urlopen(api))
except urllib2.URLError:
    print "Unable to retrieve data"
    sys.exit()
except ValueError:
    print "Invalid JSON encountered."
    sys.exit()

try:
    screenshot_encoded = site_data['screenshot']['data']
except KeyError:
    print "The response contains no screenshot data."
    sys.exit()

# Google has a weird way of encoding the Base64 data
screenshot_encoded = screenshot_encoded.replace("_", "/")
screenshot_encoded = screenshot_encoded.replace("-", "+")

# Decode the Base64 data
screenshot_decoded = base64.b64decode(screenshot_encoded)

# Create the target directory if it does not exist yet
directory = os.path.dirname(imagePath)
if directory and not os.path.exists(directory):
    try:
        os.makedirs(directory)
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise

# Save the file (binary mode, because the decoded data is raw image bytes)
with open(imagePath, 'wb') as file_:
    file_.write(screenshot_decoded)
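Unfortunately, it has the drawbacks listed below. If they don't matter to you, you can go ahead with the Google Page Speed API; it works well.
- The maximum width is 320px
- Under the Google API quota, there is a limit of 25,000 requests per day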
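The two `replace` calls in the script above swap `_` and `-` back to `/` and `+`, which is the mapping between the URL-safe Base64 alphabet and the standard one. As a minimal sketch (Python 3, standard library only), the same decoding can be done in one step with `base64.urlsafe_b64decode`. The helper name `decode_urlsafe` and the sample bytes are made up for illustration, and the padding repair is a precaution rather than something the API is known to require:

```python
import base64


def decode_urlsafe(encoded):
    """Decode URL-safe Base64 text ('-' and '_' instead of '+' and '/')."""
    # Restore any '=' padding that may have been stripped from the text.
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded)


# Illustrative bytes chosen so that the encoded form contains '-' and '_'.
raw = b"\xfb\xff\xfe screenshot bytes"
encoded = base64.urlsafe_b64encode(raw).decode("ascii")

# The manual approach from the script above gives the same result.
manual = base64.b64decode(encoded.replace("_", "/").replace("-", "+"))
assert decode_urlsafe(encoded) == manual == raw
```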
import requests
import urllib.parse

BASE = 'https://mini.s-shot.ru/1024x0/JPEG/1024/Z100/?'  # you can modify size, format, zoom
url = 'https://stackoverflow.com/'  # or whatever link you need
url = urllib.parse.quote_plus(url)  # service needs link to be joined in encoded format
print(url)

path = 'target1.jpg'
response = requests.get(BASE + url, stream=True)

if response.status_code == 200:
    with open(path, 'wb') as file:
        for chunk in response:
            file.write(chunk)
I created a library called pywebcapture that wraps selenium and does just that:
pip install pywebcapture
Once it is installed with pip, you can do the following to easily get full-size screenshots:
# import modules
from pywebcapture import loader, driver
# load csv with urls
csv_file = loader.CSVLoader("csv_file_with_urls.csv", has_header_bool, url_column, optional_filename_column)
uri_dict = csv_file.get_uri_dict()
# create instance of the driver and run
d = driver.Driver("path/to/webdriver/", output_filepath, delay, uri_dict)
d.run()
Enjoy!
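This is an old question, and most of the answers are a bit dated. Currently, I would do one of two things.
1. Create a program that takes the screenshots
I would use Pyppeteer to take screenshots of websites. It runs on the Puppeteer package. Puppeteer spins up a headless Chrome browser, so the screenshots look exactly like they would in a normal browser.
This is taken from the Pyppeteer documentation:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())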
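2. Use a screenshot API
You can also use a screenshot API such as screenshotapi.net.
The nice thing is that you don't have to set everything up yourself; you simply call an API endpoint.
This is taken from the screenshot API's documentation: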
import urllib.parse
import urllib.request
import ssl

# Note: this turns off HTTPS certificate verification for urllib.
ssl._create_default_https_context = ssl._create_unverified_context

# parameters
token = "YOUR_API_TOKEN"
url = urllib.parse.quote_plus("https://example.com")
width = 1920
height = 1080
output = "image"

# create the query URL
query = "https://screenshotapi.net/api/v1/screenshot"
query += "?token=%s&url=%s&width=%d&height=%d&output=%s" % (token, url, width, height, output)

# call the API
urllib.request.urlretrieve(query, "./example.png")
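The query string above is assembled by hand with `quote_plus` and `%` formatting. As a hedged alternative, `urllib.parse.urlencode` from the standard library encodes every parameter in one call, which avoids forgetting to escape a value. The helper `build_query` is a hypothetical name introduced here; the endpoint and parameter names are simply the ones used in the snippet above:

```python
from urllib.parse import urlencode


def build_query(endpoint, **params):
    """Return the endpoint followed by a percent-encoded query string."""
    return endpoint + "?" + urlencode(params)


query = build_query(
    "https://screenshotapi.net/api/v1/screenshot",
    token="YOUR_API_TOKEN",  # placeholder, as in the snippet above
    url="https://example.com",
    width=1920,
    height=1080,
    output="image",
)
print(query)
```

The resulting string can be passed to `urllib.request.urlretrieve` exactly as before.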
11 years later... Taking a website screenshot using Python 3.6 and the Google PageSpeed Insights API v5:
import base64
import requests
import traceback
import urllib.parse as ul

# It's possible to make requests without the api key, but the number of requests is very limited

url = "https://duckgo.com"
urle = ul.quote_plus(url)
image_path = "duckgo.jpg"

key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
strategy = "desktop"  # "mobile"
u = f"https://www.googleapis.com/pagespeedonline/v5/runPagespeed?key={key}&strategy={strategy}&url={urle}"

try:
    j = requests.get(u).json()
    # the screenshot is returned as a base64 data URI; strip the prefix before decoding
    ss_encoded = j['lighthouseResult']['audits']['final-screenshot']['details']['data'].replace("data:image/jpeg;base64,", "")
    ss_decoded = base64.b64decode(ss_encoded)
    with open(image_path, 'wb+') as f:
        f.write(ss_decoded)
except Exception:
    print(traceback.format_exc())
    exit(1)
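The script removes a fixed `data:image/jpeg;base64,` prefix before decoding. If the prefix ever differed (another image type, for instance), that `replace` would silently do nothing and the decode would fail or produce garbage. A minimal sketch of a more general approach, assuming the field is always a base64 data URI; `decode_data_uri` is a hypothetical helper, not part of any library:

```python
import base64


def decode_data_uri(data_uri):
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    # Whatever sits between 'data:' and ';base64' is the MIME type.
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Illustrative input built locally; a real value would come from the API response.
sample = "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode("ascii")
mime_type, raw = decode_data_uri(sample)
assert mime_type == "image/jpeg"
assert raw == b"\xff\xd8\xff"
```

The returned MIME type could also be used to pick a matching file extension instead of hard-coding `.jpg`.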
Notes:
- Pros: free
- Cons: low resolution
- Limits:
  - Queries per day = 25,000
  - Queries per 100 seconds = 400
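When capturing many URLs in a loop, the per-100-seconds limit listed above is the one most easily hit. The following is a rough sketch of a client-side sliding-window throttle, not anything provided by Google; the class name `SlidingWindowLimiter` and its parameters are assumptions made for illustration, and it does not track the daily quota. The clock and sleep functions are injectable so the behaviour can be checked without waiting:

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(self, max_calls=400, period=100.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # times of the most recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._stamps[0]))
            self._stamps.popleft()
            now = self._clock()
        self._stamps.append(now)


limiter = SlidingWindowLimiter()  # defaults mirror the 400 per 100 s limit
```

Calling `limiter.wait()` immediately before each `requests.get(u)` would keep a single process under the limit.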