Python 3.x: Download an image file from the web in Python 3 using selenium/urllib

python-3.x xml selenium

I am trying to download a CAPTCHA image.

I get the following error: urllib.error.HTTPError: HTTP Error 500: Internal Server Error

See the code below:

from selenium import webdriver
from selenium.webdriver.common.by import By
import urllib.request

driver = webdriver.Chrome()

driver.get('https://servicesenligne2.ville.montreal.qc.ca/sel/evalweb/index')

# Locate the CAPTCHA image and read its URL
img = driver.find_element(By.XPATH, '//*[@id="type_recherche"]/div[5]/div/img')
src = img.get_attribute('src')

# This call raises urllib.error.HTTPError: HTTP Error 500
urllib.request.urlretrieve(src, "captcha.png")

When I print src, I get the following:

DevTools listening on ws://127.0.0.1:65317/devtools/browser/36eb75bc-f03c-41ee-96cc-138df591c665
https://servicesenligne2.ville.montreal.qc.ca/sel/evalweb/createimage.png?timestamp=1583199024767

Here is a sample script you can use to save captcha.jpg:

import requests
import shutil
url = "https://servicesenligne2.ville.montreal.qc.ca/sel/evalweb/createimage.png?timestamp=1583203496087"
# we are able to use the same cookie even after refreshing (so you should be good to use the same cookie)
headers = {
  'Cookie': 'JSESSIONID=YB-eSDCWKU-SG_bKEtluH8kzvWMop4B0plLN4NOLXtO09plZSEuS!-209918963'
}

response = requests.get(url,headers=headers, stream=True)
with open("captcha.jpg", 'wb') as f:
    response.raw.decode_content = True
    shutil.copyfileobj(response.raw, f)
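
Since the question uses urllib rather than requests, the same idea (sending the session cookie along with the image request) can be sketched with the standard library alone. This is only an illustration under the assumption that the missing JSESSIONID cookie is what makes the plain urlretrieve call fail; the helper names build_captcha_request and download_captcha are hypothetical and do not come from the original answer.

```python
import shutil
import urllib.request


def build_captcha_request(url, jsessionid):
    """Return a urllib Request that sends the session cookie with the image URL."""
    # Assumption: the server only needs the JSESSIONID cookie to serve the image.
    return urllib.request.Request(url, headers={"Cookie": "JSESSIONID=" + jsessionid})


def download_captcha(url, jsessionid, path="captcha.png"):
    """Fetch the image with the session cookie and write the raw bytes to `path`."""
    request = build_captcha_request(url, jsessionid)
    with urllib.request.urlopen(request) as response, open(path, "wb") as f:
        shutil.copyfileobj(response, f)
    return path
```

Calling download_captcha(src, jsession) with the image URL and a valid cookie value would then take the place of urllib.request.urlretrieve(src, "captcha.png").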

Here is the full code:

from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import shutil

driver = webdriver.Chrome()

driver.get('https://servicesenligne2.ville.montreal.qc.ca/sel/evalweb/index')

# Locate the CAPTCHA image and read its URL
img = driver.find_element(By.XPATH, '//*[@id="type_recherche"]/div[5]/div/img')
src = img.get_attribute('src')

# Reuse the browser's session cookie for the image request
jsession = driver.get_cookie('JSESSIONID')['value']
headers = {
  'Cookie': 'JSESSIONID=' + jsession
}

# Stream the response body straight into the output file
response = requests.get(src, headers=headers, stream=True)
with open("captcha.jpg", 'wb') as f:
    response.raw.decode_content = True
    shutil.copyfileobj(response.raw, f)
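
If the site ever relies on more than one cookie, a possible variation is to forward every cookie the browser holds instead of JSESSIONID alone. The sketch below assumes the list-of-dicts shape returned by Selenium's driver.get_cookies() (each dict has 'name' and 'value' keys); the function name cookie_header_from_selenium is made up for this example.

```python
def cookie_header_from_selenium(cookies):
    """Join Selenium cookie dicts into a single Cookie header value."""
    # Each dict is expected to carry at least 'name' and 'value',
    # as in the list returned by driver.get_cookies().
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)
```

It could be used as headers = {'Cookie': cookie_header_from_selenium(driver.get_cookies())} before the requests.get call.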

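Can you try printing the value of src? Is it something that can be passed to urllib.request.urlretrieve?

This code lets me download the CAPTCHA. The only problem is that when I open the file, it says "It appears that we don't support this file format".

What is the file size? If it is 0 KB, check the cookie that was generated for you.

Please try the updated code. I am now fetching JSESSIONID dynamically, so you no longer have to worry about it.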
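
To follow up on the file-size question, one way to inspect the saved file is to check its size and its first bytes. This is a rough diagnostic sketch, not part of the original answer: the names sniff_image_type and describe_download are hypothetical, and only the well-known PNG and JPEG signatures are recognised. The thread does not say what actually caused the "unsupported format" message.

```python
import os

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # standard 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # start-of-image marker used by JPEG files


def sniff_image_type(data):
    """Return 'png' or 'jpeg' based on the leading bytes, or None if unknown."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


def describe_download(path):
    """Return (size_in_bytes, detected_type) for a downloaded file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(16)
    return size, sniff_image_type(head)
```

A size of 0, or a detected type of None (for example when the server returned an HTML error page), would suggest the request was rejected; a detected type that differs from the file extension would suggest renaming the file.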