Python Selenium: how to automate downloads from http://www.diva-gis.org/datadown


python selenium web web-scraping


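I am using selenium to automate downloads from this website:

http://www.diva-gis.org/datadown

I can click "OK" on the first page, but I cannot click "Download" on the second page. The error message is "no such element".

Here is my code: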

from selenium import webdriver
import os

driver=webdriver.Chrome(os.path.expanduser('./chromedriver'))
driver.get('http://www.diva-gis.org/gdata')

driver.find_element_by_xpath('//*[@id="node-36"]/div/div/div/div/form/p[1]/select/option[190]').click()
driver.find_element_by_xpath('//*[@id="node-36"]/div/div/div/div/form/p[3]/input').click()


# this is the line that fails
driver.find_element_by_xpath('//*[@id="node-39"]/div/div/div/div/a/h2').click()

I tried finding the element by xpath, by class name... None of them worked. Can someone familiar with Selenium help me solve this?

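What error did you get? Check whether the page uses frames. If the element is inside a frame, you need to switch to that frame first.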

Update: the link is not well-formed HTML, but selenium still manages to find the link itself.

As the answer above says, all you need to do is wait a few more seconds.

Another solution: use only requests. This has nothing to do with selenium, but it can make your crawler much faster than selenium.

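The problem with the website is that the download link is not well-formed HTML.

Its link looks like this:

<a href=http://biogeo.ucdavis.edu/data/diva/adm/KOR_adm.zip>
<h2>Download</h2></a>

whereas with the attribute value quoted it would look like this:

<a href="http://biogeo.ucdavis.edu/data/diva/adm/KOR_adm.zip">
<h2>Download</h2></a>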
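To see how a tolerant parser treats the two forms of the link, here is a minimal sketch using only Python's standard-library `html.parser`. It is not what Chrome or selenium use internally, and the names `LinkCollector` and `extract_hrefs` are made up for this illustration. It only shows that this particular parser recovers the same `href` whether or not the value is quoted.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with any quotes already removed.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(markup):
    """Return all link targets found in the given HTML fragment."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


URL = "http://biogeo.ucdavis.edu/data/diva/adm/KOR_adm.zip"
unquoted = "<a href=" + URL + ">\n<h2>Download</h2></a>"
quoted = '<a href="' + URL + '">\n<h2>Download</h2></a>'

print(extract_hrefs(unquoted))
print(extract_hrefs(quoted))
```

If both calls print the same list, the unquoted attribute by itself is not what stops a parser from seeing the link.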

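This code does not use selenium. Instead, it posts the request while pretending to be Chrome, so that you can get the information.

All you need to do is change the data variable.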
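As a rough sketch of what "changing the data variable" could look like, the snippet below builds the form fields as a dictionary and shows the URL-encoded body that would be posted. The field names (`cnt`, `thm`, `OK`, `_submit_check`) and the example values are taken from the requests-based answer elsewhere on this page. The helper name `build_form_data` is hypothetical, and the values for other countries would have to be read from the form itself.

```python
from urllib.parse import urlencode


def build_form_data(country, theme):
    """Hypothetical helper: form fields for one country/theme selection."""
    return {
        "cnt": country,        # value of the country <select>
        "thm": theme,          # value of the theme <select>
        "OK": "OK",            # the submit button
        "_submit_check": "1",  # hidden field sent along with the form
    }


data = build_form_data("AFG_Afghanistan", "adm#Administrative areas (GADM)")

# This is the body that an application/x-www-form-urlencoded POST would carry.
body = urlencode(data)
print(body)
```

Passing `data` to `requests.post(url, data=data)` performs the same encoding, so `urlencode` is only used here to make the request body visible.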

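You are trying to find the Download button a millisecond or two after clicking the OK button, without giving the page a chance to render. You need to wait for the page to refresh before searching for the Download button.

For example: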

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

...

try:
    # Poll for up to 10 seconds until a link with the text "Download" is in the DOM.
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Download")))
    element.click()
except TimeoutException:
    print("unable to find the Download button after 10 seconds")
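To make the idea of an explicit wait concrete, here is a small selenium-free sketch of the same pattern: call a condition repeatedly until it returns something truthy or a timeout expires. This is an illustration, not selenium's implementation. `wait_until`, `WaitTimeout` and the fake `download_link_present` condition are hypothetical names; the 0.5 second default mirrors `WebDriverWait`'s default polling interval.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy within the timeout."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met after %s seconds" % timeout)
        time.sleep(poll_interval)


# Stand-in for a page that only shows the link on the third look.
attempts = {"count": 0}


def download_link_present():
    attempts["count"] += 1
    return "Download" if attempts["count"] >= 3 else None


found = wait_until(download_link_present, timeout=2.0, poll_interval=0.01)
print(found, "found after", attempts["count"], "checks")
```

A single lookup right after the click corresponds to calling the condition once, which is why it can fail even though the element appears a moment later.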

Here is a complete working example:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.diva-gis.org/datadown")
# Submit the first form on the page with its default selections.
form = driver.find_element(By.TAG_NAME, "form")
form.submit()

try:
    # Poll for up to 10 seconds until a link with the text "Download" is in the DOM.
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Download")))
    element.click()
except TimeoutException:
    print("unable to find the Download button after 10 seconds")

driver.close()

Try this as well. I used neither selenium nor regex for this task. After running it, the required zip file is downloaded to the folder you choose.

import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen

base_link = "http://www.diva-gis.org/datadown"

def download_file(link):
    # Form fields that the country/theme selection page submits.
    payload = {'cnt':'AFG_Afghanistan','thm':'adm#Administrative areas (GADM)','OK':'OK','_submit_check':'1'}

    headers={
        "Content-Type":"application/x-www-form-urlencoded",
        "User-Agent":"Mozilla/5.0"
    }

    page = requests.post(link, data=payload, headers=headers)
    soup = BeautifulSoup(page.text,"lxml")
    # Each matching link points at a zip file; every one is written to the same path.
    for item in soup.select(".field-item a"):
        # change the output path according to your need
        with urlopen(item['href']) as zipresp, open(r"C:\Users\ar\Desktop\mth.zip", "wb") as tempzip:
            tempzip.write(zipresp.read())

download_file(base_link)
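The script above writes to a hard-coded path. One possible variation, sketched here under the assumption that the last path segment of the link is the file name you want, derives the local file name from the download URL. `local_path_for` and the `downloads` directory are hypothetical names used only for this illustration.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def local_path_for(url, download_dir):
    """Hypothetical helper: map a download URL to a file path inside download_dir."""
    # Take the last segment of the URL path, e.g. ".../adm/KOR_adm.zip" -> "KOR_adm.zip".
    name = PurePosixPath(unquote(urlsplit(url).path)).name
    if not name:
        raise ValueError("URL has no file name component: %r" % url)
    return Path(download_dir) / name


target = local_path_for(
    "http://biogeo.ucdavis.edu/data/diva/adm/KOR_adm.zip", "downloads")
print(target)
```

The returned path could then be passed to `open(..., "wb")` in place of the fixed Desktop path, so that downloads for different countries do not overwrite each other.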

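Share your HTML element... and share the actual selenium code and the xpath or class name you are using.

Thanks, I have added my code.

Right-click your element. If you see a "View frame source" option, the element is inside a frame. In that case you need to switch to that frame with driver.switchTo().frame("frame name") and then locate the element.

I think your initial statement is wrong. Using selenium I can find the download link just fine.

OK, so your code works. Then why not use it? You can find the URL (which means you can find the a tag), so why not click it? Why click the h2 tag? Why not click the a tag instead?

There is no need to click the H1 element. You need to click the anchor element. One more thing: my initial statement was correct. If you are looking at Chrome's Elements tab, then of course Chrome itself has corrected the markup. But if you look at the real HTTP response, you will see that the response HTML is broken.

Maybe I should have been more explicit. The statement "so selenium cannot find the tag element correctly" is wrong, because selenium can indeed find it.

@Robi1988: "Doesn't work" is not very useful. I downloaded the file successfully with the code above (plus the missing code that opens the driver and submits the form).

This is a bit strange. I increased the WebDriverWait timeout and also used time.sleep() to give the page more time to load, but I still only get "unable to find the Download button after 10 seconds".

Thanks for sharing the code. It still prints "unable to find the Download button after 10 seconds". May I ask which versions of Chrome and chromedriver you are using?

@Robi1988: I have chromedriver 2.32 and Chrome 60.0.3112.101. The code I posted also works with Firefox (except that it pops up a dialog instead of downloading the file).

I was using Chrome 48 on Ubuntu 15.04. Now I have installed 16.04 and use Chrome 61, and even my original code works. Thank you very much.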
