JavaScript: Unable to download research articles from Sci-Hub using Selenium browser emulation


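I am trying to automate downloading research articles from Sci-Hub based on their titles. I use a library called scholarly to fetch the URL and author information associated with a given article title, as shown in the code below.

I then use the fetched URL to emulate the download process on Sci-Hub. However, I cannot download the article directly, because I am unable to press the "open" button. After filling in the query on the search page and pressing Enter, I am taken to another page that has the open button. For some reason I cannot locate and click that button from Selenium; the lookup always returns an empty element.

However, I can run the following in the browser console and the paper downloads successfully:

document.querySelector("#open-button").click()

Trying to get the same behavior from Selenium fails.

Please help me solve this problem.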

## This part of code fetches url using scholarly library from google scholar
from scholarly import scholarly
search_query = scholarly.search_pubs('Hydrogen-hydrogen pair correlation function in liquid water')
search_query = [query for query in search_query][0]


## This part of code uses selenium to automate download process
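from selenium import webdriver
from selenium.webdriver.common.by import By  # needed for the By.CSS_SELECTOR / By.ID locators below
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
import time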

download_dir = '/Users/cacsag4/Downloads'

# setup the browser
options = webdriver.ChromeOptions()

options.add_experimental_option('prefs', {
    "download.default_directory": download_dir, #Change default directory for downloads
    "download.prompt_for_download": False, #To auto download the file
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True #It will not show PDF directly in chrome
})

browser = webdriver.Chrome('./chromedriver', options=options)
browser.delete_all_cookies()

browser.get('https://sci-hub.scihubtw.tw/')

# Find the search element to send the url string to it
searchElem = browser.find_element(By.CSS_SELECTOR, 'input[type="textbox"]')
searchElem.send_keys(search_query.bib['url'])

# Emulate pressing enter two different ways, either by pressing return key or by executing JS
#searchElem.send_keys(Keys.ENTER) # This produces the same effect as the next line
browser.execute_script("javascript:document.forms[0].submit()")

# Wait for page to load
time.sleep(10)

# Try to press the open button using JS or by fetching the button by its ID

# This raises an error because the #open-button element cannot be found
browser.execute_script('javascript:document.querySelector("#open-button").click()')

#openElem = browser.find_element(By.ID, "open-button") ## This also fails to find the element


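OK, I found the answer to this question. Sci-Hub embeds the PDF in an iframe, so all you have to do is read the iframe's src attribute after pressing Enter on the first page. The code below does the job.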

from scholarly import scholarly
search_query = scholarly.search_pubs('Hydrogen-hydrogen pair correlation function in liquid water')
search_query = [query for query in search_query][0]
print(search_query.bib['url'])


from selenium import webdriver
from selenium.webdriver.common.by import By  # needed for the By.CSS_SELECTOR locators below
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
import time

download_dir = '/Users/cacsag4/Downloads'

# setup the browser
options = webdriver.ChromeOptions()

options.add_experimental_option('prefs', {
    "download.default_directory": download_dir, #Change default directory for downloads
    "download.prompt_for_download": False, #To auto download the file
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True #It will not show PDF directly in chrome
})

browser = webdriver.Chrome('./chromedriver', options=options)
browser.delete_all_cookies()

browser.get('https://sci-hub.scihubtw.tw/')

# Find the search element to send the url string to it
searchElem = browser.find_element(By.CSS_SELECTOR, 'input[type="textbox"]')
searchElem.send_keys(search_query.bib['url'])
# Emulate pressing enter two different ways, either by pressing return key or by executing JS
#searchElem.send_keys(Keys.ENTER) # This produces the same effect as the next line
browser.execute_script("javascript:document.forms[0].submit()")

# Wait for page to load
time.sleep(2)

# Clicking the open button (the approach from the question) is no longer needed:
#browser.execute_script('javascript:document.querySelector("#open-button").click()')

# Instead, locate the iframe that embeds the PDF and load its src directly
openElem = browser.find_element(By.CSS_SELECTOR, "iframe")
browser.get(openElem.get_attribute('src'))
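
The fixed time.sleep(2) above can be too short on a slow connection. As a rough sketch (not part of the original answer), the helper below polls for the iframe instead of sleeping for a fixed time, then tidies the src value before it is opened. The name iframe_pdf_url, the timeout values, and the idea that the src may be protocol-relative or carry a viewer fragment such as #view=FitH are all assumptions made for illustration. The helper only relies on the driver exposing find_elements and current_url, which Selenium's WebDriver does.

```python
import time
from urllib.parse import urldefrag, urljoin


def iframe_pdf_url(driver, timeout=15.0, poll=0.5):
    """Wait for an <iframe> with a non-empty src and return a cleaned-up URL.

    `driver` is any object offering Selenium's `find_elements(by, value)`
    method and a `current_url` attribute (hypothetical helper, not a
    Selenium API).
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR;
        # find_elements returns an empty list instead of raising when
        # nothing matches, which keeps the polling loop simple.
        for frame in driver.find_elements("css selector", "iframe"):
            src = frame.get_attribute("src")
            if src:
                # Resolve relative or protocol-relative values ("//host/x.pdf")
                # against the page that contains the iframe.
                absolute = urljoin(driver.current_url, src)
                # Drop any "#..." fragment (assumed to be viewer options only).
                return urldefrag(absolute).url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no iframe with a src appeared within {timeout} s")
        time.sleep(poll)
```

With the browser object from the answer, the sleep and the last two lines could then be replaced by browser.get(iframe_pdf_url(browser)). Selenium's own WebDriverWait with expected_conditions is the more conventional way to wait; the plain loop is used here only so the sketch stays free of extra imports.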