Python 3.x: Python Selenium data not loading (website security)

python-3.x selenium drop-down-menu


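Please find below the code I am using to try to download/scrape the "CSV" file. This is the first stage of testing, and it fails even though there is no error: the data does not load in the geckodriver-controlled browser.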

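from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support.ui import Select
import time

# Selenium 4 API: the driver path goes in a Service object (executable_path= was removed).
# Raw string (r"...") so the backslashes in the Windows path are not read as escape sequences.
driver = webdriver.Firefox(service=Service(r"C:\Py378\prj14\geckodriver.exe"))

driver.get("https://www.nseindia.com/market-data/live-equity-market")
time.sleep(5)

# find_element(By.ID, ...) replaces the removed find_element_by_id(...)
element_dropdown = Select(driver.find_element(By.ID, "equitieStockSelect"))
element_dropdown.select_by_index(44)   # Updated with help of @PDHide in the comments
time.sleep(5)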

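The code runs fine, but because of the website's security settings the data for the selected option does not load. When I select and update the option manually, the table does not update, as if no selection had been made. (Perhaps the site is detecting the Selenium driver and requires headers, but I'm not sure...) Also, when I try to click "Download in CSV", it times out.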

After successfully selecting the option, I need to download the CSV for F&O (as shown above). Please help.

I can browse the site in a normal (installed) browser, but when I drive those same browsers with Python (Selenium), it fails. How can I get around the security mechanism?

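I tried running the code (with Chrome, but that shouldn't matter), or rather a slightly modified version so that I could better see what was happening (note that I use an implicit wait instead of sleep, which wastes time). Here I am just trying to select the second option: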
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=options)

try:
    driver.implicitly_wait(3) # wait up to 3 seconds before calls to find elements time out
    driver.get("https://www.nseindia.com/market-data/live-equity-market")
    # Selenium 4: find_element(By.ID, ...) replaces the removed find_element_by_id(...)
    select = Select(driver.find_element(By.ID, "equitieStockSelect"))
    select.select_by_index(1)
finally:
    input('pausing...')
    driver.quit()
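An implicit wait only affects element lookups: Selenium keeps retrying find_element for up to the given time and returns as soon as the element exists, whereas time.sleep(5) always pauses for the full five seconds. As a rough, Selenium-independent sketch of that polling idea (this is not Selenium's implementation; for explicit waits Selenium itself provides WebDriverWait, and the names wait_until and table_ready are hypothetical):

```python
import time


def wait_until(condition, timeout=3.0, interval=0.25):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Returns that truthy value as soon as it appears (unlike a fixed sleep),
    and raises TimeoutError if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical condition that only becomes true on the third check.
calls = {"n": 0}


def table_ready():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(table_ready, timeout=2.0, interval=0.01))  # prints True
```

The loop checks the condition before sleeping, so an already-satisfied condition costs no waiting at all.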

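As you can see, I had no problem selecting the second option. However, the new table does not load.
At this point I manually reloaded the page and got the following result. My conclusion is that the site detects that the browser is being automated and blocks access.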
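Update
So the data can be retrieved with requests instead. I used the Chrome inspector to watch the network XHR requests, then selected the second option (NIFTY NEXT 50) and looked at the resulting AJAX request.

In this case the URL is https://www.nseindia.com/api/equity-stockIndices?index=NIFTY%20NEXT%2050. However, you must first fetch the initial page using a requests Session instance: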
import requests

try:
    s = requests.Session()
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36'}
    s.headers.update(headers)
    # You have to first retrieve the initial page:
    resp = s.get('https://www.nseindia.com/market-data/live-equity-market')
    resp.raise_for_status()
    #print(resp.text)
    resp = s.get('https://www.nseindia.com/api/equity-stockIndices?index=NIFTY%20NEXT%2050')
    resp.raise_for_status()
    data = resp.json()
    print(data)
except Exception as e:
    print(e)

Prints:
{'name': 'NIFTY NEXT 50', 'advance': {'declines': '25', 'advances': '24', 'unchanged': '1'}, 'timestamp': '27-Nov-2020 16:00:00', 'data': [{'priority': 1, 'symbol': 'NIFTY NEXT 50', 'identifier': 'NIFTY NEXT 50', 'open': 30316.45,  etc. (data too long) }
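The question ultimately asks for a CSV. The printed payload suggests the per-symbol rows live in its 'data' list, so one way to get a CSV without the site's download button is to write that list out with the standard csv module. This is a minimal sketch under that assumption: payload_to_csv is a hypothetical helper, the sample rows are made up, and nested values (which may or may not appear in the real payload) are simply skipped.

```python
import csv
import io


def payload_to_csv(payload, fieldnames=None):
    """Turn payload["data"] (assumed: a list of per-symbol dicts) into CSV text.

    If fieldnames is not given, every key with a scalar value is used, in
    first-seen order; nested dicts/lists are left out because they do not
    fit in a single CSV cell.
    """
    rows = payload.get("data", [])
    if fieldnames is None:
        fieldnames = []
        for row in rows:
            for key, value in row.items():
                if key not in fieldnames and not isinstance(value, (dict, list)):
                    fieldnames.append(key)
    buf = io.StringIO()
    # extrasaction="ignore" silently drops keys that are not in fieldnames
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


# Made-up sample shaped like the printed payload.
sample = {
    "name": "NIFTY NEXT 50",
    "data": [
        {"priority": 1, "symbol": "NIFTY NEXT 50", "open": 30316.45},
        {"priority": 0, "symbol": "SYMBOL_A", "open": 100.5, "meta": {"x": 1}},
    ],
}
print(payload_to_csv(sample))
```

With the real response, writing payload_to_csv(data) to a file opened with newline="" would save it to disk; pass an explicit fieldnames list to choose and order the columns.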

For index 44 (SECURITIES IN F&O), the corresponding URL is https://www.nseindia.com/api/equity-stockIndices?index=SECURITIES+IN+F%26O
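requests is a third-party package, not part of the standard library. If it is not installed, the same two-step flow (load the HTML page first, then call the API through the same client) can be sketched with the standard library's urllib and http.cookiejar. This assumes, without confirmation from the answer, that the first request matters because it sets cookies the API later checks; make_session_opener is a hypothetical helper name, and whether the site still accepts this is not verified.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_session_opener(user_agent):
    """Build a urllib opener that, like requests.Session, keeps cookies between calls.

    Cookies set by one response are stored in `jar` and sent automatically
    with later requests made through the same opener.
    """
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace urllib's default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


# Usage (needs network access, so it is left commented out):
# import json
# opener, jar = make_session_opener("Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...")
# opener.open("https://www.nseindia.com/market-data/live-equity-market", timeout=10).read()
# with opener.open("https://www.nseindia.com/api/equity-stockIndices?index=NIFTY%20NEXT%2050",
#                  timeout=10) as resp:
#     data = json.load(resp)
```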

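Update 2
In general, to work out the URL you need, take any index (for example, index 44), look up the corresponding option value (in this case 'SECURITIES IN F&O'), and substitute that value for the variable option_value in the following program: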
from urllib.parse import quote_plus

option_value = 'SECURITIES IN F&O'

url = 'https://www.nseindia.com/api/equity-stockIndices?index=' + quote_plus(option_value)
print(url)

Prints:

https://www.nseindia.com/api/equity-stockIndices?index=SECURITIES+IN+F%26O

The URL above is the one to use.
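The lookup described in Update 2 (option index, then option value, then URL) can also be automated if you have the HTML of the dropdown, for example copied from the F12 inspector. This is a sketch under two assumptions: that the options' value attributes hold the index names, and that you have HTML in which the select is already populated (on the live site it may be filled in by JavaScript). SelectOptionParser and url_for_option are hypothetical names; the URL step is the same quote_plus call as in the answer.

```python
from html.parser import HTMLParser
from urllib.parse import quote_plus

API_BASE = "https://www.nseindia.com/api/equity-stockIndices?index="


class SelectOptionParser(HTMLParser):
    """Collect the value attributes of the <option> tags inside <select id=select_id>."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser has already unescaped entities such as &amp;
        if tag == "select" and attrs.get("id") == self.select_id:
            self.inside = True
        elif tag == "option" and self.inside:
            # Simplification: an option without a value attribute is recorded as "".
            self.values.append(attrs.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "select":
            self.inside = False


def url_for_option(html, index, select_id="equitieStockSelect"):
    """Return the API URL for the option at 0-based `index` (as in select_by_index)."""
    parser = SelectOptionParser(select_id)
    parser.feed(html)
    return API_BASE + quote_plus(parser.values[index])


# Made-up two-option dropdown; on the real page the F&O entry is index 44.
sample_html = (
    '<select id="equitieStockSelect">'
    '<option value="NIFTY 50">NIFTY 50</option>'
    '<option value="SECURITIES IN F&amp;O">Securities in F&amp;O</option>'
    "</select>"
)
print(url_for_option(sample_html, 1))
# prints https://www.nseindia.com/api/equity-stockIndices?index=SECURITIES+IN+F%26O
```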
Add the HTML DOM.
@PDHide, thank you for your reply, but I don't know how to do that; I'm just learning this. Could you help, please?
Copy what you get when you press F12. The Select class only works with a select tag.
@PDHide, OK, as far as I understood your suggestion, I have updated the code this way: element_dorpdown = Select(driver.find_element_by_class_name("no-border-radius")) ... but it still doesn't update the option (maybe it is detecting the Selenium driver and needs headers, but I'm not sure; just guessing).
@PDHide, when I try to update the option, and even when I download the CSV manually from the page (without using Python), the page times out. Are you sure you are trying with Selenium? My page doesn't update after the first step of fetching the webpage.
Thanks @Booboo, yes, I updated the page/question a few minutes ago. I can change the selection, but there is no online help on how to get past the web page's security, and I couldn't find any blog or Selenium module about it. I don't know how to use the requests library, and I'm not sure whether it would be a way around the same security measures.