Selenium Python web scraping fails in a loop but works when I do it manually

selenium web-scraping

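I am trying to programmatically collect data for 6,000 stocks from the web, using Python 3.6 with Selenium WebDriver and Firefox. [I had planned to parse the HTML with BeautifulSoup, but the URL does not change when I update the page, and BeautifulSoup cannot handle JavaScript.]

Anyway, when I wrap this in a for loop, one particular line of my code,

share_price = driver.find_element_by_css_selector(".highcharts-root > g:nth-child(25) > text:nth-child(2)")

, fails most of the time (it has worked a few times, so I believe my code is fine). However, if I do it manually (copy and paste it into Python IDLE and run it), it works fine. I tried using

time.sleep(4)

to let the page load before I retrieve anything, but that does not seem to be the solution. Now I am out of ideas. Can anyone help me solve this mystery?

Here is my code: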

from selenium import webdriver
import csv
import time
import pyautogui
filename = "historical_price_marketcap.csv"
f = open(filename,"w")
headers = "stock_ticker, share_price, market_cap\n"
f.write(headers)
driver = webdriver.Firefox()
def get_web():
    driver.get("https://stockrow.com")
with open("TICKER.csv") as file:
    read = csv.reader(file)
    TICKER=[]
    for row in read:
        ticker = row[0][1:-1]
        TICKER.append(ticker)
for Ticker in range(len(TICKER)):
    get_web()
    time.sleep(3)
    pyautogui.click(425, 337)
    pyautogui.typewrite(TICKER[Ticker],0.25)
    time.sleep(2)
    pyautogui.press("enter")
    time.sleep(2)
    pyautogui.click(268, 337)
    pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite('Stock Price',0.25)
    time.sleep(2)
    pyautogui.press("enter")
    time.sleep(2)

    pyautogui.click(702, 427)
    for i in range(int(10)):
            pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite("2013-12-01",0.25)
    pyautogui.press("enter")
    time.sleep(2)

    pyautogui.click(882, 425)
    for k in range(10):
            pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite("2013-12-31",0.25)
    pyautogui.press("enter")
    time.sleep(2)

    pyautogui.click(1317, 318)
    for j in range(3):
            pyautogui.press("down")

    time.sleep(10)
    share_price = driver.find_element_by_css_selector(".highcharts-root > g:nth-child(25) > text:nth-child(2)")
    get_web()
    time.sleep(3)
    pyautogui.click(425, 337)
    pyautogui.typewrite(TICKER[Ticker],0.25)
    time.sleep(2)
    pyautogui.press("enter")
    time.sleep(2)
    pyautogui.click(268, 337)
    pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite('Market Cap',0.25)
    time.sleep(2)
    pyautogui.press("enter")
    time.sleep(2)

    pyautogui.click(702, 427)
    for i in range(int(10)):
            pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite("2013-12-01",0.25)
    pyautogui.press("enter")
    time.sleep(2)

    pyautogui.click(882, 425)
    for k in range(10):
            pyautogui.press("backspace")
    time.sleep(2)
    pyautogui.typewrite("2013-12-31",0.25)
    pyautogui.press("enter")
    time.sleep(2)

    pyautogui.click(1317, 318)
    for j in range(3):
            pyautogui.press("down")

    time.sleep(10)
    market_cap = driver.find_element_by_css_selector(".highcharts-root > g:nth-child(28) > text:nth-child(2)")
f.close()

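The two lines of code that seem to be giving me trouble are the find_element_by_css_selector calls, such as

share_price = driver.find_element_by_css_selector(".highcharts-root > g:nth-child(25) > text:nth-child(2)")

Here is the Python error message: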

 Traceback (most recent call last):
  File "C:\Users\HENGBIN\Desktop\get_historical_data.py", line 65, in <module>
    share_price = driver.find_element_by_css_selector(".highcharts-root > g:nth-child(25) > text:nth-child(2)")
  File "E:\Program Files\python3.6.1\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 457, in find_element_by_css_selector
    return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
  File "E:\Program Files\python3.6.1\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 791, in find_element
    'value': value})['value']
  File "E:\Program Files\python3.6.1\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "E:\Program Files\python3.6.1\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: .highcharts-root > g:nth-child(25) > text:nth-child(2)

It fails most of the time inside the loop, but it works fine when I run it manually in Python IDLE. I have no idea what is going on...

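There are a few things I would do differently in your script. First, try to get rid of pyautogui. Selenium has built-in functionality for clicking and for sending keys. Also, when you change things in the browser with pyautogui, my experience is that Selenium is not always aware of those changes. That could explain the problems you run into when you use Selenium to look up elements that were created through pyautogui.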
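As a rough sketch of what dropping pyautogui could look like, the helpers below click and type through Selenium's own element API (`find_element`, `click`, `clear`, `send_keys`) instead of screen coordinates. The helper names and the selectors in the usage comments are hypothetical; the real selectors would have to be read from the stockrow.com page in the browser's inspector. The two constants spell out the values of Selenium's `By.CSS_SELECTOR` and `Keys.ENTER` so the helpers can be read and tried without a browser.

```python
# Values of selenium's By.CSS_SELECTOR and Keys.ENTER constants. In real code
# you would normally import them instead:
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.common.keys import Keys
CSS_SELECTOR = "css selector"
ENTER = "\ue007"


def click(driver, css):
    """Click the first element matching the CSS selector."""
    driver.find_element(CSS_SELECTOR, css).click()


def replace_text(driver, css, text):
    """Clear an input field, type `text` into it and press Enter."""
    field = driver.find_element(CSS_SELECTOR, css)
    field.clear()                 # replaces the repeated "backspace" presses
    field.send_keys(text, ENTER)  # replaces typewrite() + press("enter")


# Usage with a real browser (selectors are hypothetical placeholders):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("https://stockrow.com")
#   replace_text(driver, "input.ticker-search", "AAPL")
#   replace_text(driver, "input.date-from", "2013-12-01")
```

Because every action goes through the driver, it no longer depends on window position or screen resolution, and Selenium raises an exception at the step that failed instead of silently typing into the wrong place.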

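Second: the get_web() function may be causing problems. In general, whatever is inside a function has to be returned (or declared global) to be accessible outside it. The driver that opens the page is global (you instantiate it outside the function), but the URL inside the function is local, which means you may run into trouble accessing content outside the function. I suggest dropping the function (it really does nothing except open the URL) and simply replacing the function call in your code, like this: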

for Ticker in range(len(TICKER)):
    driver.get("https://stockrow.com")
    time.sleep(3)
    # insert keys, click and so on...

This will let you use Selenium's driver.find_element_... methods.
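Once the page is driven through Selenium, the fixed time.sleep calls from the question could be swapped for polling: keep retrying the lookup until the element appears or a deadline passes. The helper below is a plain-Python sketch of that idea; `wait_for` is a hypothetical name, not a Selenium function. Selenium ships its own version of this pattern (`WebDriverWait` together with `expected_conditions`), which is probably the better choice in a real script.

```python
import time


def wait_for(find, timeout=15.0, poll=0.5, ignored=(Exception,)):
    """Call find() until it returns a truthy value or `timeout` seconds pass.

    Exceptions listed in `ignored` are treated as "not there yet"; with
    Selenium you would typically pass (NoSuchElementException,).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = find()
            if result:
                return result
        except ignored:
            pass  # element not present yet, try again
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Usage with the selector from the question:
#   selector = ".highcharts-root > g:nth-child(25) > text:nth-child(2)"
#   element = wait_for(lambda: driver.find_element("css selector", selector))
#   share_price = element.text
```

Unlike a fixed sleep, this returns as soon as the chart has rendered and fails with a clear timeout when it never does, which also helps to tell a slow page apart from a selector that simply does not match.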

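Third: I assume you also want to extract some data from the site. If so, use something other than Selenium for the parsing. Selenium is a slow parser. You could try BeautifulSoup.

Once the site has loaded, load the HTML into BeautifulSoup and extract what you want.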

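But for this site, what you should really do is use the API calls that the site itself makes. Open Chrome DevTools and you will see that it queries three APIs, which you can call directly, avoiding the whole Selenium problem.

The URL for Apple looks like this:

url = 'https://stockrow.com/api/fundamentals.json?indicators[]=0&tickers[]=AAPL'

So, using the requests library, you can retrieve the content as JSON, like this: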

import requests
from pprint import pprint
url = 'https://stockrow.com/api/fundamentals.json?indicators[]=0&tickers[]=AAPL'
response = requests.get(url).json()
pprint(response)

This is a much faster solution than Selenium.
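To extend this to the full ticker list from the question, something along the following lines might work. It is only a sketch: `build_url` and `fetch_all` are hypothetical helpers, the endpoint and the `indicators[]=0` parameter are copied from the URL above and may have changed since, and the shape of the returned JSON is not assumed, so the raw response is stored per ticker.

```python
import time
from urllib.parse import urlencode

# Endpoint as quoted in the answer; it may no longer exist in this form.
API_URL = "https://stockrow.com/api/fundamentals.json"


def build_url(ticker, indicator=0):
    """Build the query URL for one ticker, keeping the [] in the keys."""
    query = urlencode({"indicators[]": indicator, "tickers[]": ticker}, safe="[]")
    return f"{API_URL}?{query}"


def fetch_all(tickers, get_json, pause=0.0):
    """Fetch every ticker with get_json(url) and collect results and errors.

    One failing ticker is recorded and skipped so that it does not stop
    a run over thousands of symbols.
    """
    results, failures = {}, {}
    for ticker in tickers:
        try:
            results[ticker] = get_json(build_url(ticker))
        except Exception as exc:
            failures[ticker] = str(exc)
        time.sleep(pause)  # optional politeness delay between requests
    return results, failures


# Usage with the requests library:
#   import requests
#   results, failures = fetch_all(
#       ["AAPL", "MSFT"],
#       lambda url: requests.get(url, timeout=10).json(),
#       pause=0.5,
#   )
```

Passing the download function in as `get_json` keeps the loop independent of the HTTP library, so it can be tried with a stub before it is pointed at the real site.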

BeautifulSoup may not be a good choice, because the site relies on JavaScript.
