Python: How to scrape an updating HTML table with Selenium?

python selenium selenium-webdriver web-scraping

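I want to scrape the coin table and create a dated CSV file. For each new coin update, a new entry should be added at the top of the existing data file.

Desired output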

Coin,Pings,...Datetime

BTC,25,...07:17:05 03/18/21

I haven't gotten very far, but here is my attempt

from selenium import webdriver
import numpy as np
import pandas as pd

firefox = webdriver.Firefox(executable_path="/usr/local/bin/geckodriver")
firefox.get('https://agile-cliffs-23967.herokuapp.com/binance/')

rows = len(firefox.find_elements_by_xpath("/html/body/div/section[2]/div/div/div/div/table/tr"))
columns = len(firefox.find_elements_by_xpath("/html/body/div/section[2]/div/div/div/div/table/tr[1]/th"))

df = pd.DataFrame(columns=['Coin','Pings','Net Vol BTC','Net Vol per','Recent Total Vol BTC', 'Recent Vol per', 'Recent Net Vol', 'Datetime'])

for r in range(1, rows+1):
    for c in range(1, columns+1): 
        value = firefox.find_element_by_xpath("/html/body/div/section[2]/div/div/div/div/table/tr["+str(r)+"]/th["+str(c)+"]").text
        print(value)
        
#         df.loc[i, ['Coin']] =

You can append the row data to the DataFrame by putting each row into a dictionary:

# We reuse the headers when building dicts below
headers = ['Coin','Pings','Net Vol BTC','Net Vol per','Recent Total Vol BTC', 'Recent Vol per', 'Recent Net Vol', 'Datetime']

# Collect one dict per table row, then build the DataFrame once.
# (DataFrame.append was removed in pandas 2.0.)
records = []
for r in range(1, rows+1):
    data = [firefox.find_element_by_xpath("/html/body/div/section[2]/div/div/div/div/table/tr["+str(r)+"]/th["+str(c)+"]").text
            for c in range(1, columns+1)]
    records.append(dict(zip(headers, data)))

df = pd.DataFrame(records, columns=headers)
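A variation on the same idea, as a rough sketch for newer Selenium releases: version 4.3 removed the `find_element_by_*` helpers, so lookups go through `find_elements(by, value)` instead. The function below reads all cells of a row with one call rather than one XPath query per cell. The name `table_to_records` and the default `row_xpath` are hypothetical, and the literal `"xpath"` is the string value of `By.XPATH`, used here so the function works with any object that exposes `find_elements`.

```python
def table_to_records(driver, headers, row_xpath="//table//tr"):
    """Return one dict per non-empty table row, keyed by `headers`.

    `driver` is assumed to be a Selenium 4 WebDriver (or any object with a
    compatible find_elements method); `row_xpath` is an assumed locator.
    """
    records = []
    for row in driver.find_elements("xpath", row_xpath):
        # "xpath" is the value of selenium.webdriver.common.by.By.XPATH
        cells = row.find_elements("xpath", "./th | ./td")
        texts = [cell.text for cell in cells]
        if texts:
            records.append(dict(zip(headers, texts)))
    return records
```

If the first `tr` holds the column titles, drop it with `table_to_records(firefox, headers)[1:]` before passing the result to `pd.DataFrame(records, columns=headers)`.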

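Because the data is loaded dynamically, you can retrieve it directly from the source; there is no need for Selenium. The endpoint returns JSON containing rows of |-separated values that need to be split; the resulting rows can then be appended to a pandas DataFrame. Since the site updates every minute, you can wrap everything in a while True loop to keep the code running: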
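import time

import pandas as pd
import requests

headers = ['Coin','Pings','Net Vol BTC','Net Vol %','Recent Total Vol BTC', 'Recent Vol %', 'Recent Net Vol', 'Datetime (UTC)']
df = pd.DataFrame(columns=headers)

s = requests.Session()
starttime = time.time()

while True:
    response = s.get('https://agile-cliffs-23967.herokuapp.com/ok', headers={'Connection': 'keep-alive'})
    d = response.json()
    # Each entry in 'resu' is one '|'-separated row; the last entry is skipped
    rows = [str(i).split('|') for i in d['resu'][:-1]]
    if rows:
        data = [dict(zip(headers, l)) for l in rows]
        # DataFrame.append was removed in pandas 2.0, so concatenate instead
        df = pd.concat([df, pd.DataFrame(data, columns=headers)], ignore_index=True)
        df.to_csv('filename.csv', index=False)
    # Sleep until the next whole minute measured from starttime
    time.sleep(60.0 - ((time.time() - starttime) % 60.0))

Thanks, this is exactly what I was looking for!
You're welcome. Note that the CSV is overwritten when you rerun the code. If you want to keep adding to an existing CSV file, you can load it at the start of the code with df = pd.read_csv('filename.csv').
Thank you for the helpful solution.
No problem. Since you are new to StackOverflow (SO), please take a look at how SO does things. In particular, if your problem is solved, you can consider accepting @J·J·阿德里安森's answer.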
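The question asks for each new update to land at the top of the existing file. A minimal sketch of that, using only the standard library so it can be tried without pandas: `parse_rows` and `prepend_rows` are hypothetical helper names, and the `|` separator is taken from the answer's `split('|')` call. The file is rewritten on every call, which is fine for small files but is an assumption about the data volume.

```python
import csv
import os


def parse_rows(raw_rows, sep="|"):
    """Split each separator-delimited string into a list of fields."""
    return [str(r).split(sep) for r in raw_rows]


def prepend_rows(path, header, new_rows):
    """Rewrite the CSV at `path` with `new_rows` directly under the header."""
    old_rows = []
    if os.path.exists(path):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the existing header line
            old_rows = list(reader)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(new_rows)  # newest entries first
        writer.writerows(old_rows)  # then everything saved earlier
```

Inside the polling loop this could be called as `prepend_rows('filename.csv', headers, rows)`, since `rows` there is already a list of split fields.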