Python: How do I scrape a data table from a web page after applying a filter?

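I'm trying to build a data scraping tool, but the values in the table change once the necessary filter is applied. I don't know how to apply the filter using selenium or some other tool.

My plan was to load the base table first, then work out how to apply the filter and improve my code, but I'm stuck even on scraping the base table from the page. The filter I'm trying to apply is in the dropdown toolbar labeled "Slates" on the website "".

I'm fairly confident this code gets the right table: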

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url = 'https://rotogrinders.com/projected-stats/nfl-qb?site=fanduel'
driver.get(url)
table = driver.find_element_by_xpath("//*[@id='proj-stats']")

However, converting it into a pandas DataFrame isn't going well:

results_table = []
for row in table:
    temp = []
    columns = row.find_element_by_xpath("//*[@id='proj-stats']/div[1]")
    for column in columns:
        temp.append(column.text)

    results_table.append(temp)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-79-bdda19bc35a3> in <module>
      1 results_table = []
----> 2 for row in table:
      3     temp = []
      4     columns = row.find_element_by_xpath("//*[@id='proj-stats']/div[1]")
      5     for column in columns:

TypeError: 'WebElement' object is not iterable

If you want to get the player names and salaries and load them into a pandas DataFrame, try the code below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

driver = webdriver.Chrome()
url = 'https://rotogrinders.com/projected-stats/nfl-qb?site=fanduel'
driver.get(url)
# Wait up to 10 seconds for the stats table to become visible
table = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='proj-stats']")))

Player_Name = []
Player_Price = []
# find_elements (plural) returns a list, so it can be iterated.
# The find_elements_by_xpath shortcut is gone in newer Selenium 4 releases;
# find_elements(By.XPATH, ...) works in both Selenium 3 and 4.
for row in driver.find_elements(By.XPATH, ".//div[@class='player']/a"):
    Player_Name.append(row.text)

for row in driver.find_elements(By.XPATH, ".//div[@class='rgt-col']/div[@class='rgt-hdr'][contains(.,'Salary')]/following-sibling::div"):
    Player_Price.append(row.text)

df = pd.DataFrame({"Player Name": Player_Name, "Salary": Player_Price})
print(df)

Output

         Player Name Salary
0          Drew Brees  $7.2K
1      Deshaun Watson  $8.4K
2      Russell Wilson  $8.6K
3   Mitchell Trubisky  $6.5K
4          Josh Allen  $7.7K
5    Matthew Stafford  $7.9K
6     Jacoby Brissett  $7.3K
7       Matthew Moore  $6.5K
8        Daniel Jones  $7.0K
9        Carson Wentz  $7.4K
10      Aaron Rodgers  $8.1K
11       Kirk Cousins  $7.8K
12          Tom Brady  $7.9K
13     Jameis Winston  $7.5K
14         Jared Goff  $8.0K
15    Gardner Minshew  $6.9K
16     Ryan Tannehill  $7.1K
17        Andy Dalton  $6.9K
18      Mason Rudolph  $7.1K
19    Jimmy Garoppolo  $7.7K
20         Kyle Allen  $6.8K
21       Kyler Murray  $7.8K
22         Derek Carr  $7.3K
23        Case Keenum  $6.3K
24      Philip Rivers  $7.2K
25   Ryan Fitzpatrick  $7.0K
26         Joe Flacco  $6.5K
27        Matt Schaub  $6.6K
28        Sam Darnold  $7.3K
29     Baker Mayfield  $7.2K
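
The Salary column above holds display strings such as $7.2K rather than numbers. If numeric values are needed (for sorting or arithmetic), a small helper along these lines could convert them. This is only a sketch: `parse_salary` is a hypothetical name, and it assumes the strings always look like "$7.2K" or a plain number.

```python
from decimal import Decimal

def parse_salary(text):
    """Convert a display string like '$7.2K' into an integer dollar amount."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    multiplier = 1
    # A trailing K (or k) means thousands
    if cleaned[-1:].upper() == "K":
        multiplier = 1000
        cleaned = cleaned[:-1]
    # Decimal avoids float rounding surprises when scaling values like 7.2
    return int(Decimal(cleaned) * multiplier)

print(parse_salary("$7.2K"))
```

With the DataFrame from the answer, this could be applied as `df["Salary"] = df["Salary"].map(parse_salary)`.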

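It's all in the <script> tags in JSON format. For example, you can iterate through the slate IDs and match them to the players and salaries for those slates: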

import requests
from bs4 import BeautifulSoup
import json

url = 'https://rotogrinders.com/projected-stats/nfl-qb?site=fanduel'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

script = soup.find_all('script')[12].text
jsonStr_slate = script.split('slates:')[-1]
jsonStr_slate = jsonStr_slate.split('schedules:')
jsonStr_slate = jsonStr_slate[0].rsplit(',',1)[0]
slatesData = json.loads(jsonStr_slate)

script = soup.find_all('script')[13].text
jsonStr = script.split('data = ')[-1]
jsonStr = jsonStr.rsplit(';',4)[0]

jsonData = json.loads(jsonStr)

for each in jsonData:
    name = each['player_name']
    for slate in each['import_data']:
        slate_id = slate['slate_id']
        salary = slate['salary']

        for k, v in slatesData.items():
            if v['importId'] == slate_id:
                print('%-20s $%-8s %s' % (name, salary, k))

Output:

Russell Wilson       $8600     8:20pm Thu-Mon
Russell Wilson       $8600     2:00pm Main
Russell Wilson       $8600     2:00pm Sun-Mon
Russell Wilson       $9500     2:00pm SuperFlex
Russell Wilson       $8600     5:05pm 4pm Only
Lamar Jackson        $8000     8:20pm Thu-Mon
Lamar Jackson        $8000     2:00pm Sun-Mon
Lamar Jackson        $8800     2:00pm SuperFlex
Mitchell Trubisky    $6500     8:20pm Thu-Mon
Mitchell Trubisky    $6500     2:00pm Main
Mitchell Trubisky    $6500     2:00pm 1pm Only
Mitchell Trubisky    $6500     2:00pm Sun-Mon
Mitchell Trubisky    $6800     2:00pm SuperFlex
Deshaun Watson       $8400     8:20pm Thu-Mon
Dak Prescott         $7800     8:20pm Thu-Mon
Dak Prescott         $7800     2:00pm Sun-Mon
Josh Allen           $7700     8:20pm Thu-Mon
Josh Allen           $7700     2:00pm Main
Josh Allen           $7700     2:00pm 1pm Only
Josh Allen           $7700     2:00pm Sun-Mon
Josh Allen           $8400     2:00pm SuperFlex
Jameis Winston       $7500     8:20pm Thu-Mon
Jameis Winston       $7500     2:00pm Main
Jameis Winston       $7500     2:00pm Sun-Mon
Jameis Winston       $8200     2:00pm SuperFlex
Jameis Winston       $7500     5:05pm 4pm Only
Jimmy Garoppolo      $15500    8:20pm SF @ ARI
Jimmy Garoppolo      $7600     8:20pm Thu-Mon
Jacoby Brissett      $7300     8:20pm Thu-Mon
Jacoby Brissett      $7300     2:00pm Main
Jacoby Brissett      $7300     2:00pm 1pm Only
Jacoby Brissett      $7300     2:00pm Sun-Mon
Jacoby Brissett      $7900     2:00pm SuperFlex
Patrick Mahomes      $8500     8:20pm Thu-Mon
Patrick Mahomes      $8500     2:00pm Main
Patrick Mahomes      $8500     2:00pm 1pm Only
Patrick Mahomes      $8500     2:00pm Sun-Mon
Patrick Mahomes      $9400     2:00pm SuperFlex
Carson Wentz         $7400     8:20pm Thu-Mon
Carson Wentz         $7400     2:00pm Main
Carson Wentz         $7400     2:00pm 1pm Only
Carson Wentz         $7400     2:00pm Sun-Mon
Carson Wentz         $8000     2:00pm SuperFlex
Aaron Rodgers        $8100     8:20pm Thu-Mon
Aaron Rodgers        $8100     2:00pm Main
Aaron Rodgers        $8100     2:00pm Sun-Mon
Aaron Rodgers        $9000     2:00pm SuperFlex
Aaron Rodgers        $8100     5:05pm 4pm Only
Derek Carr           $7300     8:20pm Thu-Mon
Derek Carr           $7300     2:00pm Main
Derek Carr           $7300     2:00pm Sun-Mon
Derek Carr           $7900     2:00pm SuperFlex
Derek Carr           $7300     5:05pm 4pm Only
Tom Brady            $7900     8:20pm Thu-Mon
Tom Brady            $7900     2:00pm Sun-Mon
Tom Brady            $8700     2:00pm SuperFlex
Kirk Cousins         $7800     8:20pm Thu-Mon
Kirk Cousins         $7800     2:00pm Main
Kirk Cousins         $7800     2:00pm 1pm Only
Kirk Cousins         $7800     2:00pm Sun-Mon
Kirk Cousins         $8500     2:00pm SuperFlex
Daniel Jones         $7300     8:20pm Thu-Mon
Daniel Jones         $7300     2:00pm Sun-Mon
Kyle Allen           $6800     8:20pm Thu-Mon
Kyle Allen           $6800     2:00pm Main
Kyle Allen           $6800     2:00pm 1pm Only
Kyle Allen           $6800     2:00pm Sun-Mon
Kyle Allen           $7200     2:00pm SuperFlex
Gardner Minshew      $7200     8:20pm Thu-Mon
Philip Rivers        $7200     8:20pm Thu-Mon
Philip Rivers        $7200     2:00pm Main
Philip Rivers        $7200     2:00pm Sun-Mon
Philip Rivers        $7700     2:00pm SuperFlex
Philip Rivers        $7200     5:05pm 4pm Only
Mason Rudolph        $6800     8:20pm Thu-Mon
Mason Rudolph        $6800     2:00pm Main
Mason Rudolph        $6800     2:00pm 1pm Only
Mason Rudolph        $6800     2:00pm Sun-Mon
Mason Rudolph        $7200     2:00pm SuperFlex
Sam Darnold          $7300     8:20pm Thu-Mon
Sam Darnold          $7300     2:00pm Main
Sam Darnold          $7300     2:00pm 1pm Only
Sam Darnold          $7300     2:00pm Sun-Mon
Sam Darnold          $7800     2:00pm SuperFlex
Matthew Stafford     $7900     8:20pm Thu-Mon
Matthew Stafford     $7900     2:00pm Main
Matthew Stafford     $7900     2:00pm Sun-Mon
Matthew Stafford     $8700     2:00pm SuperFlex
Matthew Stafford     $7900     5:05pm 4pm Only
Kyler Murray         $15000    8:20pm SF @ ARI
Kyler Murray         $7200     8:20pm Thu-Mon
Brandon Allen        $6000     8:20pm Thu-Mon
Brandon Allen        $6000     2:00pm Main
Brandon Allen        $6000     2:00pm Sun-Mon
Brandon Allen        $6200     2:00pm SuperFlex
Brandon Allen        $6000     5:05pm 4pm Only
Ryan Tannehill       $7100     8:20pm Thu-Mon
Ryan Tannehill       $7100     2:00pm Main
Ryan Tannehill       $7100     2:00pm 1pm Only
Ryan Tannehill       $7100     2:00pm Sun-Mon
Ryan Tannehill       $7500     2:00pm SuperFlex
Baker Mayfield       $7200     8:20pm Thu-Mon
Baker Mayfield       $7200     2:00pm Main
Baker Mayfield       $7200     2:00pm Sun-Mon
Baker Mayfield       $7700     2:00pm SuperFlex
Baker Mayfield       $7200     5:05pm 4pm Only
Ryan Fitzpatrick     $7000     8:20pm Thu-Mon
Ryan Fitzpatrick     $7000     2:00pm Main
Ryan Fitzpatrick     $7000     2:00pm 1pm Only
Ryan Fitzpatrick     $7000     2:00pm Sun-Mon
Ryan Fitzpatrick     $7400     2:00pm SuperFlex
Case Keenum          $6300     8:20pm Thu-Mon
Case Keenum          $6300     2:00pm Main
Case Keenum          $6300     2:00pm 1pm Only
Case Keenum          $6300     2:00pm Sun-Mon
Case Keenum          $6600     2:00pm SuperFlex
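
The string slicing in the code above depends on the position of the script tags (indexes 12 and 13) and on how many semicolons follow the data, so it may break if the page layout changes. A less position-dependent alternative, sketched here under the assumption that the data still follows a marker such as `data = `, is `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given offset and ignores whatever comes after it. `extract_json_after` is a hypothetical helper, and `sample_script` is a made-up stand-in for the real script text.

```python
import json

def extract_json_after(text, marker):
    """Parse the first JSON value that follows `marker` in `text`."""
    start = text.index(marker) + len(marker)
    # raw_decode does not skip leading whitespace, so skip it here
    while text[start].isspace():
        start += 1
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value

# Made-up script content; the real page is assumed to look roughly like this
sample_script = (
    'var x = 1; data = [{"player_name": "Russell Wilson", '
    '"import_data": [{"slate_id": 1, "salary": 8600}]}]; render(data);'
)
players = extract_json_after(sample_script, "data = ")
print(players[0]["player_name"])
```

In the real scraper, the text of each script tag could be searched for the marker instead of relying on a fixed index.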

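You should use row.find_elements_by_xpath, which returns an iterable list. find_element only returns a single WebElement: the first matching element found in the DOM.

Thanks for the detailed help. This code works great; now I just need to apply the filter via selenium.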
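
One way to get a filtered table without driving the dropdown at all: since the embedded JSON in the requests answer already carries every slate, the filter can be applied in Python. The sketch below assumes the data has the shape used in that answer (`player_name`, `import_data`, `slate_id`, `salary`, and slate entries with `importId`). The sample values and slate ids are invented for illustration, and `rows_for_slate` is a hypothetical helper.

```python
def rows_for_slate(players, slates, slate_name):
    """Return one {'Player Name', 'Salary'} row per player in the named slate."""
    # Look up the import id that the chosen slate name maps to
    wanted_id = slates[slate_name]["importId"]
    rows = []
    for player in players:
        for entry in player["import_data"]:
            if entry["slate_id"] == wanted_id:
                rows.append({"Player Name": player["player_name"],
                             "Salary": entry["salary"]})
    return rows

# Made-up sample in the same shape as slatesData / jsonData
slates = {"2:00pm Main": {"importId": 101},
          "2:00pm SuperFlex": {"importId": 102}}
players = [
    {"player_name": "Russell Wilson",
     "import_data": [{"slate_id": 101, "salary": 8600},
                     {"slate_id": 102, "salary": 9500}]},
    {"player_name": "Lamar Jackson",
     "import_data": [{"slate_id": 102, "salary": 8800}]},
]

main_rows = rows_for_slate(players, slates, "2:00pm Main")
print(main_rows)
```

Passing the resulting list to `pd.DataFrame(main_rows)` would give a table for just that slate.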