Python: scraping the home team's box score

Tags: python, html, web-scraping, beautifulsoup

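I'm working on a project in which I want to scrape NBA game statistics for the 2019/20 season, from October through August.

I'm only interested in game-level results for the home and away teams rather than detailed player or team statistics, so I need the box score data for every game, taken from the "Basic Box Score Stats" tables.

Problem: when scraping the box scores, I only collect the away team's data, because it is the first table on the box score page and I can simply select it with index [0] (it is static). For the home team, the table index seems to change depending on whether the game went to overtime (OT), and sometimes because of other changes I haven't identified (it is somewhat dynamic).

Question: what is the best way to use a loop to collect the box scores of both the away and home teams for each month? Or, how can I collect the home team's data for each box score?

Example of a box score page for a game without overtime:

Example of a box score page for a game with overtime:

In the latter example, the home team's table index depends on the number of preceding tables (tables containing data, such as overtime stats). It is usually the eighth table when there is no overtime, and a different one when there is overtime.

My code for successfully (and consistently) getting the away team's data is as follows: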

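import pandas as pd

box_score_example_url = 'http://www.basketball-reference.com/boxscores/201910230POR.html'

dfbox = []
# Wrap the single URL in a list; looping over the bare string would yield characters
for eachBox in [box_score_example_url]:
    dfz = pd.read_html(eachBox)  # every table on the page
    dfbox.append(dfz[0])         # index 0 is the away team's basic box score

boxbox_awayteam = pd.concat(dfbox)
boxbox_awayteam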

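I'm out of ideas, because none of the tables in the HTML seem to have a specific id or class. This is my first web scraping project and my first question on Stack Overflow, so please bear with me.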

You can use BeautifulSoup with the CSS selector
[id$="-game-basic"] table
to select only the two basic tables, and then load them with
pd.read_html().

Prints:

                    Starters            MP  ...           PTS           +/-
0               Jrue Holiday         41:05  ...            13           -14
1             Brandon Ingram         35:06  ...            22           -19
2                J.J. Redick         27:03  ...            16           -14
3                 Lonzo Ball         24:50  ...             8            -7
4             Derrick Favors         20:46  ...             6           -12
5                   Reserves            MP  ...           PTS           +/-
6                  Josh Hart         28:10  ...            15            -1
7               Nicolò Melli         19:37  ...            14           +11
8           Kenrich Williams         18:02  ...             3           +11
9              Frank Jackson         13:51  ...             9            +7
10             Jahlil Okafor         12:29  ...             8            -7
11             E'Twaun Moore         12:06  ...             5            -1
12  Nickeil Alexander-Walker         11:55  ...             3            +6
13              Jaxson Hayes  Did Not Play  ...  Did Not Play  Did Not Play
14               Team Totals           265  ...           122           NaN

[15 rows x 21 columns]
           Starters            MP  ...           PTS           +/-
0        Kyle Lowry         44:59  ...            22            -1
1     Fred VanVleet         44:21  ...            34           +18
2     Pascal Siakam         38:09  ...            34            +5
3        OG Anunoby         35:48  ...            11           +12
4        Marc Gasol         31:55  ...             6            -2
5          Reserves            MP  ...           PTS           +/-
6     Norman Powell         28:38  ...             5            +2
7       Serge Ibaka         26:00  ...            13            +6
8     Terence Davis         15:10  ...             5             0
9       Matt Thomas  Did Not Play  ...  Did Not Play  Did Not Play
10    Chris Boucher  Did Not Play  ...  Did Not Play  Did Not Play
11  Stanley Johnson  Did Not Play  ...  Did Not Play  Did Not Play
12   Malcolm Miller  Did Not Play  ...  Did Not Play  Did Not Play
13  Dewan Hernandez  Did Not Play  ...  Did Not Play  Did Not Play
14      Team Totals           265  ...           130           NaN

[15 rows x 21 columns]
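
To see why the id-suffix selector is more robust than a positional index, here is a minimal offline sketch. The HTML snippet and its ids are assumptions that only mimic the "-game-basic" suffix mentioned in the answer; the real page is larger and may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified page: the ids below are assumptions that
# imitate the "-game-basic" suffix; extra tables stand in for overtime and
# advanced stats that shift the home team's positional index.
SAMPLE_HTML = """
<div id="all_box-NOP-game-basic">
  <table id="box-NOP-game-basic"><tr><td>away basic</td></tr></table>
</div>
<div id="all_box-NOP-ot1-basic">
  <table id="box-NOP-ot1-basic"><tr><td>away overtime</td></tr></table>
</div>
<div id="all_box-NOP-game-advanced">
  <table id="box-NOP-game-advanced"><tr><td>away advanced</td></tr></table>
</div>
<div id="all_box-TOR-game-basic">
  <table id="box-TOR-game-basic"><tr><td>home basic</td></tr></table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every table on the page: the home team's position depends on what precedes it.
all_tables = soup.find_all("table")

# Only tables nested inside an element whose id ends with "-game-basic".
basic_tables = soup.select('[id$="-game-basic"] table')
basic_ids = [table["id"] for table in basic_tables]

print(len(all_tables))  # 4
print(basic_ids)        # ['box-NOP-game-basic', 'box-TOR-game-basic']
```

However many overtime or advanced tables sit in between, the selector returns exactly the two basic tables, so indices [0] and [1] stay stable.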

EDIT: To put this function in a loop, you can use this example:

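from io import StringIO

import requests
import pandas as pd
from bs4 import BeautifulSoup

# Season schedule page; the links selected by '.filter a' lead to the monthly pages
url = 'https://www.basketball-reference.com/leagues/NBA_2020_games.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

def get_tables(url):
    """Return the two basic box score tables of one game, in page order."""
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    # Match by id suffix so extra tables (e.g. overtime) cannot shift the result
    my_tables = soup.select('[id$="-game-basic"] table')

    # StringIO avoids the literal-HTML deprecation warning in recent pandas;
    # droplevel(0, axis=1) removes the top row of the two-level column header
    df_1 = pd.read_html(StringIO(str(my_tables[0])))[0].droplevel(0, axis=1)
    df_2 = pd.read_html(StringIO(str(my_tables[1])))[0].droplevel(0, axis=1)

    return df_1, df_2

# Outer loop: one schedule page per month
for a in soup.select('.filter a'):
    u = 'https://www.basketball-reference.com' + a['href']
    print(u)
    soup2 = BeautifulSoup(requests.get(u).content, 'html.parser')
    # Inner loop: every box score link on that month's page
    for a2 in soup2.select('td a[href^="/boxscores/"]'):
        u2 = 'https://www.basketball-reference.com' + a2['href']
        t1, t2 = get_tables(u2)
        print(u2)
        print(t1)
        print(t2)
        print('-' * 80)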

Prints:

https://www.basketball-reference.com/leagues/NBA_2020_games-october.html
https://www.basketball-reference.com/boxscores/201910220TOR.html
                    Starters            MP  ...           PTS           +/-
0               Jrue Holiday         41:05  ...            13           -14
1             Brandon Ingram         35:06  ...            22           -19
2                J.J. Redick         27:03  ...            16           -14
3                 Lonzo Ball         24:50  ...             8            -7
4             Derrick Favors         20:46  ...             6           -12
5                   Reserves            MP  ...           PTS           +/-
6                  Josh Hart         28:10  ...            15            -1
7               Nicolò Melli         19:37  ...            14           +11
8           Kenrich Williams         18:02  ...             3           +11
9              Frank Jackson         13:51  ...             9            +7
10             Jahlil Okafor         12:29  ...             8            -7
11             E'Twaun Moore         12:06  ...             5            -1
12  Nickeil Alexander-Walker         11:55  ...             3            +6
13              Jaxson Hayes  Did Not Play  ...  Did Not Play  Did Not Play
14               Team Totals           265  ...           122           NaN

[15 rows x 21 columns]
           Starters            MP  ...           PTS           +/-
0        Kyle Lowry         44:59  ...            22            -1
1     Fred VanVleet         44:21  ...            34           +18
2     Pascal Siakam         38:09  ...            34            +5
3        OG Anunoby         35:48  ...            11           +12
4        Marc Gasol         31:55  ...             6            -2
5          Reserves            MP  ...           PTS           +/-
6     Norman Powell         28:38  ...             5            +2
7       Serge Ibaka         26:00  ...            13            +6
8     Terence Davis         15:10  ...             5             0
9       Matt Thomas  Did Not Play  ...  Did Not Play  Did Not Play
10    Chris Boucher  Did Not Play  ...  Did Not Play  Did Not Play
11  Stanley Johnson  Did Not Play  ...  Did Not Play  Did Not Play
12   Malcolm Miller  Did Not Play  ...  Did Not Play  Did Not Play
13  Dewan Hernandez  Did Not Play  ...  Did Not Play  Did Not Play
14      Team Totals           265  ...           130           NaN

[15 rows x 21 columns]
--------------------------------------------------------------------------------
https://www.basketball-reference.com/boxscores/201910220LAC.html
                    Starters            MP  ...           PTS           +/-
0              Anthony Davis         37:22  ...            25            +3
1               LeBron James         36:00  ...            18            -8
2                Danny Green         32:20  ...            28            +7


...and so on.
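
The loop in the answer prints each table. If the goal is to keep the results for later analysis, one possible approach is to label every table with its game and side and stack everything into a single DataFrame. The helper name combine_box_scores and the column names game_url and side are illustrative assumptions, not part of the answer; the tiny frames stand in for real scraped tables so the sketch runs without network access.

```python
import pandas as pd


def combine_box_scores(games):
    """Stack per-game box scores into one long DataFrame.

    `games` is an iterable of (game_url, away_df, home_df) tuples. This
    helper and its column names are hypothetical, for illustration only.
    """
    frames = []
    for game_url, away_df, home_df in games:
        for side, df in (("away", away_df), ("home", home_df)):
            labelled = df.copy()                      # leave the caller's frame untouched
            labelled.insert(0, "side", side)          # which team the rows belong to
            labelled.insert(0, "game_url", game_url)  # which game the rows belong to
            frames.append(labelled)
    return pd.concat(frames, ignore_index=True)


# Stand-in frames using a few values from the printed output above.
away = pd.DataFrame({"Starters": ["Jrue Holiday", "Team Totals"], "PTS": [13, 122]})
home = pd.DataFrame({"Starters": ["Kyle Lowry", "Team Totals"], "PTS": [22, 130]})

all_rows = combine_box_scores([("/boxscores/201910220TOR.html", away, home)])

# Game-level results only: keep the "Team Totals" row of each side.
totals = all_rows[all_rows["Starters"] == "Team Totals"]
print(totals[["game_url", "side", "PTS"]])
```

In the answer's loop, one could append (u2, t1, t2) to a list instead of printing and pass that list to the helper once the loop finishes.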

Thank you so much! This is a huge step forward from where I was. Could you show me how to write a loop that iterates over all the links for a given month? I have a list containing the links to every box score from August; how do I apply your suggested code in a loop?
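
For a list of links that already exists, as described in the comment, a loop along the following lines might work. It is only a sketch: scrape_box_scores is a hypothetical helper, the example.com links are placeholders, and fake_get_tables stands in for a real function such as the answer's get_tables so that the example runs offline.

```python
import time


def scrape_box_scores(links, get_tables, delay=0.0):
    """Call get_tables(link) for each link and collect the results by link.

    Hypothetical helper: `get_tables` is any callable returning a
    (away, home) pair; `delay` is a pause in seconds between requests.
    """
    results = {}
    for link in links:
        try:
            results[link] = get_tables(link)
        except (IndexError, ValueError) as exc:
            # e.g. fewer than two tables matched, or no table could be parsed
            print(f"skipping {link}: {exc}")
        time.sleep(delay)  # be polite to the site when scraping many pages
    return results


def fake_get_tables(url):
    """Stand-in for a real scraper so the sketch needs no network access."""
    if url.endswith("bad.html"):
        raise IndexError("no basic tables found")
    return ("away table for " + url, "home table for " + url)


# Placeholder links; a real list would hold the August box score URLs.
august_links = [
    "https://example.com/boxscores/a.html",
    "https://example.com/boxscores/bad.html",
]

results = scrape_box_scores(august_links, fake_get_tables)
print(list(results))
```

Passing the scraping function in as a parameter keeps the loop independent of how each page is parsed, and a page that fails to parse is skipped instead of stopping the whole run.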