Python BeautifulSoup ESPN table, can't find the right tag
I am trying to scrape a table from the ESPN website. I just can't seem to find the right tag to access it.
The code just gives me an empty result :( Why doesn't grabbing the flex class and then the player table inside it work?
import requests
from bs4 import BeautifulSoup

url = "https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgAssists/dir/desc"
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
all_tables = soup.find('div', {'class': 'flex'})
all_tables.find('table')  # to get all player names
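One thing worth knowing here: when `find()` matches nothing it returns `None`, so chaining `.find('table')` onto it raises an `AttributeError` rather than returning an empty list. A minimal offline sketch of the pitfall, using simplified stand-in markup (the HTML below is invented for illustration; on the real page the table sits inside a `section`, as the first answer explains):

```python
from bs4 import BeautifulSoup

# stand-in page: the stats table lives inside a <section>, not a <div class="flex">
html = "<section class='ResponsiveTable'><table><tr><td>LeBron James</td></tr></table></section>"
soup = BeautifulSoup(html, 'html.parser')

node = soup.find('div', {'class': 'flex'})
print(node)  # None -- calling node.find('table') on this would raise AttributeError
```

Printing the intermediate result like this is a quick way to tell whether the selector matched before chaining further lookups.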
The tag selected with:
soup.find_all('table', class_="ResponsiveTable ResponsiveTable--fixed-left mt4 Table2__title--remove-capitalization")
should not be 'table' but 'section':
soup.find_all('section', class_="ResponsiveTable ResponsiveTable--fixed-left mt4 Table2__title--remove-capitalization")
To get all the data, you can use the following example:
import requests
from bs4 import BeautifulSoup

url = "https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgAssists/dir/desc"
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# the fixed left table holds rank + player name; the sibling div holds the stats
for tr1, tr2 in zip(soup.select('table.Table.Table--align-right.Table--fixed.Table--fixed-left tr'),
                    soup.select('table.Table.Table--align-right.Table--fixed.Table--fixed-left ~ div tr')):
    data = tr1.select('td') + tr2.select('td')
    if not data:
        continue
    print('{:<25}'.format(data[1].get_text(strip=True, separator='-').split()[-1]), end=' ')
    for td in data[2:]:
        print('{:<6}'.format(td.get_text(strip=True)), end=' ')
    print()
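The `~` general-sibling combinator in the second selector is what pairs each row of the fixed name table with the matching row of the scrollable stats table next to it. A minimal offline sketch of the idea, using simplified stand-in markup (the class names mirror ESPN's, but the HTML here is invented for illustration):

```python
from bs4 import BeautifulSoup

html = """
<div>
  <table class="Table Table--align-right Table--fixed Table--fixed-left">
    <tr><td>1</td><td>LeBron James</td></tr>
  </table>
  <div class="Table__ScrollerWrapper">
    <table><tr><td>35</td><td>24.9</td></tr></table>
  </div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

left = soup.select('table.Table.Table--align-right.Table--fixed.Table--fixed-left tr')
# "~ div" selects a following sibling <div>; its inner table holds the stats
right = soup.select('table.Table.Table--align-right.Table--fixed.Table--fixed-left ~ div tr')

for tr1, tr2 in zip(left, right):
    row = [td.get_text() for td in tr1.select('td') + tr2.select('td')]
    print(row)  # ['1', 'LeBron James', '35', '24.9']
```

Because the two tables have the same number of rows, `zip` lines each name row up with its stats row.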
You can also use the same API that the web page uses to populate its table with player information. If you make a GET request directly to that API (with the correct headers and query string), you will receive all of the player information in JSON format.
The API's URL, the relevant headers, and the query-string GET parameters can all be found in Google Chrome's network log (most modern browsers have a similar feature). I found them by applying a filter to keep only XMLHttpRequest (XHR) resources and then clicking the "Show More" button at the bottom of the table.
I have set the "limit" GET parameter to "3" because I am only interested in printing the data for the top three players. Changing this string to "50", for example, would query the API for the top fifty players.
def main():
    import requests

    headers = {
        "accept": "application/json, text/plain, */*",
        "origin": "https://www.espn.com",
        "user-agent": "Mozilla/5.0"
    }
    params = {
        "region": "us",
        "lang": "en",
        "contentorigin": "espn",
        "isqualified": "true",
        "page": "1",
        "limit": "3",
        "sort": "offensive.avgAssists:desc"
    }
    base_url = "https://site.web.api.espn.com/apis/common/v3/sports/basketball/nba/statistics/byathlete"
    response = requests.get(base_url, headers=headers, params=params)
    response.raise_for_status()
    data = response.json()
    print(data["athletes"])
    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
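If you want to compare what this script sends against the request you see in the browser's network log, you can build the request without sending it: `requests` will encode the `params` dict into the query string for you. A small offline sketch (the parameter values are just the ones from the answer above):

```python
import requests

base_url = "https://site.web.api.espn.com/apis/common/v3/sports/basketball/nba/statistics/byathlete"
params = {
    "region": "us",
    "lang": "en",
    "limit": "3",
    "sort": "offensive.avgAssists:desc"
}

# prepare() encodes the query string without performing any network I/O
prepared = requests.Request("GET", base_url, params=params).prepare()
print(prepared.url)
```

This is handy for spotting typos in parameter names before blaming the API for an error response.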
If you have table tags, let pandas do the work for you. It uses BeautifulSoup under the hood.
import pandas as pd

url = "https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgAssists/dir/desc"
dfs = pd.read_html(url)   # one DataFrame per table found on the page
df = dfs[0].join(dfs[1])  # join the fixed name table with the stats table
# the scraped "Name" column runs the name and team code together, e.g. "LeBron JamesLAL";
# split the trailing run of capitals off into its own "Team" column
df[['Name', 'Team']] = df['Name'].str.extract(r'^(.*?)([A-Z]+)$', expand=True)
Output:
print(df.head(5).to_string())
RK Name POS GP MIN PTS FGM FGA FG% 3PM 3PA 3P% FTM FTA FT% REB AST STL BLK TO DD2 TD3 PER Team
0 1 LeBron James SF 35 35.1 24.9 9.6 19.7 48.6 2.0 6.0 33.8 3.7 5.5 67.7 7.9 11.0 1.3 0.5 3.7 28 9 26.10 LAL
1 2 Ricky Rubio PG 30 32.0 13.6 4.9 11.9 41.3 1.2 3.7 31.8 2.6 3.1 83.7 4.6 9.3 1.3 0.2 2.5 12 1 16.40 PHX
2 3 Luka Doncic SF 32 32.8 29.7 9.6 20.2 47.5 3.1 9.4 33.1 7.3 9.1 80.5 9.7 8.9 1.2 0.2 4.2 22 11 31.74 DAL
3 4 Ben Simmons PG 36 35.4 14.9 6.1 10.8 56.3 0.1 0.1 40.0 2.7 4.6 59.0 7.5 8.6 2.2 0.7 3.6 19 3 19.49 PHI
4 5 Trae Young PG 34 35.1 28.9 9.3 20.8 44.8 3.5 9.4 37.5 6.7 7.9 85.0 4.3 8.4 1.2 0.1 4.8 11 1 23.47 ATL
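The `str.extract` pattern works because the team abbreviation is the run of capital letters glued onto the end of the name: the lazy `(.*?)` stops at the first point where everything remaining is uppercase. A quick offline sketch on made-up strings in the same shape:

```python
import pandas as pd

# sample values in the shape ESPN's scraped Name column takes
s = pd.Series(["LeBron JamesLAL", "Trae YoungATL"])

out = s.str.extract(r'^(.*?)([A-Z]+)$', expand=True)
out.columns = ['Name', 'Team']
print(out)
```

Note that a name ending in a capital letter (e.g. a suffix like "III") would confuse this split, so it is a heuristic rather than a guarantee.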
Thanks, but it always throws an exception when I run it, even with limit: "1"
What kind of exception?
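To answer "what kind of exception", wrapping the call in try/except and printing the exception type is the quickest diagnostic: `raise_for_status()` raises `requests.exceptions.HTTPError` on a 4xx/5xx status, and `response.json()` raises a `ValueError` when the body is not valid JSON. A minimal offline sketch of the pattern, using a hand-built `Response` so nothing is sent over the network (the 403 status is just an assumed example):

```python
import requests

# build a Response by hand so this example needs no network access
response = requests.models.Response()
response.status_code = 403  # assumed example status

try:
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError as exc:
    # any 4xx/5xx status lands here; the type name identifies the exception
    print(type(exc).__name__, exc)
except ValueError as exc:
    # response.json() raises this when the body is not valid JSON
    print("invalid JSON:", exc)
```

In the real script you would apply the same try/except around the `requests.get(...)` call and the `.json()` decode to see which of the two is failing.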