Python BeautifulSoup ESPN table, can't find the right tag
I am trying to scrape a table from the ESPN website. I just can't seem to find the right tag to access it.
The code just gives me an empty result :( Why doesn't grabbing the flex class and then the player table inside it work?
import requests
from bs4 import BeautifulSoup

url = "https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgAssists/dir/desc"
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
all_tables = soup.find('div', {'class': 'flex'})
all_tables.find('table')  # to get all player names
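One thing worth knowing here: when `find()` matches nothing it returns `None`, so chaining `.find('table')` onto it raises an `AttributeError` rather than returning an empty list. A minimal offline sketch of the pitfall, using simplified stand-in markup (the HTML below is invented for illustration; on the real page the table sits inside a `section`, as the first answer explains):

```python
from bs4 import BeautifulSoup

# stand-in page: the stats table lives inside a <section>, not a <div class="flex">
html = "<section class='ResponsiveTable'><table><tr><td>LeBron James</td></tr></table></section>"
soup = BeautifulSoup(html, 'html.parser')

node = soup.find('div', {'class': 'flex'})
print(node)  # None -- calling node.find('table') on this would raise AttributeError
```

Printing the intermediate result like this is a quick way to tell whether the selector matched before chaining further lookups.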
The tag selected with:
soup.find_all('table', class_="ResponsiveTable ResponsiveTable--fixed-left mt4 Table2__title--remove-capitalization")
should not be 'table' but 'section':
soup.find_all('section', class_="ResponsiveTable ResponsiveTable--fixed-left mt4 Table2__title--remove-capitalization")
To get all the data, you can use the following example:
import requests
from bs4 import BeautifulSoup

url = "https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgAssists/dir/desc"
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# the fixed left table holds rank + player name; the sibling div holds the stats
for tr1, tr2 in zip(soup.select('table.Table.Table--align-right.Table--fixed.Table--fixed-left tr'),
                    soup.select('table.Table.Table--align-right.Table--fixed.Table--fixed-left ~ div tr')):
    data = tr1.select('td') + tr2.select('td')
    if not data:
        continue
    print('{:<25}'.format(data[1].get_text(strip=True, separator='-').split()[-1]), end=' ')
    for td in data[2:]:
        print('{:<6}'.format(td.get_text(strip=True)), end=' ')
    print()
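The `~` general-sibling combinator in the second selector is what pairs each row of the fixed name table with the matching row of the scrollable stats table next to it. A minimal offline sketch of the idea, using simplified stand-in markup (the class names mirror ESPN's, but the HTML here is invented for illustration):

```python
from bs4 import BeautifulSoup

html = """
<div>
  <table class="Table Table--align-right Table--fixed Table--fixed-left">
    <tr><td>1</td><td>LeBron James</td></tr>
  </table>
  <div class="Table__ScrollerWrapper">
    <table><tr><td>35</td><td>24.9</td></tr></table>
  </div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

left = soup.select('table.Table.Table--align-right.Table--fixed.Table--fixed-left tr')
# "~ div" selects a following sibling <div>; its inner table holds the stats
right = soup.select('table.Table.Table--align-right.Table--fixed.Table--fixed-left ~ div tr')

for tr1, tr2 in zip(left, right):
    row = [td.get_text() for td in tr1.select('td') + tr2.select('td')]
    print(row)  # ['1', 'LeBron James', '35', '24.9']
```

Because the two tables have the same number of rows, `zip` lines each name row up with its stats row.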
You can also use the same API that the web page uses to populate its table with player information. If you make a GET request directly to that API (with the correct headers and query string), you will receive all of the player information in JSON format.
The API's URL, the relevant headers, and the query-string GET parameters can all be found in Google Chrome's network log (most modern browsers have a similar feature). I found them by applying a filter to keep only XMLHttpRequest (XHR) resources and then clicking the "Show More" button at the bottom of the table.
I have set the "limit" GET parameter to "3" because I am only interested in printing the data for the top three players. Changing this string to "50", for example, would query the API for the top fifty players.
def main():
    import requests

    headers = {
        "accept": "application/json, text/plain, */*",
        "origin": "https://www.espn.com",
        "user-agent": "Mozilla/5.0"
    }
    params = {
        "region": "us",
        "lang": "en",
        "contentorigin": "espn",
        "isqualified": "true",
        "page": "1",
        "limit": "3",
        "sort": "offensive.avgAssists:desc"
    }
    base_url = "https://site.web.api.espn.com/apis/common/v3/sports/basketball/nba/statistics/byathlete"
    response = requests.get(base_url, headers=headers, params=params)
    response.raise_for_status()
    data = response.json()
    print(data["athletes"])
    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
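If you want to compare what this script sends against the request you see in the browser's network log, you can build the request without sending it: `requests` will encode the `params` dict into the query string for you. A small offline sketch (the parameter values are just the ones from the answer above):

```python
import requests

base_url = "https://site.web.api.espn.com/apis/common/v3/sports/basketball/nba/statistics/byathlete"
params = {
    "region": "us",
    "lang": "en",
    "limit": "3",
    "sort": "offensive.avgAssists:desc"
}

# prepare() encodes the query string without performing any network I/O
prepared = requests.Request("GET", base_url, params=params).prepare()
print(prepared.url)
```

This is handy for spotting typos in parameter names before blaming the API for an error response.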
If you have table tags, let pandas do the work for you. It uses BeautifulSoup under the hood.
import pandas as pd

url = "https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgAssists/dir/desc"
dfs = pd.read_html(url)   # one DataFrame per table found on the page
df = dfs[0].join(dfs[1])  # join the fixed name table with the stats table
# the scraped "Name" column runs the name and team code together, e.g. "LeBron JamesLAL";
# split the trailing run of capitals off into its own "Team" column
df[['Name', 'Team']] = df['Name'].str.extract(r'^(.*?)([A-Z]+)$', expand=True)
Output:
print(df.head(5).to_string())
RK Name POS GP MIN PTS FGM FGA FG% 3PM 3PA 3P% FTM FTA FT% REB AST STL BLK TO DD2 TD3 PER Team
0 1 LeBron James SF 35 35.1 24.9 9.6 19.7 48.6 2.0 6.0 33.8 3.7 5.5 67.7 7.9 11.0 1.3 0.5 3.7 28 9 26.10 LAL
1 2 Ricky Rubio PG 30 32.0 13.6 4.9 11.9 41.3 1.2 3.7 31.8 2.6 3.1 83.7 4.6 9.3 1.3 0.2 2.5 12 1 16.40 PHX
2 3 Luka Doncic SF 32 32.8 29.7 9.6 20.2 47.5 3.1 9.4 33.1 7.3 9.1 80.5 9.7 8.9 1.2 0.2 4.2 22 11 31.74 DAL
3 4 Ben Simmons PG 36 35.4 14.9 6.1 10.8 56.3 0.1 0.1 40.0 2.7 4.6 59.0 7.5 8.6 2.2 0.7 3.6 19 3 19.49 PHI
4 5 Trae Young PG 34 35.1 28.9 9.3 20.8 44.8 3.5 9.4 37.5 6.7 7.9 85.0 4.3 8.4 1.2 0.1 4.8 11 1 23.47 ATL
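The `str.extract` pattern works because the team abbreviation is the run of capital letters glued onto the end of the name: the lazy `(.*?)` stops at the first point where everything remaining is uppercase. A quick offline sketch on made-up strings in the same shape:

```python
import pandas as pd

# sample values in the shape ESPN's scraped Name column takes
s = pd.Series(["LeBron JamesLAL", "Trae YoungATL"])

out = s.str.extract(r'^(.*?)([A-Z]+)$', expand=True)
out.columns = ['Name', 'Team']
print(out)
```

Note that a name ending in a capital letter (e.g. a suffix like "III") would confuse this split, so it is a heuristic rather than a guarantee.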
Thanks, but it always throws an exception when I run it, even with limit: "1"
What kind of exception?
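To answer "what kind of exception", wrapping the call in try/except and printing the exception type is the quickest diagnostic: `raise_for_status()` raises `requests.exceptions.HTTPError` on a 4xx/5xx status, and `response.json()` raises a `ValueError` when the body is not valid JSON. A minimal offline sketch of the pattern, using a hand-built `Response` so nothing is sent over the network (the 403 status is just an assumed example):

```python
import requests

# build a Response by hand so this example needs no network access
response = requests.models.Response()
response.status_code = 403  # assumed example status

try:
    response.raise_for_status()
    data = response.json()
except requests.exceptions.HTTPError as exc:
    # any 4xx/5xx status lands here; the type name identifies the exception
    print(type(exc).__name__, exc)
except ValueError as exc:
    # response.json() raises this when the body is not valid JSON
    print("invalid JSON:", exc)
```

In the real script you would apply the same try/except around the `requests.get(...)` call and the `.json()` decode to see which of the two is failing.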