Trying to get a Python program to print selected stats from a web scrape
I'm new to Beautiful Soup, and I've been looking for a way to let the user enter the team they want and the week they're interested in, then have the script print certain stats for that week. When I enter the team and week number, nothing is printed and it goes straight back to the command line. Here is my code:
import requests
from bs4 import BeautifulSoup
team = input('''What team are you looking for?
crd - Arizona Cardinals
atl - Atlanta Falcons
rav - Baltimore Ravens
buf - Buffalo Bills
car - Carolina Panthers
chi - Chicago Bears
cin - Cincinnati Bengals
cle - Cleveland Browns
dal - Dallas Cowboys
den - Denver Broncos
det - Detroit Lions
gnb - Green Bay Packers
htx - Houston Texans
clt - Indianapolis Colts
jax - Jacksonville Jaguars
kan - Kansas City Chiefs
sdg - Los Angeles Chargers
ram - Los Angeles Rams
mia - Miami Dolphins
min - Minnesota Vikings
nwe - New England Patriots
nor - New Orleans Saints
nyg - New York Giants
nyj - New York Jets
rai - Oakland Raiders
phi - Philadelphia Eagles
pit - Pittsburgh Steelers
sfo - San Fransisco 49ers
sea - Seattle Seahawks
tam - Tampa Bay Buccaneers
oti - Tennessee Titans
was - Washington Football Team
Enter the 3 letter code for the team: ''')
week = int(input('What week are you looking for? '))
url = 'https://www.pro-football-reference.com/teams/' + team.lower() + '/2019.htm'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
week_num = soup.find_all('th', attrs={"data-stat": "week_num", "class": "right", "scope": "row"})
total_off = soup.find_all('td', attrs={"data-stat": "yards_off", "class": "right"})
total_def = soup.find_all('td', attrs={"data-stat": "yards_def", "class": "right"})
pass_yards_off = soup.find_all('td', attrs={"data-stat": "pass_yds_off", "class": "right"})
pass_yards_def = soup.find_all('td', attrs={"data-stat": "pass_yds_def", "class": "right"})
rush_yards_off = soup.find_all('td', attrs={"data-stat": "rush_yds_off", "class": "right"})
rush_yards_def = soup.find_all('td', attrs={"data-stat": "rush_yds_def", "class": "right"})
team_score = soup.find_all('td', attrs={"data-stat": "pts_off", "class": "right"})
opp_score = soup.find_all('td', attrs={"data-stat": "pts_def", "class": "right"})
for i in range(len(week_num)):
    if week in week_num:
        print('Week Number: ' + week_num[i].text.strip(),
              'Total Off: ' + total_off[i].text.strip(),
              'Total Def: ' + total_def[i].text.strip(),
              'Passing Yards Off: ' + pass_yards_off[i].text.strip(),
              'Passing Yards Def: ' + pass_yards_def[i].text.strip(),
              'Rushing Yards Off: ' + rush_yards_off[i].text.strip(),
              'Rushing Yards Def: ' + rush_yards_def[i].text.strip(), '\n')
Here is the output when I run it:
What team are you looking for?
crd - Arizona Cardinals
atl - Atlanta Falcons
rav - Baltimore Ravens
buf - Buffalo Bills
car - Carolina Panthers
chi - Chicago Bears
cin - Cincinnati Bengals
cle - Cleveland Browns
dal - Dallas Cowboys
den - Denver Broncos
det - Detroit Lions
gnb - Green Bay Packers
htx - Houston Texans
clt - Indianapolis Colts
jax - Jacksonville Jaguars
kan - Kansas City Chiefs
sdg - Los Angeles Chargers
ram - Los Angeles Rams
mia - Miami Dolphins
min - Minnesota Vikings
nwe - New England Patriots
nor - New Orleans Saints
nyg - New York Giants
nyj - New York Jets
rai - Oakland Raiders
phi - Philadelphia Eagles
pit - Pittsburgh Steelers
sfo - San Fransisco 49ers
sea - Seattle Seahawks
tam - Tampa Bay Buccaneers
oti - Tennessee Titans
was - Washington Football Team
Enter the 3 letter code for the team: nwe
What week are you looking for? 6
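The if condition in the for loop has to change. The check week in week_num compares an integer against a list of tag objects, so it is never true and nothing is printed. Indexing the lists directly works: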
import requests
from bs4 import BeautifulSoup
team = input('''What team are you looking for?
crd - Arizona Cardinals
atl - Atlanta Falcons
rav - Baltimore Ravens
buf - Buffalo Bills
car - Carolina Panthers
chi - Chicago Bears
cin - Cincinnati Bengals
cle - Cleveland Browns
dal - Dallas Cowboys
den - Denver Broncos
det - Detroit Lions
gnb - Green Bay Packers
htx - Houston Texans
clt - Indianapolis Colts
jax - Jacksonville Jaguars
kan - Kansas City Chiefs
sdg - Los Angeles Chargers
ram - Los Angeles Rams
mia - Miami Dolphins
min - Minnesota Vikings
nwe - New England Patriots
nor - New Orleans Saints
nyg - New York Giants
nyj - New York Jets
rai - Oakland Raiders
phi - Philadelphia Eagles
pit - Pittsburgh Steelers
sfo - San Fransisco 49ers
sea - Seattle Seahawks
tam - Tampa Bay Buccaneers
oti - Tennessee Titans
was - Washington Football Team
Enter the 3 letter code for the team: ''')
week = int(input('What week are you looking for? '))
url = 'https://www.pro-football-reference.com/teams/' + team.lower() + '/2019.htm'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
week_num = soup.find_all('th', attrs={"data-stat": "week_num", "class": "right", "scope": "row"})
total_off = soup.find_all('td', attrs={"data-stat": "yards_off", "class": "right"})
total_def = soup.find_all('td', attrs={"data-stat": "yards_def", "class": "right"})
pass_yards_off = soup.find_all('td', attrs={"data-stat": "pass_yds_off", "class": "right"})
pass_yards_def = soup.find_all('td', attrs={"data-stat": "pass_yds_def", "class": "right"})
rush_yards_off = soup.find_all('td', attrs={"data-stat": "rush_yds_off", "class": "right"})
rush_yards_def = soup.find_all('td', attrs={"data-stat": "rush_yds_def", "class": "right"})
team_score = soup.find_all('td', attrs={"data-stat": "pts_off", "class": "right"})
opp_score = soup.find_all('td', attrs={"data-stat": "pts_def", "class": "right"})
try:
    print('Week Number: ' + week_num[week].text.strip(),
          'Total Off: ' + total_off[week].text.strip(),
          'Total Def: ' + total_def[week].text.strip(),
          'Passing Yards Off: ' + pass_yards_off[week].text.strip(),
          'Passing Yards Def: ' + pass_yards_def[week].text.strip(),
          'Rushing Yards Off: ' + rush_yards_off[week].text.strip(),
          'Rushing Yards Def: ' + rush_yards_def[week].text.strip(), '\n')
except Exception as e:
    print(e)
Output for crd and 2:
Week Number: 3 Total Off: 248 Total Def: 413 Passing Yards Off: 127 Passing Yards Def: 240 Rushing Yards Off: 121 Rushing Yards Def: 173
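The indexing approach above selects a row by position, which is why entering 2 returns week 3 (Python lists are zero-based). A minimal sketch of matching on the text of the week cell instead, run against a small inline HTML table rather than the live page. The sample HTML, its numbers, and the helper names find_week_row and stat are made up for illustration; only the data-stat attribute names come from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: same data-stat names, made-up numbers.
SAMPLE_HTML = """
<table id="games"><tbody>
<tr><th data-stat="week_num" class="right" scope="row">1</th>
    <td data-stat="yards_off" class="right">300</td>
    <td data-stat="yards_def" class="right">250</td></tr>
<tr><th data-stat="week_num" class="right" scope="row">2</th>
    <td data-stat="yards_off" class="right">410</td>
    <td data-stat="yards_def" class="right">275</td></tr>
</tbody></table>
"""


def find_week_row(soup, week):
    """Return the <tr> whose week_num header cell matches `week`, or None."""
    for th in soup.find_all('th', attrs={'data-stat': 'week_num', 'scope': 'row'}):
        # th is a Tag, so compare its text (not the Tag itself) to the week number.
        if th.text.strip() == str(week):
            return th.parent
    return None


def stat(row, name):
    """Read one data-stat cell from a row as stripped text."""
    return row.find('td', attrs={'data-stat': name}).text.strip()


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# The original check: an int is never equal to a Tag, so this is False.
week_num = soup.find_all('th', attrs={'data-stat': 'week_num'})
in_list = 2 in week_num

row = find_week_row(soup, 2)
summary = ('Week Number: 2 Total Off: ' + stat(row, 'yards_off') +
           ' Total Def: ' + stat(row, 'yards_def'))
print(in_list)
print(summary)
```

Looking the row up by its week label also means a missing week returns None instead of silently printing a different week.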
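We can actually build the team selection dynamically from the table. You can also use pandas to grab the table and then filter by week number instead of iterating. *Note: you will need to pip install choice.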
import pandas as pd
import requests
from bs4 import BeautifulSoup
import choice
url= 'https://www.pro-football-reference.com/teams/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
teams = soup.find_all('th')
# Get the links to the teams in the table
teams_dict = {}
for each in teams:
    if each.find('a'):
        teams_dict[each.text] = each.find('a')['href']
team_choice = choice.Menu(teams_dict.keys()).ask()
week = input('What week are you looking for? ')
url = 'https://www.pro-football-reference.com{team_url}2019.htm'.format(team_url=teams_dict[team_choice])
df = pd.read_html(url,attrs={'id':'games'})[0]
new_col_names = [col[-1] if 'Unnamed' in col[0] else '_'.join(col) for col in df.columns]
# for loop equivalent to the list comprehension above
#new_col_names = []
#for col in df.columns:
#    if 'Unnamed' in col[0]:
#        new_col_names.append(col[-1])
#    else:
#        new_col_names.append('_'.join(col))
df.columns = new_col_names
df['Week'] = df['Week'].astype(str)
week_stats = df[df['Week']==week]
cols = ['Week','Offense_TotYd','Defense_TotYd','Offense_PassY','Defense_PassY','Offense_RushY','Defense_RushY']
print (week_stats[cols].to_string())
Output for NE, week 6:
Week Offense_TotYd Defense_TotYd Offense_PassY Defense_PassY Offense_RushY Defense_RushY
5 6 427.0 213.0 313.0 161.0 114.0 52.0
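To see what the column flattening and the week filter do without hitting the site, here is a small sketch on a hand-built DataFrame. The two-level column labels imitate what pd.read_html is assumed to return for this table, and all numbers are made up.

```python
import pandas as pd

# Hypothetical two-level header, imitating the assumed shape of the scraped table.
columns = pd.MultiIndex.from_tuples([
    ('Unnamed: 0_level_0', 'Week'),
    ('Offense', 'TotYd'),
    ('Defense', 'TotYd'),
])
df = pd.DataFrame([[1, 300, 250], [2, 410, 275], [3, 355, 190]], columns=columns)

# Flatten the header: keep the bottom label for unnamed groups, join the rest with '_'.
df.columns = [col[-1] if 'Unnamed' in col[0] else '_'.join(col) for col in df.columns]

# input() returns a string, so compare against the Week column as strings.
df['Week'] = df['Week'].astype(str)
week = '2'
week_stats = df[df['Week'] == week]

print(list(df.columns))
print(week_stats.to_string())
```

Because the filter matches on the value in the Week column rather than on row position, entering 2 selects the week 2 row directly.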
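Awesome! That seems to work. I noticed it printed the wrong week: if you enter week 2, it gives week 3, so I had to use [week - 1] to get the correct week. Thanks for the quick answer. I'll have to read up more on pandas. This is for a fantasy football league that a friend of mine updates by hand, and I wanted to make it easier.
Send me an email: jason.schvach@gmail.com. I wrote a data science paper on fantasy football/FanDuel (still writing it, actually), and I've also automated all of this data handling, so maybe I can help you.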