Python web scraping: getting content from a table
I'm trying to get the content of a FanGraphs table (the page URL is in the code below). I'm a beginner, so I may be making some mistakes. Inspecting the elements on the site, I can see some tables called "Playoff Odds" tables, and everything seems to be wrapped inside the element with id="content". My code so far is:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.fangraphs.com/standings/playoff-odds'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.find("div", {"id": "content"}))
```
The only output is:
<div class="playoff-odds-page" id="content"><h1>MLB Playoff Odds</h1><div id="root"></div>
Clearly I'm missing something important here, and I'd like to learn how to pull in the table contents.
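The empty `<div id="root"></div>` in that output is the clue: the table is not part of the HTML that `requests` downloads; it is inserted afterwards by JavaScript running in the browser. One way to confirm this without a browser is to check the fetched HTML for `<table>` tags (with BeautifulSoup, `soup.find_all("table")` would return an empty list here). The sketch below does the same check with only the standard library's `html.parser`, run against the output shown above with its missing closing `</div>` added as an assumption; the class name `TableProbe` is hypothetical.

```python
from html.parser import HTMLParser

# The response body reported above, with the closing </div> of #content assumed.
HTML = ('<div class="playoff-odds-page" id="content"><h1>MLB Playoff Odds</h1>'
        '<div id="root"></div></div>')


class TableProbe(HTMLParser):
    """Counts <table> tags and records whether the #root mount point has any content."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.in_root = False
        self.root_has_content = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1
        if self.in_root:
            # Any child tag inside #root means something was rendered there.
            self.root_has_content = True
        if dict(attrs).get("id") == "root":
            self.in_root = True

    def handle_endtag(self, tag):
        # Any child already set root_has_content, so leaving at the first </div> is enough.
        if self.in_root and tag == "div":
            self.in_root = False

    def handle_data(self, data):
        if self.in_root and data.strip():
            self.root_has_content = True


probe = TableProbe()
probe.feed(HTML)
print(probe.tables, probe.root_has_content)  # 0 False
```

Zero tables and an empty `#root` mean the data has to come from somewhere other than this HTML: either the API the page loads it from, or a real browser that executes the JavaScript.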
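Thanks for any help or advice.

Try the approach below. Instead of parsing the page's HTML, the script uses `requests` to call FanGraphs' playoff-odds API directly and parses the JSON it returns.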
import json

import requests
import urllib3
from urllib3.exceptions import InsecureRequestWarning

# verify=False below skips TLS certificate checks; this silences the warning that triggers.
urllib3.disable_warnings(InsecureRequestWarning)


def scrap_playoff_odds():
    dateEnd = '2020-07-29'
    dateDelta = ''
    projectionMode = 2
    standingsType = 'div'
    url = ('https://www.fangraphs.com/api/playoff-odds/odds?dateEnd=' + str(dateEnd)
           + '&dateDelta=' + str(dateDelta)
           + '&projectionMode=' + str(projectionMode)
           + '&standingsType=' + str(standingsType))
    session = requests.Session()
    response = session.get(url, verify=False)
    response.raise_for_status()  # fail loudly on a 4xx/5xx instead of parsing an error page
    result = json.loads(response.text)  # a list with one dict per team
    for team in result:
        print('-' * 100)
        # Top-level standings fields, then the projections nested under 'endData'.
        print(team['GB'],
              team['L'],
              team['W'],
              team['WCGB'],
              team['Wpct'],
              team['division'],
              team['league'],
              team['shortName'],
              team['endData']['ExpL'],
              team['endData']['ExpW'],
              team['endData']['csWin'],
              team['endData']['div2Title'],
              team['endData']['divTitle'],
              team['endData']['dsWin'],
              team['endData']['poffTitle'],
              team['endData']['rosW'],
              team['endData']['sos'],
              team['endData']['wcTitle'],
              team['endData']['wcWin'],
              team['endData']['wsWin'])
        print('-' * 100)


scrap_playoff_odds()
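The script assembles the query string by concatenating strings. A standard-library alternative, assuming the same four parameters, is `urllib.parse.urlencode`, which also percent-escapes any values that need it (`requests` does the equivalent if you pass a dict through its `params=` argument). The helper name `build_odds_url` is hypothetical; this is a sketch, not part of the original answer.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.fangraphs.com/api/playoff-odds/odds"


def build_odds_url(date_end, date_delta="", projection_mode=2, standings_type="div"):
    """Build the same query URL as the answer, letting urlencode handle escaping."""
    params = {
        "dateEnd": date_end,
        "dateDelta": date_delta,            # an empty value still appears as 'dateDelta='
        "projectionMode": projection_mode,  # non-string values are converted with str()
        "standingsType": standings_type,
    }
    return BASE_URL + "?" + urlencode(params)


print(build_odds_url("2020-07-29"))
```

Because dicts keep insertion order, the parameters come out in the same order as the concatenated version, so the two URLs are identical.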
Vin's answer is correct, but I'd add that I would probably use json_normalize to turn the result into a table. That gives nicer output, and you can sort, filter, etc.:
import json

import pandas as pd
import requests
import urllib3
from urllib3.exceptions import InsecureRequestWarning

urllib3.disable_warnings(InsecureRequestWarning)


def scrap_playoff_odds():
    dateEnd = '2020-07-29'
    dateDelta = ''
    projectionMode = 2
    standingsType = 'div'
    url = ('https://www.fangraphs.com/api/playoff-odds/odds?dateEnd=' + str(dateEnd)
           + '&dateDelta=' + str(dateDelta)
           + '&projectionMode=' + str(projectionMode)
           + '&standingsType=' + str(standingsType))
    session = requests.Session()
    response = session.get(url, verify=False)
    result = json.loads(response.text)
    # pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import.
    # Nested 'endData' keys become columns named 'endData.<key>'.
    df = pd.json_normalize(result)
    cols = ['shortName','GB','L','W','WCGB','Wpct','division','league',
            'endData.ExpL','endData.ExpW','endData.csWin','endData.div2Title',
            'endData.divTitle','endData.dsWin','endData.poffTitle',
            'endData.rosW','endData.sos','endData.wcTitle','endData.wcWin',
            'endData.wsWin']
    df = df[cols]
    print(df.to_string())


scrap_playoff_odds()
Output:
shortName GB L W WCGB Wpct division league endData.ExpL endData.ExpW endData.csWin endData.div2Title endData.divTitle endData.dsWin endData.poffTitle endData.rosW endData.sos endData.wcTitle endData.wcWin endData.wsWin
0 Angels -1 4 2 -2 0.333333333333333 W AL 30.6122 29.3878 0.0457791 0.220615580677986 0.0969181 0.103138 0.495029680430889 0.507181485493978 0.505889 0.177496 0.226515 0.0217596
1 Orioles -1 2 2 -1 0.5 E AL 36.9089 22.0911 0.000339993 0.0056998860090971 0.000759985 0.00151997 0.0246794705162756 0.365292739868164 0.516855 0.0182196 0.00655987 7.99984E-05
2 Red Sox -2 4 2 -2 0.333333333333333 E AL 30.118 29.882 0.0535189 0.178836420178413 0.0705986 0.117218 0.537369027733803 0.516333332768193 0.499204 0.287934 0.252895 0.0254195
3 White Sox -2.5 4 2 -2 0.333333333333333 C AL 29.7695 30.2305 0.0471791 0.225135490298271 0.102058 0.114598 0.581128500401974 0.522787023473669 0.482352 0.253935 0.260355 0.0198996
4 Indians -0.5 2 4 0 0.666666666666667 C AL 26.3158 33.6842 0.100678 0.366512656211853 0.364553 0.215716 0.875722661614418 0.549707412719727 0.480981 0.144657 0.440991 0.051059
5 Tigers -0.5 2 4 0 0.666666666666667 C AL 33.8049 26.1951 0.00385992 0.0592988133430481 0.0161197 0.0152397 0.180256512016058 0.411020384894477 0.50463 0.104838 0.0570389 0.00125997
6 Royals -2.5 4 2 -2 0.333333333333333 C AL 34.3729 25.6271 0.00465991 0.0449991002678871 0.0114198 0.0163997 0.142657202668488 0.437538888719347 0.501574 0.0862383 0.050619 0.00139997
7 Twins 0 1 4 0 0.8 C AL 25.2144 34.7856 0.116978 0.30405393242836 0.50585 0.241535 0.92080195248127 0.559738159179688 0.478055 0.110898 0.48493 0.0602788
8 Yankees 0 1 3 0 0.75 E AL 24.8138 35.1862 0.164377 0.343113124370575 0.469091 0.294734 0.933022119104862 0.574753556932722 0.486429 0.120818 0.525629 0.0944981
9 Athletics 0 3 3 -1 0.5 W AL 27.6891 32.3109 0.0960381 0.368252635002136 0.295094 0.199316 0.788903653621674 0.542794474848994 0.491593 0.125557 0.396992 0.049459
10 Mariners -1 4 2 -2 0.333333333333333 W AL 36.3763 23.6237 0.00129997 0.0245595090091228 0.00469991 0.0048799 0.0600388199090958 0.400438873856156 0.515 0.0307794 0.0179596 0.000239995
11 Rays 0 2 4 0 0.666666666666667 E AL 25.089 34.911 0.149837 0.373012542724609 0.428471 0.274774 0.928520545363426 0.572425912927698 0.481148 0.127037 0.50753 0.0854183
12 Rangers -0.5 3 2 -1.5 0.4 W AL 32.7494 27.2506 0.0144797 0.117417648434639 0.0412392 0.0404192 0.276534844189882 0.459101832996715 0.5076 0.117878 0.107138 0.00525989
13 Blue Jays -1 3 3 -1 0.5 E AL 31.8524 28.1476 0.0154597 0.0993380099534988 0.0310794 0.0471191 0.343413416296244 0.465696299517596 0.496056 0.212996 0.130857 0.00529989
14 Diamondbacks -2.5 4 2 -2 0.333333333333333 W NL 31.7505 28.2495 0.0272995 0.128357440233231 0.0309994 0.0661387 0.336252845823765 0.486101856938115 0.515167 0.176896 0.149117 0.0104798
15 Braves -0.5 3 3 -1 0.5 E NL 27.512 32.488 0.104518 0.27667447924614 0.362753 0.210756 0.775884479284287 0.546074054859303 0.49513 0.136457 0.408552 0.047719
16 Cubs 0 2 4 0 0.666666666666667 C NL 26.5336 33.4664 0.107258 0.261894762516022 0.466751 0.224256 0.844843775033951 0.545674076786748 0.49087 0.116198 0.441051 0.047779
17 Reds -2 4 2 -2 0.333333333333333 C NL 29.6682 30.3318 0.0574988 0.233455330133438 0.163757 0.131277 0.570129320025444 0.524662971496582 0.494648 0.172917 0.275314 0.0231195
18 Rockies 0 1 4 0 0.8 W NL 30.6727 29.3273 0.0263995 0.189116224646568 0.0538989 0.0718386 0.449571132659912 0.460496347600763 0.517909 0.206556 0.182316 0.00819984
19 Marlins 0 1 2 0 0.666666666666667 E NL 34.3068 24.6932 0.00223996 0.0372792556881905 0.0139997 0.00915982 0.094378056935966 0.405235699244908 0.520411 0.0430991 0.0312594 0.000399992
20 Astros 0 3 3 -1 0.5 W AL 25.5811 34.4189 0.185516 0.269154608249664 0.562049 0.313394 0.911921977996826 0.581831472891348 0.492167 0.0807184 0.533989 0.112718
21 Dodgers -0.5 2 4 0 0.666666666666667 W NL 23.1822 36.8178 0.277714 0.205615893006325 0.708326 0.406472 0.97026077657938 0.607737011379666 0.495463 0.0563189 0.619928 0.165337
22 Brewers -1 3 3 -1 0.5 C NL 28.9605 31.0395 0.0648587 0.263134747743607 0.213376 0.143057 0.64656774699688 0.519249986719202 0.498833 0.170057 0.307534 0.0262395
23 Nationals -1.5 4 2 -2 0.333333333333333 E NL 29.0866 30.9134 0.0713786 0.250714987516403 0.218776 0.152837 0.63434799015522 0.535433345370822 0.49263 0.164857 0.314754 0.0301194
24 Mets -0.5 3 3 -1 0.5 E NL 28.0444 31.9556 0.0878982 0.281094372272491 0.303454 0.185256 0.731425389647484 0.536214828491211 0.497037 0.146877 0.373653 0.0384592
25 Phillies -1 2 1 -1.5 0.333333333333333 E NL 31.2683 28.7317 0.0289994 0.154236912727356 0.101018 0.0753785 0.39485190808773 0.486521068372225 0.508316 0.139597 0.175676 0.00931981
26 Pirates -2 4 2 -2 0.333333333333333 C NL 35.2019 24.7981 0.00307994 0.0361992754042149 0.0123598 0.0109198 0.0936181750148535 0.42218702810782 0.513259 0.0450591 0.0327593 0.000899982
27 Cardinals -1.5 3 2 -1.5 0.4 C NL 30.1584 29.8416 0.0414592 0.205315887928009 0.143757 0.101578 0.514809891581535 0.50621091669256 0.498018 0.165737 0.234415 0.0148797
28 Padres -0.5 2 4 0 0.666666666666667 W NL 27.3498 32.6502 0.0949581 0.422831535339355 0.198616 0.194596 0.791184529662132 0.530559257224754 0.502778 0.169737 0.402172 0.0419592
29 Giants -1.5 3 3 -1 0.5 W NL 34.0368 25.9632 0.00443991 0.0540789179503918 0.00815984 0.0164797 0.151876960881054 0.425244437323676 0.514944 0.0896382 0.051499 0.00103998
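As an illustration of the "sort, filter" point, the sketch below runs `pd.json_normalize` on a few trimmed records shaped like the API response (values copied from the output above), then keeps one league and sorts it. It assumes `endData.poffTitle` is the overall playoff probability; the column meanings are not documented in this thread.

```python
import pandas as pd

# Trimmed records shaped like the API response; values copied from the output above.
records = [
    {"shortName": "Yankees", "league": "AL", "endData": {"poffTitle": 0.933022119104862}},
    {"shortName": "Twins", "league": "AL", "endData": {"poffTitle": 0.92080195248127}},
    {"shortName": "Orioles", "league": "AL", "endData": {"poffTitle": 0.0246794705162756}},
    {"shortName": "Dodgers", "league": "NL", "endData": {"poffTitle": 0.97026077657938}},
]

# Nested keys are flattened into dot-separated column names such as 'endData.poffTitle'.
df = pd.json_normalize(records)

# Keep one league, then sort by the (assumed) playoff-probability column, highest first.
al = df[df["league"] == "AL"].sort_values("endData.poffTitle", ascending=False)
print(al.to_string(index=False))
```

The same two lines work unchanged on the full `df` built inside `scrap_playoff_odds()` if the function is changed to return it.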
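I think this happens because the Python script fetches the page before the dynamic data inside it is generated: `requests` downloads only the initial HTML and never runs the page's JavaScript, which is what builds the table. I tested this code on my own site and the content displayed correctly.

I think you're right; I ended up using Selenium. I read on Reddit that it makes getting dynamic data easier.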
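For reference, a Selenium version of that approach might look like the sketch below. It assumes Selenium 4 with a local Chrome install, and that the rendered table can be found with the CSS selector `#root table`; that selector and the function name `fetch_rendered_html` are assumptions, not taken from the page. The Selenium imports sit inside the function so the module can be imported without Selenium installed.

```python
def fetch_rendered_html(url, css_selector="#root table", timeout=20):
    """Load url in headless Chrome, wait for css_selector to appear, return the rendered HTML."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript-rendered element exists in the DOM (or time out).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source  # HTML after the scripts have run
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Usage: `html = fetch_rendered_html("https://www.fangraphs.com/standings/playoff-odds")`, then parse it as in the question, e.g. `BeautifulSoup(html, "html.parser").find_all("table")`. When a JSON endpoint exists, as in the first answer, calling it directly avoids starting a browser at all.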