Scraping a dynamic web page in Python using two attribute values
I used to run the following code to scrape a web page. Suddenly the code stopped working. I looked at the page and found that one more attribute had been added to the table rows. I can't figure out how to fix it. Can anyone help?
import pandas as pd
import requests

cookies = {
    'BotMitigationCookie_9518109003995423458': '343775001600940465b2KTzJpwY5pXpiVNIRRi97Z3ELk='
}


def main(url):
    r = requests.post(url, cookies=cookies)
    df = pd.read_html(r.content, header=0, attrs={'class': 'table_bd f_tal'})
    new = pd.concat(df, ignore_index=True)
    print(new)
    new.to_csv("tw1012.csv", index=False)


main("https://racing.hkjc.com/racing/information/english/Trackwork/TrackworkOneDayResult.aspx?OneDay=12/10/2020")
The data you see on the page is loaded dynamically via Ajax from a different URL:
import json
import requests
import pandas as pd

url = 'https://racing.hkjc.com/racing/information/json/TrackworkOneDayRecords/202010121E.aspx?PageNum=1'

# sometimes the server returns error message
# so repeat until success
while True:
    try:
        data = requests.get(url).json()
        break
    except json.decoder.JSONDecodeError:
        continue

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

df = pd.DataFrame(data['Records'])
print(df)
Prints:
Horse Trainer ... Gear WorkoutFilters
0 A A AGILITY L Ho ... NaN
1 A SHIN CHALLENGE W Y So ... B NaN
2 A SHIN CHALLENGE W Y So ... B NaN
3 A SMILE LIKE YOURS J Size ... NaN
4 ABOVE A T Millard ... H NaN
5 ABOVE ALL A T Millard ... H NaN
6 ABOVE ALL A T Millard ... H NaN
7 ADONIS D J Whyte ... H NaN
8 AEROSONIC J Size ... Date=12/10/2020#b4
9 AEROSONIC J Size ... NaN
10 AFRICAN SKY T P Yung ... NaN
11 AFTER ME D E Ferraris ... NaN
12 AION D A Hayes ... H NaN
13 ALCARI P F Yiu ... NaN
14 ALEE KING PRAWN D A Hayes ... H NaN
15 ALL BEST FRIENDS K L Man ... NaN
16 ALL FOR ST PAUL'S F C Lor ... H NaN
17 ALL FOR ST PAUL'S F C Lor ... H NaN
18 ALL IN MIND A S Cruz ... H/XB Date=12/10/2020#b6
19 ALL JOYFUL T P Yung ... H NaN
20 ALL THE WAY A T Millard ... H NaN
21 ALL TIMES GRATEFUL C H Yip ... NaN
22 ALL TOO WIN D J Hall ... NaN
23 ALL TOO WIN D J Hall ... NaN
24 ALL YOU KNOW R Gibson ... H NaN
25 ALL YOU WANT R Gibson ... H NaN
26 ALLOY STAR T P Yung ... NaN
27 ALPHA HEDGE K W Lui ... H NaN
28 ALWAYS BEAUTY D A Hayes ... Date=12/10/2020#b4
29 ALWAYS BEAUTY D A Hayes ... NaN
30 AMARA WIN P F Yiu ... Date=12/10/2020#b3
31 AMARA WIN P F Yiu ... NaN
32 AMAZEMENT J Size ... NaN
33 AMAZING AGILITY L Ho ... NaN
34 AMAZING BEATS P O'Sullivan ... NaN
35 AMAZING BOY C W Chang ... H NaN
36 AMAZING CHOCOLATE A T Millard ... H NaN
37 AMAZING CHOCOLATE A T Millard ... H NaN
38 AMAZING KIWI D J Whyte ... H/XB NaN
39 AMAZING KNIGHT J Size ... H NaN
40 AMAZING LUCK P O'Sullivan ... NaN
41 AMAZING NEWS C Fownes ... H NaN
42 AMAZING ONE PLUS C S Shum ... NaN
43 AMAZING ONE PLUS C S Shum ... NaN
44 AMAZING ROCKY T P Yung ... H NaN
45 AMAZING ROCKY T P Yung ... H NaN
46 AMAZING STAR K H Ting ... H NaN
47 AMBITIOUS HEART D E Ferraris ... H NaN
48 ANGEL OF MY EYES C S Shum ... H NaN
49 APACHE PASS P O'Sullivan ... NaN
[50 rows x 7 columns]
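One caveat with the `while True` loop above: if the server never returns valid JSON, it spins forever. Below is a minimal sketch of a bounded retry. The helper name `fetch_records`, the `max_attempts` and `delay` parameters, the 10-second timeout, and the injectable `get` argument are all my own illustrative choices, not part of the site's API or of the code above. The only thing assumed about the endpoint is what the answer already relies on: a JSON object with a `Records` key.

```python
import time


def fetch_records(url, max_attempts=5, delay=1.0, get=None):
    """Return the "Records" list from a JSON endpoint, retrying on invalid JSON.

    max_attempts, delay and the injectable `get` are illustrative additions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if get is None:
        # Imported lazily so a stand-in can be passed for offline testing.
        import requests
        get = requests.get

    for attempt in range(1, max_attempts + 1):
        try:
            # The timeout keeps a stalled connection from blocking forever.
            return get(url, timeout=10).json()["Records"]
        except ValueError:
            # JSON decode errors from json, simplejson and requests
            # all subclass ValueError.
            if attempt == max_attempts:
                raise
            time.sleep(delay)
```

With the `url` from the answer, `df = pd.DataFrame(fetch_records(url))` should give the same frame as before, but the call raises after five failed attempts instead of looping indefinitely.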