Web scraping in Python: how do I parse the HTML?
I am trying to scrape a site whose content is loaded via AJAX. I am just learning Python, so sorry if this is a simple question. I use Selenium to load the page and download the HTML source, and that part works perfectly. My problem is how to parse the data.
I would like the data to look like this (probably stored in variables, because I want to transfer it to a MySQL database): Custom ID, Name, Ticket NO, Rate, Win.
Location of the data in the HTML code: Name, Rate / Win zł
You can use BeautifulSoup to parse and traverse the HTML. You are already importing it, so you are halfway there. The fields to extract are:
Custom ID:
Name:
Ticket NO:
Rate:
Win:
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument('--headless')  # run Chrome without a visible window
# Selenium 4 style: the driver path is wrapped in Service and options are
# passed as options= (older releases used a positional path and chrome_options=)
browser = webdriver.Chrome(
    service=Service("C:/Users/backu/Downloads/chromedriver_win32/chromedriver.exe"),
    options=options)
browser.get("https://www.sts.pl/pl/oferta/zaklady-live/")
sleep(1)  # give the AJAX content a moment to load
source = browser.page_source  # Get the entire page source from the browser
browser.quit()  # No need for the browser any more, so shut it down
soup = BeautifulSoup(source, 'html.parser')
try:
    tags = soup.select('ul.nc-message-holder li.message')  # get the elements using CSS selectors
    for tag in tags:  # loop through them
        customerId = tag.find('div').get('customid')
        name = tag.find('div').find('span').text
        # <span class="nc-ticket" onclick="serchTicketHandler('223461999015343335')">8.00 / 51.04 zł</span>
        ticketNum = rate = win = ''  # defaults for entries without a ticket span
        ticketTag = tag.select('span.nc-ticket')
        if ticketTag:
            ticketNum = ticketTag[0].get('onclick').replace("serchTicketHandler('", "").replace("')", "")
            rate_Win = ticketTag[0].text
            if '/' in rate_Win:
                rate_Win = rate_Win.split('/')
                rate = rate_Win[0].strip()
                win = rate_Win[1].strip()
            else:
                rate = rate_Win
                win = ''
        print('\n\ncustomerId ==>', customerId)
        print('name ==>', name)
        print('ticketNum ==>', ticketNum)
        print('rate ==>', rate)
        print('win ==>', win)
except Exception as e:
    print(e)
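The string handling inside the loop above can also be exercised on its own, without a browser. The following is only a minimal sketch: the helper names `parse_ticket_number` and `parse_rate_win` are hypothetical (they are not part of the original answer), and it assumes the `onclick` value and the span text always look like the example shown in the code comment.

```python
import re
from decimal import Decimal

# Hypothetical helpers for illustration; not part of the original answer.


def parse_ticket_number(onclick):
    """Return the digits inside serchTicketHandler('...'), or '' if absent."""
    match = re.search(r"serchTicketHandler\('(\d+)'\)", onclick or "")
    return match.group(1) if match else ""


def parse_rate_win(text):
    """Split text such as '8.00 / 51.04 zł' into a (rate, win) pair of Decimals.

    win is None when the text has no '/' part. Text that is not numeric
    raises decimal.InvalidOperation.
    """
    parts = [part.strip() for part in text.split("/", 1)]
    rate = Decimal(parts[0])
    win = None
    if len(parts) == 2:
        # Drop the trailing currency marker before converting.
        win = Decimal(parts[1].replace("zł", "").strip())
    return rate, win


if __name__ == "__main__":
    print(parse_ticket_number("serchTicketHandler('223461999015343335')"))
    print(parse_rate_win("8.00 / 51.04 zł"))
```

Converting to `Decimal` instead of keeping strings is a design choice that may suit a later database insert, since the amounts are money values.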
Running the scraper prints:

customerId ==> c46654fa66765ae11bb34d7d99cf0a77
name ==> Wojciech W
ticketNum ==> 223461999016744267
rate ==> 100.00
win ==> 1340.24 zł

customerId ==> 7b071de240b730ad42cee50711dd8c72
name ==> Grzegorz P
ticketNum ==> 223461988025841282
rate ==> 15.94
win ==> 46.28 zł

customerId ==> 244950ab8485b7180c177a2b7b19b0ae
name ==> Michał J
ticketNum ==> 313441988030838257
rate ==> 12.00
win ==> 73967.98 zł

customerId ==> 9223e1c2f87afb02e6c704acb53308da
name ==> Piotr G
ticketNum ==> 313431999017162038
rate ==> 2.00
win ==> 430.40 zł

customerId ==> 4a8e2695fe71a084f69167ac987c7013
name ==> Dawid B
ticketNum ==> 313461988013246357
rate ==> 10.00
win ==> 1569.30 zł

customerId ==> 6b882a5ef93e0c3e52b81bbee0ba52af
name ==> Adrian P
ticketNum ==> 313441988034262951
rate ==> 2.00
win ==> 451268.63 zł

customerId ==> abd34ea0c7a9b0e07a53a78324cb7e0a
name ==> Michał D
ticketNum ==> 223461999013746135
rate ==> 10.00
win ==> 27.72 zł

customerId ==> bed4fc0ea1f21a7a9b1c6762d2302d09
name ==> Rafał Ż
ticketNum ==> 223461988021146803
rate ==> 607.40
win ==> 2150.26 zł
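Since the question mentions moving the data into a MySQL database, here is a rough sketch of storing such records with a parameterized INSERT. To stay runnable without a database server it uses Python's built-in `sqlite3` as a stand-in, not MySQL. The table name `tickets`, its columns, and the function `save_records` are assumptions made for illustration. With a MySQL driver the placeholder is typically `%s` rather than `?`, and `INSERT OR REPLACE` is SQLite syntax that would need a MySQL equivalent.

```python
import sqlite3

# Assumed schema and names; sqlite3 stands in for MySQL in this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tickets (
    ticket_num  TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL,
    name        TEXT NOT NULL,
    rate        TEXT NOT NULL,
    win         TEXT NOT NULL
)
"""


def save_records(conn, records):
    """Insert (ticket_num, customer_id, name, rate, win) tuples.

    A row whose ticket_num already exists is replaced, so running the
    scraper twice does not create duplicates.
    """
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO tickets VALUES (?, ?, ?, ?, ?)",
            records,
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Two records taken from the scraper output shown earlier.
    save_records(conn, [
        ("223461999016744267", "c46654fa66765ae11bb34d7d99cf0a77",
         "Wojciech W", "100.00", "1340.24 zł"),
        ("223461988025841282", "7b071de240b730ad42cee50711dd8c72",
         "Grzegorz P", "15.94", "46.28 zł"),
    ])
    for row in conn.execute("SELECT name, rate, win FROM tickets ORDER BY name"):
        print(row)
    conn.close()
```

Passing the values as parameters instead of formatting them into the SQL string keeps scraped text from being interpreted as SQL.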