How do I parse HTML when web scraping in Python?


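I'm trying to scrape a site whose content is loaded via AJAX. I'm just learning Python, so sorry if this is a simple question.

I use Selenium to load the page and download its HTML source, and that part works perfectly. My problem is how to parse the data.

I'd like the data to look like this (probably written to variables, since I want to transfer it to a MySQL database):

Where the data sits in the HTML:

  • Name  Rate / Win zł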


  • You can use BeautifulSoup to parse and traverse the HTML. You've already imported it, so you're halfway there.
    Custom ID:
    Name:
    Ticket NO:
    Rate:
    Win:
    
    from time import sleep

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')

    # Selenium 4 style: the driver path goes through a Service object and the
    # options through `options=` (the older positional path and `chrome_options=`
    # keyword are deprecated and were removed in recent Selenium 4 releases).
    browser = webdriver.Chrome(
            service=Service("C:/Users/backu/Downloads/chromedriver_win32/chromedriver.exe"),
            options=options)

    try:
        browser.get("https://www.sts.pl/pl/oferta/zaklady-live/")
        sleep(1)  # crude fixed wait for the AJAX content to load
        source = browser.page_source  # get the entire page source from the browser
    finally:
        browser.quit()  # the browser is no longer needed, so shut it down

    soup = BeautifulSoup(source, 'html.parser')
    try:
        tags = soup.select('ul.nc-message-holder li.message')  # get the elements using a CSS selector
        for tag in tags:  # loop through them
            customerId = tag.find('div').get('customid')
            name       = tag.find('div').find('span').text
            # <span class="nc-ticket" onclick="serchTicketHandler('223461999015343335')">8.00 / 51.04 zł</span>
            ticketTag  = tag.select('span.nc-ticket')
            if ticketTag:
                # the ticket number is the argument of the onclick handler
                ticketNum = ticketTag[0].get('onclick').replace("serchTicketHandler('", "").replace("')", "")
                rate_Win  = ticketTag[0].text
                if '/' in rate_Win:
                    # the text has the form "rate / win zł"
                    rate_Win = rate_Win.split('/')
                    rate     = rate_Win[0].strip()
                    win      = rate_Win[1].strip()
                else:
                    rate = rate_Win
                    win  = ''

                print('\n\ncustomerId ==>', customerId)
                print('name ==>', name)
                print('ticketNum ==>', ticketNum)
                print('rate ==>', rate)
                print('win ==>', win)
    except Exception as e:
        print(e)

    Output:
    
    customerId ==> c46654fa66765ae11bb34d7d99cf0a77
    name ==> Wojciech W
    ticketNum ==> 223461999016744267
    rate ==> 100.00
    win ==> 1340.24 zł
    
    
    customerId ==> 7b071de240b730ad42cee50711dd8c72
    name ==> Grzegorz P
    ticketNum ==> 223461988025841282
    rate ==> 15.94
    win ==> 46.28 zł
    
    
    customerId ==> 244950ab8485b7180c177a2b7b19b0ae
    name ==> Michał J
    ticketNum ==> 313441988030838257
    rate ==> 12.00
    win ==> 73967.98 zł
    
    
    customerId ==> 9223e1c2f87afb02e6c704acb53308da
    name ==> Piotr G
    ticketNum ==> 313431999017162038
    rate ==> 2.00
    win ==> 430.40 zł
    
    
    customerId ==> 4a8e2695fe71a084f69167ac987c7013
    name ==> Dawid B
    ticketNum ==> 313461988013246357
    rate ==> 10.00
    win ==> 1569.30 zł
    
    
    customerId ==> 6b882a5ef93e0c3e52b81bbee0ba52af
    name ==> Adrian P
    ticketNum ==> 313441988034262951
    rate ==> 2.00
    win ==> 451268.63 zł
    
    
    customerId ==> abd34ea0c7a9b0e07a53a78324cb7e0a
    name ==> Michał D
    ticketNum ==> 223461999013746135
    rate ==> 10.00
    win ==> 27.72 zł
    
    
    customerId ==> bed4fc0ea1f21a7a9b1c6762d2302d09
    name ==> Rafał Ż
    ticketNum ==> 223461988021146803
    rate ==> 607.40
    win ==> 2150.26 zł
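
Since the goal is to move these values into a MySQL database, it may help to turn the scraped strings into typed values first. The following is only a minimal sketch of that step, not part of the original answer: the names `Ticket`, `parse_ticket`, and `_to_decimal` are hypothetical, and it assumes the amounts always use a dot as the decimal separator with no thousands separators, as in the sample output above.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional


class Ticket(NamedTuple):
    """Typed view of one span.nc-ticket element (hypothetical helper type)."""
    ticket_num: str
    rate: Decimal
    win: Optional[Decimal]


# Matches the onclick value seen in the page, e.g. serchTicketHandler('2234...')
_TICKET_RE = re.compile(r"serchTicketHandler\('(\d+)'\)")


def _to_decimal(text: str) -> Decimal:
    # Drop the currency suffix ("zł") and surrounding whitespace.
    # Assumption: dot decimal separator, no thousands separators.
    return Decimal(text.replace("zł", "").strip())


def parse_ticket(onclick: str, label: str) -> Ticket:
    """Split the onclick attribute and the "rate / win zł" text into fields."""
    match = _TICKET_RE.search(onclick)
    if match is None:
        raise ValueError(f"unexpected onclick value: {onclick!r}")
    rate_text, sep, win_text = label.partition("/")
    rate = _to_decimal(rate_text)
    win = _to_decimal(win_text) if sep else None  # no "/" means no win amount
    return Ticket(match.group(1), rate, win)


ticket = parse_ticket("serchTicketHandler('223461999015343335')", "8.00 / 51.04 zł")
print(ticket)
```

In the loop above, `parse_ticket(ticketTag[0].get('onclick'), ticketTag[0].text)` would stand in for the chained `replace` and `split` calls. The resulting tuple can then be supplied as the parameters of a parameterized INSERT with whichever MySQL driver you use, which avoids building SQL strings by hand.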