Python: pull the links from a table, follow each link & scrape the data
I have a table from which I want to pick up all the links, go through each link, and scrape the items in td class="horse". The main page containing the table of links has the following code:
<table border="0" cellspacing="0" cellpadding="0" class="full-calendar">
<tr>
<th width="160"> </th>
<th width="105"><a href="/FreeFields/Calendar.aspx?State=NSW">NSW</a></th>
<th width="105"><a href="/FreeFields/Calendar.aspx?State=VIC">VIC</a></th>
<th width="105"><a href="/FreeFields/Calendar.aspx?State=QLD">QLD</a></th>
<th width="105"><a href="/FreeFields/Calendar.aspx?State=WA">WA</a></th>
<th width="105"><a href="/FreeFields/Calendar.aspx?State=SA">SA</a></th>
<th width="105"><a href="/FreeFields/Calendar.aspx?State=TAS">TAS</a></th>
<th width="105"><a href="/FreeFields/Calendar.aspx?State=ACT">ACT</a></th>
<th width="105"><a href="/FreeFields/Calendar.aspx?State=NT">NT</a></th>
</tr>
<tr class="rows">
<td>
<p><span>FRIDAY 13 JAN</span></p>
</td>
<td>
<p>
<a href="/FreeFields/Form.aspx?Key=2017Jan13,NSW,Ballina">Ballina</a><br>
<a href="/FreeFields/Form.aspx?Key=2017Jan13,NSW,Gosford">Gosford</a><br>
</p>
</td>
<td>
<p>
<a href="/FreeFields/Form.aspx?Key=2017Jan13,VIC,Ararat">Ararat</a><br>
<a href="/FreeFields/Form.aspx?Key=2017Jan13,VIC,Cranbourne">Cranbourne</a><br>
</p>
</td>
<td>
<p>
<a href="/FreeFields/Form.aspx?Key=2017Jan13,QLD,Doomben">Doomben</a><br>
</p>
</td>
Wondering if anyone can help me with how to get the code to click all the links in the table & carry out the following on each page:

g_data = soup.find_all("td", {"class": "horse"})
for item in g_data:
    print(item.text)

Thanks in advance
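As a quick sanity check, the td/a extraction can be exercised offline against the snippet above (a sketch; the HTML is embedded as a trimmed string here rather than fetched, and only the meeting links under /FreeFields/Form.aspx are kept, not the state headers):

```python
import re
from bs4 import BeautifulSoup

# A trimmed copy of the calendar snippet from the question.
html = """
<table class="full-calendar">
  <tr class="rows">
    <td><p><span>FRIDAY 13 JAN</span></p></td>
    <td><p>
      <a href="/FreeFields/Form.aspx?Key=2017Jan13,NSW,Ballina">Ballina</a><br>
      <a href="/FreeFields/Form.aspx?Key=2017Jan13,NSW,Gosford">Gosford</a><br>
    </p></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# Match only meeting-form links, skipping the state-calendar headers.
links = [a["href"] for a in soup.find_all("a", href=re.compile(r"^/FreeFields/Form"))]
print(links)
```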
bs4 + requests can do what you need:

import requests, bs4, re
from urllib.parse import urljoin

start_url = 'http://www.racingaustralia.horse/'

def make_soup(url):
    r = requests.get(url)
    soup = bs4.BeautifulSoup(r.text, 'lxml')
    return soup

def get_links(url):
    soup = make_soup(url)
    a_tags = soup.find_all('a', href=re.compile(r"^/FreeFields/"))
    links = [urljoin(start_url, a['href']) for a in a_tags]  # convert relative url to absolute url
    return links

def get_tds(link):
    soup = make_soup(link)
    tds = soup.find_all('td', class_="horse")
    if not tds:
        print(link, 'do not find hours tag')
    else:
        for td in tds:
            print(td.text)

if __name__ == '__main__':
    links = get_links(start_url)
    for link in links:
        get_tds(link)
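One easy refinement: the state calendar pages repeat many of the same Form.aspx links as the home page, so the scraper above can fetch the same URL more than once. Deduplicating while preserving first-seen order is a one-liner (a sketch on plain strings; in the script above it would wrap the list returned by get_links):

```python
def dedupe(links):
    """Drop repeated URLs, keeping first-seen order.
    dict preserves insertion order in Python 3.7+."""
    return list(dict.fromkeys(links))

urls = [
    "http://www.racingaustralia.horse/FreeFields/Calendar.aspx?State=NSW",
    "http://www.racingaustralia.horse/FreeFields/Form.aspx?Key=2017Jan13,NSW,Ballina",
    "http://www.racingaustralia.horse/FreeFields/Calendar.aspx?State=NSW",
]
print(dedupe(urls))
```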
Comments:

What do you mean by "click the links"? Do you mean go to each linked page and scrape all of them? — Yes; the table contains data such as FRIDAY 13 JAN.

@KirstyDent please put any relevant data, such as the HTML in the comment above, into the question itself, to make it easier for future readers to find. — Sorry, I'll do that now!

How would I add pagination to this code, where the home page has multiple pages?
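On the pagination question in the comments: the site's paging markup isn't shown anywhere above, so the selector below is a hypothetical example (an `<a rel="next">` anchor); the idea is to scrape a page, look for a next-page link, and keep following it until none is found:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def next_page_url(html, current_url):
    """Return the absolute URL of the next-page link, or None on the last page.
    Assumes a hypothetical <a rel="next"> anchor -- adjust to the real markup."""
    soup = BeautifulSoup(html, "html.parser")
    a = soup.find("a", rel="next")
    return urljoin(current_url, a["href"]) if a else None

# Demo on a stand-in page:
page = '<a rel="next" href="/FreeFields/Calendar.aspx?Page=2">Next</a>'
print(next_page_url(page, "http://www.racingaustralia.horse/FreeFields/Calendar.aspx"))
```

A crawl loop would then repeat `html = requests.get(url).text; get_tds(url); url = next_page_url(html, url)` until `url` is None.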
Output:
http://www.racingaustralia.horse/FreeFields/GroupAndListedRaces.aspx do not find hours tag
http://www.racingaustralia.horse/FreeFields/Calendar.aspx?State=NSW do not find hours tag
http://www.racingaustralia.horse/FreeFields/Calendar.aspx?State=VIC do not find hours tag
http://www.racingaustralia.horse/FreeFields/Calendar.aspx?State=QLD do not find hours tag
http://www.racingaustralia.horse/FreeFields/Calendar.aspx?State=WA do not find hours tag
.......
WEARETHECHAMPIONS
STORMY HORIZON
OUR RED JET
SAPPER TOM
MY COUSIN BOB
ALL TOO HOT
SAGA DEL MAR
ZIGZOFF
SASHAY AWAY
SO SHE IS
MILADY DUCHESS