Python: scraping a table and printing various data
I currently have the following lines of code:
import requests, re, bs4
from urllib.parse import urljoin

start_url = 'http://www.racingaustralia.horse/'

def make_soup(url):
    r = requests.get(url)
    soup = bs4.BeautifulSoup(r.text, "lxml")
    return soup

def get_links(url):
    soup = make_soup(url)
    a_tags = soup.find_all('a', href=re.compile(r"^/FreeFields/"))
    links = [urljoin(start_url, a['href']) for a in a_tags]
    return links

def get_tds(link):
    soup = make_soup(link)
    tds = soup.find_all('td', class_="horse")
    for td in tds:
        print(td.text)

if __name__ == '__main__':
    links = get_links(start_url)
    for link in links:
        get_tds(link)
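This scrapes all the horse names from the tables for the race meets listed on racingaustralia.horse.
That is exactly what I want, but I would also like to retrieve the date of each meet, the location of each meet, and, for each race, the names of the horses.
Here is an example of what I want: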
Date of Race Meet
Location of Race Meet
Race Number
Horse....
...
...
...
Race Number
Horse
...
...
etc
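Can someone help me adjust the code so that it prints the date and location of each meet, and the race number for each group of horses?
I tried the following, but I am wondering whether there is a more efficient way: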
def get_tds(link):
    soup = make_soup(link)
    race_date = soup.find_all('span', class_="race-venue-date")
    for span in race_date:
        print(span.text)
    tds = soup.find_all('td', class_="horse")
    for td in tds:
        print(td.text)

def get_info(link):
    # soup must be created here; it is not shared with get_tds
    soup = make_soup(link)
    item = soup.find_all('div', class_="top")
    for div in item:
        print(div.text)

if __name__ == '__main__':
    links = get_links(start_url)
    for link in links:
        get_info(link)
        get_tds(link)
Thanks in advance.
import requests, re, bs4
from urllib.parse import urljoin

def make_soup(url):
    r = requests.get(url)
    soup = bs4.BeautifulSoup(r.text, "lxml")
    return soup

def get_links(url):
    soup = make_soup(url)
    a_tags = soup.find_all('a', href=re.compile(r"^/FreeFields/"))
    links = [urljoin(start_url, a['href']) for a in a_tags]
    return links

def get_info(link):
    soup = make_soup(link)
    tds = soup.find_all('td', class_="horse")
    if tds:
        top = soup.find(class_="top").h2
        for s in top.stripped_strings:
            print(s)
        for index, td in enumerate(tds, 1):
            print(index, td.text, sep='\n')
    else:
        print('not find')

if __name__ == '__main__':
    start_url = 'http://www.racingaustralia.horse/'
    links = get_links(start_url)
    for link in links:
        get_info(link)
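Output:
Warwick Farm: Australian Turf Club
Wednesday, 18 January 2017
1
GAUGUIN (NZ)
2
DAHOOIL (NZ)
3
METAMORPHIC
4
MY KIND
5
CONCISELY
6
ARAZONA
7
APOLLO
8
IGNITE THE LIGHT
9
KRUPSKAYA
Many of the URLs do not contain the information you need. You should change the regular expression to filter them out so that your code runs faster.
I wrote the code above to show you how it works; you should not ask other people to write your code for you.
Hello, you may have noticed that I actually modified the code you wrote for me, and I had also written some code of my own before you changed mine. I am only asking for help; if that is asking too much, I will delete the question.
This result prints the horse numbers, not the race numbers. I don't really want the horse numbers, only the race numbers.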
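Two points from the discussion above can be sketched together: narrowing the link pattern so fewer irrelevant pages are fetched, and numbering races rather than horses. The following is only a rough sketch under assumptions that do not come from the thread: that the useful pages have hrefs beginning with `/FreeFields/Form.aspx`, and that each race's runners sit in their own `<table class="race-strip-fields">`. Both should be checked against the real page source. The names `FORM_LINK`, `filter_links`, `races_from_soup`, and `SAMPLE_HTML` are hypothetical, and the sample HTML (including the placeholder horse names) is made up so the example runs without network access.

```python
import re
from urllib.parse import urljoin

import bs4

START_URL = 'http://www.racingaustralia.horse/'

# Assumption: only "Form.aspx" pages list race fields. Adjust this pattern
# after inspecting the hrefs that the site actually uses.
FORM_LINK = re.compile(r"^/FreeFields/Form\.aspx")

# Made-up markup that imitates the assumed page layout (not real site data).
SAMPLE_HTML = (
    '<a href="/FreeFields/Form.aspx?Key=EXAMPLE">Form</a>'
    '<a href="/FreeFields/Calendar.aspx?State=EXAMPLE">Calendar</a>'
    '<div class="top"><h2>Warwick Farm: Australian Turf Club<br/>'
    '<span class="race-venue-date">Wednesday, 18 January 2017</span></h2></div>'
    '<table class="race-strip-fields">'
    '<tr><td class="horse">HORSE A</td></tr>'
    '<tr><td class="horse">HORSE B</td></tr>'
    '</table>'
    '<table class="race-strip-fields">'
    '<tr><td class="horse">HORSE C</td></tr>'
    '</table>'
)


def filter_links(soup, base_url=START_URL):
    """Return absolute URLs for the links that match the narrower pattern."""
    a_tags = soup.find_all('a', href=FORM_LINK)
    return [urljoin(base_url, a['href']) for a in a_tags]


def races_from_soup(soup):
    """Return (header_lines, races); races is a list of (race_number, horses)."""
    top = soup.find(class_="top")
    header = list(top.h2.stripped_strings) if top and top.h2 else []
    races = []
    # Assumption: one table per race, in page order, so the table's position
    # is used as the race number.
    tables = soup.find_all('table', class_="race-strip-fields")
    for number, table in enumerate(tables, 1):
        horses = [td.get_text(strip=True)
                  for td in table.find_all('td', class_="horse")]
        races.append((number, horses))
    return header, races


if __name__ == '__main__':
    soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")
    print(filter_links(soup))
    header, races = races_from_soup(soup)
    for line in header:
        print(line)
    for number, horses in races:
        print('Race', number)
        for horse in horses:
            print(horse)
```

To try this against live pages, the soup returned by `make_soup(link)` could be passed to `races_from_soup` in place of the sample. If the real pages carry the race number in a heading near each table, reading it from there would be more reliable than counting tables.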