Python: scrape data from a date onwards

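I only want to extract data from a table relative to a certain date. The code below gets the first date in the data (URL attached), but how do I write, say, a for loop that extracts only the row for, say, 11 Oct 2020 and all the rows before it?

I want to create a for loop that extracts all the data in this table before a certain date (the table with class "table table hover small horsePerformance").

I think it should look something like this: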

for tr in horseresult6.find_all('tr')[1:]: 
     daysbetween = tr.find('td', class_='date').get_text().strip()
     if xdate > daysbetween:
         do something
     else:
         continue

When I try this, it doesn't seem to work.

You can parse the dates and compare them with the >= operator.

Here's how:

import time

import requests
from bs4 import BeautifulSoup

horse_url = "http://www.harness.org.au/racing/horse-search/?horseId=813476"

with requests.Session() as s:
    try:
        webpage_response = s.get(horse_url)
    except requests.exceptions.ConnectionError:
        # No response object exists at this point, so stop here
        raise SystemExit("Connection refused")

table = BeautifulSoup(
    webpage_response.content,
    "html.parser",
).find('table', class_='table table hover small horsePerformance')

target_date = "11 Oct 2020"

for row in table.find_all("tr")[1:]:  # skip the header
    date = row.find("td", class_="date").find("a").getText()  # table date
    if time.strptime(date, "%d %b %Y") >= time.strptime(target_date, "%d %b %Y"):  # compare dates
        # do your parsing here, this is just an example
        print(f'{date} - {row.find("td", class_="stake").getText(strip=True)}')

Output:

05 Apr 2021 - $4,484
29 Mar 2021 - $595
23 Mar 2021 - $4,484
12 Mar 2021 - $220
08 Mar 2021 - $181
02 Mar 2021 - $263
19 Feb 2021 - $180
12 Feb 2021 - $1,200
26 Jan 2021 - $4,484
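
The loop in the question compares the date text directly, which is a string comparison rather than a chronological one. The sketch below illustrates the difference on a few values taken from the table. It uses `datetime.strptime` instead of the `time.strptime` call in the answer (either works for this comparison), and the helper name `parse_date` is an assumption made for illustration. Note that `%b` depends on the locale, so this assumes English month abbreviations.

```python
from datetime import datetime

DATE_FORMAT = "%d %b %Y"  # matches table dates such as "05 Apr 2021"


def parse_date(text):
    """Hypothetical helper: turn '05 Apr 2021' into a datetime object."""
    return datetime.strptime(text.strip(), DATE_FORMAT)


# A few sample dates taken from the table, newest first.
table_dates = ["05 Apr 2021", "12 Feb 2021", "26 Jan 2021", "14 Sep 2020"]
target_date = "11 Oct 2020"

# Raw strings are compared character by character, so "05 Apr 2021"
# sorts before "11 Oct 2020" even though it is the later date.
as_strings = [d for d in table_dates if d >= target_date]

# Parsed dates are compared chronologically.
as_dates = [d for d in table_dates if parse_date(d) >= parse_date(target_date)]

print(as_strings)  # wrongly drops 05 Apr 2021 and keeps 14 Sep 2020
print(as_dates)    # the three dates on or after 11 Oct 2020
```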

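Going back in time:

target_date = "26 Jan 2021"

for row in table.find_all("tr")[1:]:  # skip the header
    date = row.find("td", class_="date").find("a").getText()  # table date
    if time.strptime(date, "%d %b %Y") <= time.strptime(target_date, "%d %b %Y"):  # compare dates
        # do your parsing here, this is just an example
        print(f'{date} - {row.find("td", class_="stake").getText(strip=True)}')

Output:

26 Jan 2021 - $4,484
14 Sep 2020 - $100
11 Sep 2020 - $616
04 Sep 2020 - $180
21 Aug 2020 - $180
17 Aug 2020 - $595
28 Jul 2020 - $4,291
21 Jul 2020 - $3,523
13 Jul 2020 - $300
30 Jun 2020 - $1,173
15 Jun 2020 - $100
30 May 2020 - $3,523
22 May 2020 - $500
12 May 2020 - $963
05 May 2020 - $3,523
02 May 2020 - $1,986
24 Apr 2020 - $144
09 Apr 2020 - $144
30 Mar 2020 - $1,225
10 Mar 2020 - $100
09 Dec 2019 - $595
02 Dec 2019 - $4,484
26 Nov 2019 - $4,484
19 Nov 2019 - $100
02 Nov 2019 - $4,484
27 Oct 2019 - $2,562
13 Oct 2019 - $700
31 May 2019 - $1,000
21 May 2019 - $4,484
07 May 2019 - $1,225
27 Apr 2019 - $595
21 Apr 2019 - $0
14 Apr 2019 - $0
07 Apr 2019 - $0

soup.find returns only the first tag that matches the arguments. Use soup.findAll (find_all in current BeautifulSoup), which gives you a list of tag objects. Then iterate over that list with a for loop and check the date in each tag.

What if you want to go back from 26 Jan rather than forward in time?

Well, then you can change the target date and switch the comparison from >= to <=.

What if you want to pick out the second row of the output in "Going back in time"? Would I need to use enumerate in the for loop to get the second value?

Yes, that's one way.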
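
As a rough sketch of the enumerate approach mentioned in the comments: the `rows` list below is a hypothetical stand-in for the date and stake text pulled from each table row, so the example runs without fetching the page. It filters on the date first and then counts positions among the matches only.

```python
from datetime import datetime

DATE_FORMAT = "%d %b %Y"

# Hypothetical stand-in for the scraped rows: (date text, stake text)
# pairs in the same newest-first order as the table.
rows = [
    ("12 Feb 2021", "$1,200"),
    ("26 Jan 2021", "$4,484"),
    ("14 Sep 2020", "$100"),
    ("11 Sep 2020", "$616"),
]
target = datetime.strptime("26 Jan 2021", DATE_FORMAT)

# Keep only the rows on or before the target date, in table order.
matches = [
    (date, stake)
    for date, stake in rows
    if datetime.strptime(date, DATE_FORMAT) <= target
]

# enumerate numbers the matches from 1, so position 2 is the second match.
second = None
for position, (date, stake) in enumerate(matches, start=1):
    if position == 2:
        second = (date, stake)
        break

print(second)
```

Once the matches are collected in a list, `matches[1]` gives the same row directly; enumerate is mainly useful when the position is needed inside the loop while other work is being done on each row.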