How to extract the time class from HTML in Python?

python html tags

I have a piece of HTML that I parsed in Python with BeautifulSoup, but I cannot retrieve the time tag I need from it.

HTML is called K:

<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
<ul class="breadcrumb inline">
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>
</ul>
</time>

I can extract every tag except time:

K.a :
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>

K.li:
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>

K.time:
Nothing prints

I have also tried the following approaches:

K.find('time', {'class':'dtstart'})
Nothing prints

K.find('a', {'class':'action pull-right print-cat'})
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>

When we inspect K, we see the following:

Signature:      K(*args, **kwargs)
Type:           Tag
String form:   
<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
<ul class="breadcrumb inline">
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>
</ul>
</time>  
Length:         5
File:           ~/.local/lib/python3.6/site-packages/bs4/element.py
Source:

How is it possible that the time tag is not extracted?

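You need to double-check the HTML that your script actually receives. A simple example using the HTML from your question shows that bs4 can find the time tag: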

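from bs4 import BeautifulSoup

# The HTML from the question
html_string = """
<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
<ul class="breadcrumb inline">
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>
</ul>
</time>
"""
k = BeautifulSoup(html_string, features="lxml")
print(k.time.attrs)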

Output

{'class': ['dtstart'], 'datetime': '05 December 201710:30 AM GMT', 'id': 'x-event-date', 'xcdate': '1512469800950'}

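I am still not sure why I could not retrieve it in the first place, but Chris Doyle paved the way to success. We can simply re-soup it (convert K back to a string and parse it again) and get the desired result: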

# Assumed alias: the snippet calls soup(...), so BeautifulSoup is imported under that name
from bs4 import BeautifulSoup as soup

Date = soup(str(K), "html.parser").time.attrs["datetime"]
print(Date)

#Output
05 December 201710:30 AM GMT
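One possible explanation, consistent with the inspection output in the question (Type: Tag, with a string form that begins with the time element), is that K is itself the time tag. Attribute access such as K.time and K.find('time', ...) search only the descendants of a tag, so they would return None here, while re-parsing str(K) produces a new document in which the time tag is a descendant again. The sketch below reproduces this under that assumption; the way K is obtained (document.find("time")) and the shortened HTML are illustrative, not taken from the original script.

```python
from bs4 import BeautifulSoup

# Shortened version of the HTML from the question
html_string = """
<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<ul class="breadcrumb inline">
<li><a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a></li>
</ul>
</time>
"""

document = BeautifulSoup(html_string, "html.parser")

# Assumption: K was produced by selecting the <time> element itself
K = document.find("time")

print(K.name)                                 # name of the tag K refers to
print(K.time)                                 # searches K's descendants only
print(K.find("time", {"class": "dtstart"}))   # same descendant-only search

# Reading the attribute straight from K needs no re-parsing
direct_value = K["datetime"]
print(direct_value)

# Re-parsing wraps the tag in a new document, so .time finds it again
resouped = BeautifulSoup(str(K), "html.parser")
resouped_value = resouped.time["datetime"]
print(resouped_value)
```

If this assumption holds, K["datetime"] or K.attrs["datetime"] gives the value directly, and the re-parse step becomes optional.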

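What is the source URL and the expected output value? Have you confirmed that it is not loaded dynamically? It looks as if the page is either making an extra XHR request for this or using JS to compute or populate it from elsewhere.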