How to extract the time tag from HTML in Python?
I have a piece of HTML that I obtained in Python through BeautifulSoup, but I cannot retrieve the time tag I need from it.
HTML is called K:
<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
<ul class="breadcrumb inline">
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>
</ul>
</time>
I can extract every tag except time:
K.a :
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
K.li:
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>
K.time:
Nothing prints
I also tried the following:
K.find('time', {'class':'dtstart'})
Nothing prints
K.find('a', {'class':'action pull-right print-cat'})
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
When we inspect K, we see the following:
Signature: K(*args, **kwargs)
Type: Tag
String form:
<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
<ul class="breadcrumb inline">
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>
</ul>
</time>
Length: 5
File: ~/.local/lib/python3.6/site-packages/bs4/element.py
Source:
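How could the time tag not be extracted? You need to double-check the HTML that your script actually receives. Using the HTML from your question as a simple example, it is clear that bs4 can get the time tag: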
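from bs4 import BeautifulSoup
html_string = """<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
<ul class="breadcrumb inline">
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>
</ul>
</time>"""
k = BeautifulSoup(html_string, features="lxml")
print(k.time.attrs)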
Output
{'class': ['dtstart'], 'datetime': '05 December 201710:30 AM GMT', 'id': 'x-event-date', 'xcdate': '1512469800950'}
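As a minimal sketch of the same check, the lookups below reach the tag in several equivalent ways and guard against a missing result. It assumes a shortened copy of the question's HTML and uses the built-in "html.parser" so that lxml is not required; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (assumption: the real page
# contains the same <time> element).
html_string = """
<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a href="/en/aus/2017/some-url-data-l17407.html">Smartphone and watches</a>
</time>
"""

soup = BeautifulSoup(html_string, "html.parser")

# Four lookups that should all land on the same <time> element.
by_attribute = soup.time
by_find = soup.find("time", {"class": "dtstart"})
by_css = soup.select_one("time.dtstart")
by_id = soup.find(id="x-event-date")

# Guard against None so a missing tag does not raise an exception.
event_date = by_css["datetime"] if by_css is not None else None
print(event_date)
```

If any of these lookups returns None on the real page, the markup the script received probably differs from what the browser shows.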
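I am still not sure why I could not retrieve it in the first place, but Chris Doyle paved the way to a solution. We can simply re-parse K with BeautifulSoup and get the desired result: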
from bs4 import BeautifulSoup as soup  # assumption: soup is an alias for BeautifulSoup
Date = soup(str(K), "html.parser").time.attrs["datetime"]
print(Date)
#Output
05 December 201710:30 AM GMT
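One plausible explanation, offered as a sketch rather than a confirmed diagnosis: the inspection output shows that K is a Tag whose string form starts with <time ...>, so K may already be the time tag itself. In BeautifulSoup, tag.name-style access and find() search only the descendants of a tag, never the tag itself, which would explain why K.a and K.li work while K.time returns None. The snippet assumes K was obtained with something like page.find("time"); that call and the wrapping <div> are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical wrapper page; only the <time> element mirrors the question.
html_string = (
    '<div><time class="dtstart" datetime="05 December 201710:30 AM GMT" '
    'id="x-event-date" xcdate="1512469800950">'
    '<a href="#">Print My Catalogue (0)</a></time></div>'
)

page = BeautifulSoup(html_string, "html.parser")
K = page.find("time")  # assumption: K is the <time> tag itself

print(K.name)                                # the tag's own name
print(K.time)                                # None: no <time> inside <time>
print(K.find("time", {"class": "dtstart"}))  # None for the same reason
print(K.a)                                   # descendants are still found

# The attributes can be read directly from K, without re-parsing.
direct = K["datetime"]

# Re-parsing works because K becomes a child of the new document.
reparsed = BeautifulSoup(str(K), "html.parser").time["datetime"]
print(direct == reparsed)
```

If this is what happened, K["datetime"] or K.attrs gives the value directly, and the re-parsing step is not needed.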
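What are the source URL and the expected output value? Have you confirmed that it is not loaded dynamically? It looks like the page either makes an additional XHR for this or computes/populates it from elsewhere using JS.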
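A rough way to test the dynamic-loading theory is to look for the time tag in the raw markup the script received, before any JavaScript has run. The helper name describe_time_tag and both sample pages below are hypothetical; in practice the raw HTML would be whatever the HTTP response body contains.

```python
from bs4 import BeautifulSoup


def describe_time_tag(raw_html):
    """Hypothetical helper: return the datetime attribute of the first
    <time> tag in the markup as received, or None if there is no such tag."""
    soup = BeautifulSoup(raw_html, "html.parser")
    tag = soup.find("time")
    if tag is None:
        return None
    return tag.get("datetime")


# Sample 1: the tag is present in the served HTML.
static_page = (
    '<div><time class="dtstart" '
    'datetime="05 December 201710:30 AM GMT"></time></div>'
)

# Sample 2: the date would only appear after JavaScript runs in a browser.
dynamic_page = (
    '<div id="event-date"></div>'
    '<script>/* date filled in by JavaScript */</script>'
)

print(describe_time_tag(static_page))
print(describe_time_tag(dynamic_page))
```

A None result on the real response would suggest the value is filled in client-side, in which case the underlying XHR endpoint or a browser-driven tool would be needed instead.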