How to extract the time class from HTML in Python?

I have some HTML that I parsed in Python with BeautifulSoup, but I cannot retrieve the time tag I need from it.

The HTML is stored in a variable called K:

<time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
<ul class="breadcrumb inline">
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>
</ul>
</time>    

I can extract every tag except time:

K.a :
<a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>

K.li:
<li>
<a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
</li>

K.time:
Nothing prints
I also tried the following:

    K.find('time', {'class':'dtstart'})
    Nothing prints
    
    K.find('a', {'class':'action pull-right print-cat'})
    <a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
    
    
Inspecting K shows the following:

    Signature:      K(*args, **kwargs)
    Type:           Tag
    String form:   
    <time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
    <a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
    <ul class="breadcrumb inline">
    <li>
    <a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
    </li>
    </ul>
    </time>  
    Length:         5
    File:           ~/.local/lib/python3.6/site-packages/bs4/element.py
    Source:    
    

How is it possible that the time tag is not extracted?

You need to double-check the HTML that your script actually receives. A simple example using the HTML from your question shows that bs4 can get a time tag:

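    from bs4 import BeautifulSoup

    # The HTML from the question
    html_string = """
    <time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
    <a class="action pull-right print-cat" data-href="/en/aus/2017/some-url-data-l17407.html" data-modalid="catalogueModal" data-toggle="modal" href="/en/auctions/ecatalogue/lot.print.L17407.html" style="display: none;">Print My Catalogue (0)</a>
    <ul class="breadcrumb inline">
    <li>
    <a href="/en/aus/2017/some-url-data-l17407.html"><span class="active">Smartphone and watches</span></a>
    </li>
    </ul>
    </time>
    """
    k = BeautifulSoup(html_string, features="lxml")
    print(k.time.attrs)

Output

    {'class': ['dtstart'], 'datetime': '05 December 201710:30 AM GMT', 'id': 'x-event-date', 'xcdate': '1512469800950'}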
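
Once the attributes are available, the datetime value can be turned into a Python datetime object. The following is a minimal sketch using only the standard library. The format string is inferred from the single sample value in the question, and reading xcdate as a Unix timestamp in milliseconds is an assumption (it does agree with the datetime attribute for this sample).

```python
from datetime import datetime, timedelta, timezone

# Attribute values copied from the output above.
attrs = {
    "datetime": "05 December 201710:30 AM GMT",
    "xcdate": "1512469800950",
}

# The sample has no space between the year and the hour,
# so %Y is followed directly by %I in the format string.
parsed = datetime.strptime(
    attrs["datetime"], "%d %B %Y%I:%M %p GMT"
).replace(tzinfo=timezone.utc)

# Assumption: xcdate is milliseconds since the Unix epoch.
seconds, millis = divmod(int(attrs["xcdate"]), 1000)
from_epoch = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)

print(parsed.isoformat())      # 2017-12-05T10:30:00+00:00
print(from_epoch.isoformat())  # 2017-12-05T10:30:00.950000+00:00
```

If other pages use a different date layout, the format string would need adjusting; the xcdate route avoids that, provided the millisecond assumption holds.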
    
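I am still not sure why I could not get it in the first place, but Chris Doyle paved the way to success. We can simply re-parse K with BeautifulSoup and get the desired result: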

    # Assumes: from bs4 import BeautifulSoup as soup (the import is not shown in the original)
    Date = soup(str(K), "html.parser").time.attrs["datetime"]
    print(Date)

    # Output
    05 December 201710:30 AM GMT
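
A likely explanation, judging from the inspection output in the question (Type: Tag, with a string form that starts with `<time ...>`), is that K is itself the time tag. `K.time` and `K.find('time', ...)` only search the descendants of K, and there is no second time tag nested inside it, so both return None. Re-parsing `str(K)` works because the new soup object has the time tag as a child. The sketch below illustrates this with a shortened version of the HTML; the wrapping `<div>` and the variable names are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Shortened HTML; the outer <div> is a hypothetical stand-in for the rest of the page.
html = """<div><time class="dtstart" datetime="05 December 201710:30 AM GMT" id="x-event-date" xcdate="1512469800950">
<a href="/en/auctions/ecatalogue/lot.print.L17407.html">Print My Catalogue (0)</a>
</time></div>"""

page = BeautifulSoup(html, "html.parser")
K = page.find("time")  # K is the <time> Tag itself, as in the question

print(K.name)                                # time
print(K.time)                                # None: no <time> nested inside K
print(K.find("time", {"class": "dtstart"}))  # None, for the same reason
print(K["datetime"])                         # 05 December 201710:30 AM GMT
```

If this is indeed the situation, reading the attribute directly with `K["datetime"]` or `K.attrs` avoids the extra parse.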
    

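What is the source URL, and what output do you expect? Have you confirmed that it is not loaded dynamically? It looks as if the page makes an extra XHR for this, or the value is computed or populated from elsewhere using JS.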
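
One way to check this is to look for the tag in the raw markup the script receives, before any further processing. The sketch below uses a hypothetical helper, `find_time_tag`, and two made-up HTML strings standing in for a statically rendered page and a page whose date is filled in by JavaScript. In a real script the raw response body would be passed in instead.

```python
from bs4 import BeautifulSoup

def find_time_tag(raw_html):
    """Return the first <time> tag in raw_html, or None if the markup has none."""
    return BeautifulSoup(raw_html, "html.parser").find("time")

# Hypothetical samples: the first contains the tag, the second would only get it via JS.
static_page = '<div><time class="dtstart" datetime="05 December 201710:30 AM GMT"></time></div>'
js_page = '<div id="x-event-date"></div><script>/* date filled in by JavaScript */</script>'

print(find_time_tag(static_page) is not None)  # True
print(find_time_tag(js_page) is not None)      # False
```

If the helper returns None for the downloaded HTML while the tag is visible in the browser's inspector, the element is probably added after page load and will not be found by parsing the static response.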