What would be the right way of doing getallAttributes() in Python?

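I am trying to read the attributes (properties) of a given element. I want to extract a dictionary of all the attribute name-value pairs.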

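What I am currently doing is using regex and listing all the attribute values. The problem is that this only shows the values of the attributes, not their names: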

attributes = node.xpath("@*")
print attributes
print len(attributes)
for att in attributes:
    print att
Sample output looks like this:

<Selector xpath='@*' data=u'1'>
<Selector xpath='@*' data=u'2761554'>
<Selector xpath='@*' data=u'1431756540503'>

Can anyone suggest a way to list all the attributes of an element, names included?

I am using this with Python/Scrapy.

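With XPath, you can use name() with an attribute as its argument:

  • count the attributes of an element with
    count(@*)
  • for each attribute position, get the name with
    name(@*[position])
    and the value with
    @*[position]

Example scrapy shell session for meta elements: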

    $ scrapy shell "https://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes"
    2015-05-18 10:47:28+0200 [default] DEBUG: Crawled (200) <GET https://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes> (referer: None)
    [s] Available Scrapy objects:
    [s]   crawler    <scrapy.crawler.Crawler object at 0x7f732bf4b190>
    [s]   item       {}
    [s]   request    <GET https://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes>
    [s]   response   <200 https://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes>
    [s]   settings   <scrapy.settings.Settings object at 0x7f732bf3ffd0>
    [s]   spider     <DefaultSpider 'default' at 0x7f73268eead0>
    [s] Useful shortcuts:
    [s]   shelp()           Shell help (print this help)
    [s]   fetch(req_or_url) Fetch request (or URL) and update local objects
    [s]   view(response)    View response in a browser
    
    In [1]: import pprint
    In [2]: for meta  in response.xpath('//meta[@*]'):
       ...:     nbattr = int(float(meta.xpath('count(@*)').extract()[0]))
       ...:     pprint.pprint(dict((meta.xpath('name(@*[%d])' % i).extract()[0], meta.xpath('@*[%d]' % i).extract()[0]) for i in range(1, nbattr+1)))
       ...:     print
       ...:     
    {u'content': u'summary', u'name': u'twitter:card'}
    
    {u'content': u'stackoverflow.com', u'name': u'twitter:domain'}
    
    {u'content': u'website', u'property': u'og:type'}
    
    {u'content': u'https://cdn.sstatic.net/stackoverflow/img/apple-touch-icon@2.png?v=ea71a5211a91&a',
     u'itemprop': u'image primaryImageOfPage',
     u'property': u'og:image'}
    
    {u'content': u'what would be the right way of doing getallAttributes()',
     u'itemprop': u'title name',
     u'name': u'twitter:title',
     u'property': u'og:title'}
    
    {u'content': u'I am trying to read the property(attributes) of given element .\n\nI want to extract the Dictionary of all the attributes name-value pair..\n\nwhat i am currently doing is i am using regex and listting...',
     u'itemprop': u'description',
     u'name': u'twitter:description',
     u'property': u'og:description'}
    
    {u'content': u'http://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes',
     u'property': u'og:url'}
    
    {u'content': u'US', u'name': u'twitter:app:country'}
    
    {u'content': u'Stack Exchange iOS', u'name': u'twitter:app:name:iphone'}
    
    {u'content': u'871299723', u'name': u'twitter:app:id:iphone'}
    
    {u'content': u'se-zaphod://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes',
     u'name': u'twitter:app:url:iphone'}
    
    {u'content': u'Stack Exchange iOS', u'name': u'twitter:app:name:ipad'}
    
    {u'content': u'871299723', u'name': u'twitter:app:id:ipad'}
    
    {u'content': u'se-zaphod://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes',
     u'name': u'twitter:app:url:ipad'}
    
    {u'content': u'Stack Exchange Android',
     u'name': u'twitter:app:name:googleplay'}
    
    {u'content': u'http://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes',
     u'name': u'twitter:app:url:googleplay'}
    
    {u'content': u'com.stackexchange.marvin',
     u'name': u'twitter:app:id:googleplay'}
    

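Exactly what I was looking for... The key change is that you build the array of attributes for the element with '@*', and then use name() to look up each attribute's name alongside its data part. Awesome!!!
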
    >>> for item in selector.xpath('.//*[@itemscope]'):
    ...     print "Item:", item.xpath('@itemtype').extract()
    ...     for property in item.xpath('.//*[@itemprop]'):
    ...         print "Property:",
    ...         print property.xpath('@itemprop').extract(),
    ...         print property.xpath('string(.)').extract()
    ...         for position, attribute in enumerate(property.xpath('@*'), start=1):
    ...             print "attribute: name=%s; value=%s" % (
    ...                 property.xpath('name(@*[%d])' % position).extract(),
    ...                 attribute.extract())
    ...         print
    ...     print
    ... 
    Item: [u'http://schema.org/Movie']
    Property: [u'name'] [u'Avatar']
    attribute: name=[u'itemprop']; value=name
    
    Property: [u'director'] [u'\n  Director: James Cameron \n(born August 16, 1954)\n  ']
    attribute: name=[u'itemprop']; value=director
    attribute: name=[u'itemscope']; value=
    attribute: name=[u'itemtype']; value=http://schema.org/Person
    
    Property: [u'name'] [u'James Cameron']
    attribute: name=[u'itemprop']; value=name
    
    Property: [u'birthDate'] [u'August 16, 1954']
    attribute: name=[u'itemprop']; value=birthDate
    attribute: name=[u'datetime']; value=1954-08-16
    
    Property: [u'genre'] [u'Science fiction']
    attribute: name=[u'itemprop']; value=genre
    
    Property: [u'trailer'] [u'Trailer']
    attribute: name=[u'href']; value=../movies/avatar-theatrical-trailer.html
    attribute: name=[u'itemprop']; value=trailer
    
    
    Item: [u'http://schema.org/Person']
    Property: [u'name'] [u'James Cameron']
    attribute: name=[u'itemprop']; value=name
    
    Property: [u'birthDate'] [u'August 16, 1954']
    attribute: name=[u'itemprop']; value=birthDate
    attribute: name=[u'datetime']; value=1954-08-16
    
    >>>