Selecting nodes by condition in XPath (Xpath, Scrapy)


xpath scrapy


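I'm using Scrapy, a Python-based framework, to scrape a site, but I can't figure out how to select the text of the element with class value ellipsis ph. Sometimes that element contains a strong tag. So far I have only managed to extract the text when there is no strong child tag.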

<div class="right">
    <div class="attrs">
        <div class="attr">
            <span class="name">Main Products:</span>
                <div class="value ellipsis ph">
 <!-- Here below I need to select the text, ignoring the strong tags -->
                    <strong>Shoes</strong> 
                    (Sport
                    <strong>Shoes</strong>
                    ,Casual
                    <strong>Shoes</strong>
                    ,Hiking
                    <strong>Shoes</strong>
                    ,Skate
                    <strong>Shoes</strong>
                    ,Football
                    <strong>Shoes</strong>
                    )
                </div>
        </div>
    </div>
</div>


<div class="right">
    <div class="attrs">
        <div class="attr">
            <span class="name">Main Products:</span>
                <div class="value ellipsis ph">
                    Cap, Shoe, Bag <!-- I could select this -->

                </div>
        </div>
    </div>
</div>

As @splash58 wrote in the comments, the XPath

//div[@class="value ellipsis ph"]//text()

gets the text content of both divs. In the first case it is, of course, a list of texts, but that list contains the text inside the strong tags as well as the text outside them, because //text() collects all text nodes in the subtree, even when there are further child tags.
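Assuming you want the text representation of the div elements with class value ellipsis ph, you can:

use .//text() and join the resulting pieces, or
use XPath's string() function on the div element.

Here are the two working options: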
>>> import scrapy
>>> selector = scrapy.Selector(text="""<div class="right">
...     <div class="attrs">
...         <div class="attr">
...             <span class="name">Main Products:</span>
...                 <div class="value ellipsis ph">
...  <!-- // Here below i needed to select it ignoring the strong tag -->
...                     <strong>Shoes</strong> 
...                     (Sport
...                     <strong>Shoes</strong>
...                     ,Casual
...                     <strong>Shoes</strong>
...                     ,Hiking
...                     <strong>Shoes</strong>
...                     ,Skate
...                     <strong>Shoes</strong>
...                     ,Football
...                     <strong>Shoes</strong>
...                     )
...                 </div>
...         </div>
...     </div>
... </div>
... 
... 
... <div class="right">
...     <div class="attrs">
...         <div class="attr">
...             <span class="name">Main Products:</span>
...                 <div class="value ellipsis ph">
...                     Cap, Shoe, Bag <!-- // could select this -->
... 
...                 </div>
...         </div>
...     </div>
... </div>""")
>>> for div in selector.css('div.value.ellipsis.ph'):
...     print("---")
...     print("".join(div.xpath('.//text()').extract()))
... 
---


                    Shoes 
                    (Sport
                    Shoes
                    ,Casual
                    Shoes
                    ,Hiking
                    Shoes
                    ,Skate
                    Shoes
                    ,Football
                    Shoes
                    )

---

                    Cap, Shoe, Bag 


>>> for div in selector.css('div.value.ellipsis.ph'):
...     print("---")
...     print(div.xpath('string()').extract_first())
... 
---


                    Shoes 
                    (Sport
                    Shoes
                    ,Casual
                    Shoes
                    ,Hiking
                    Shoes
                    ,Skate
                    Shoes
                    ,Football
                    Shoes
                    )

---

                    Cap, Shoe, Bag 


>>> 

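//div[@class="value ellipsis ph"]/text() or your XPath: both get only the text nodes directly under the div, without the text inside strong. That works for the "Cap, Shoe, Bag // could select this" line. For "I need to select the text inside and outside the strong tags", try: //div[@class="value ellipsis ph"]//text()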