Python: using a selector to keep certain text and discard the rest from an element

python python-3.x web-scraping

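From the HTML below, how can I keep the text Hi there!! and discard the other text, Cat, using a CSS selector? Also, .text and .text.strip() give me no result, but .text_content() does return the text.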

from lxml.html import fromstring

html="""
<div id="item_type" data-attribute="item_type" class="ms-crm-Inline" aria-describe="item_type_c">
    <div>
        <label for="item_type_outer" id="Type_outer">
            <div class="NotVisible">Cat</div>
        Hi there!!
            <div class="GradientMask"></div>
        </label>
    </div>
</div>
"""
root = fromstring(html)
for item in root.cssselect("#Type_outer"):
    print(item.text)  # doesn't work
    print(item.text.strip()) # doesn't work
    print(item.text_content()) # working one

However, the only result I want is Hi there!!. To get it, I tried:

root.cssselect("#Type_outer:not(.NotVisible)") #it doesn't work either

To restate the questions:

Why does .text_content() work when .text and .text.strip() do not? And how can I get only Hi there!! using a CSS selector?

In the lxml tree model, the text you want is stored in the tail of the div with class NotVisible:

>>> root = fromstring(html)
>>> for item in root.cssselect("#Type_outer > div.NotVisible"):
...     print(item.tail.strip())
...
Hi there!!

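So, to answer the first question: the text attribute of the parent holds only the text node that is not preceded by any element. A text node that has a preceding sibling element, like the one in this question, is stored in the tail attribute of that element. Here the label's own text is just whitespace, which is why .text.strip() prints nothing, while .text_content() concatenates all descendant text, Cat included.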
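To make the text/tail split concrete, here is a small sketch that walks the label's children and prints where each piece of text is stored. The HTML is a trimmed copy of the snippet from the question, and the variable names (label_text, pieces, flattened) are only illustrative. It uses get_element_by_id from lxml.html, so the cssselect package is not needed.

```python
from lxml.html import fromstring

html = """
<div id="item_type">
    <label for="item_type_outer" id="Type_outer">
        <div class="NotVisible">Cat</div>
    Hi there!!
        <div class="GradientMask"></div>
    </label>
</div>
"""

root = fromstring(html)
label = root.get_element_by_id("Type_outer")

# Text before the first child element is stored on the parent's .text.
# Here it is whitespace only, so stripping it leaves an empty string.
label_text = (label.text or "").strip()

# Text that follows a child element is stored on that child's .tail.
pieces = [
    (child.get("class"), (child.text or "").strip(), (child.tail or "").strip())
    for child in label
]

# text_content() joins the text of all descendants, so "Cat" is included.
flattened = " ".join(label.text_content().split())

print(repr(label_text))
for css_class, text, tail in pieces:
    print(css_class, repr(text), repr(tail))
print(flattened)
```

The tail of the NotVisible div is the only place where Hi there!! appears on its own, which is why the answer reads it from there.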

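Another way to get the text Hi there!! is to query for the non-empty text node that is a direct child of the label. That level of detail can be queried with an XPath expression: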

for item in root.cssselect("#Type_outer"):
    print(item.xpath("text()[normalize-space()]")[0].strip())
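If you need this in more than one place, the XPath query can be wrapped in a small helper. This is only a sketch: own_text is a made-up name, not part of lxml, and it returns a list so that an element with no direct text gives an empty list instead of the IndexError that [0] would raise.

```python
from lxml.html import fromstring


def own_text(element):
    """Return the stripped, non-blank text nodes that are direct children of element."""
    # text() matches direct child text nodes regardless of whether lxml stores
    # them as .text on the parent or .tail on a child element;
    # the normalize-space() predicate filters out whitespace-only nodes.
    return [node.strip() for node in element.xpath("text()[normalize-space()]")]


html = """
<div id="item_type">
    <label for="item_type_outer" id="Type_outer">
        <div class="NotVisible">Cat</div>
    Hi there!!
        <div class="GradientMask"></div>
    </label>
</div>
"""

root = fromstring(html)
label = root.xpath('//label[@id="Type_outer"]')[0]
empty_div = root.xpath('//div[@class="GradientMask"]')[0]

print(own_text(label))
print(own_text(empty_div))
```

Cat is not returned for the label because it is a child of the nested div, not of the label itself.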

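It works!!! You've been a great help. One last thing: can you tell me why root.cssselect("#Type_outer:not(.NotVisible)") failed? Forgive my ignorance. Thanks again.

That expression selects the element whose id is Type_outer and which does not have the class NotVisible. In this case it returns the same element as plain #Type_outer, because the label with that id does not have the class NotVisible in the first place.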
