Python scraper giving blank output

python python-3.x web-scraping

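I'm using a selector in my Python script to get text from the HTML elements given below. I tried using .text to get the string "Shop here cheap" from the element, but it didn't work at all. However, when I use .text_content(), it works as expected.
My question is: what's wrong with the .text method? Why can't it parse the text inside the element?
HTML elements: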
<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>

By the way, I don't want to keep using .text_content(), which is why I'd like any answer to use .text to scrape the text. Thanks in advance.
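I think the root cause of the confusion is that lxml has its own way of representing node content, one that avoids special "text" node entities. To quote the lxml tutorial:
The two properties .text and .tail are enough to represent any text content in an XML document. This way, the ElementTree API does not require any special text nodes in addition to the Element class, that tend to get in the way fairly often (as you might know from classic DOM APIs).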
In your case, "Shop here cheap" is the tail of the <span> element containing $6.70, and is therefore not included in the parent node's .text value.
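To see where each piece of text actually lives, here is a small sketch that parses the HTML from the question and prints the relevant attributes. It assumes lxml is installed; the variable names div and last_span are just for illustration.

```python
from lxml import html

content = """
<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>
"""

# For a single-element fragment, html.fromstring() returns that element (the <div>).
div = html.fromstring(content)

# .text is only the text before the first child element: whitespace here.
print(repr(div.text))

# The trailing string is stored as the .tail of the last <span>.
last_span = div[-1]
print(last_span.text)          # $6.70
print(last_span.tail.strip())  # Shop here cheap

# text_content() concatenates all descendant text, tails included.
print(" ".join(div.text_content().split()))  # $6.35 $6.70 Shop here cheap
```

The first print shows only whitespace, which is why .text on the container looks blank once it is stripped.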
Besides other methods such as .text_content(), you can reach the tail by collecting all top-level text nodes non-recursively:
print(''.join(data.xpath("./text()")).strip())

Or, get only the last top-level text node:
print(data.xpath("./text()[last()]")[0].strip())
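The two XPath one-liners above assume data is already the container element. Here is a self-contained sketch, assuming the snippet from the question parsed with html.fromstring, that runs both. It adds a third variant (my own addition, not from the answer) that stays within the .text/.tail model the asker wanted: it joins the container's .text with the .tail of each child.

```python
from lxml import html

content = """
<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>
"""

# html.fromstring() returns the <div> itself for this single-element fragment.
data = html.fromstring(content)

# 1) All direct text children of the <div>, joined (no recursion into <span>s).
all_top_level = "".join(data.xpath("./text()")).strip()

# 2) Only the last direct text child, which holds "Shop here cheap".
last_top_level = data.xpath("./text()[last()]")[0].strip()

# 3) Same result using attributes only: the parent's .text plus each child's .tail.
via_tails = ((data.text or "") + "".join(child.tail or "" for child in data)).strip()

print(all_top_level)
print(last_top_level)
print(via_tails)
```

Variant 3 is essentially what "./text()" selects, written in terms of .text and .tail.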

Another approach could be something like below:
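from lxml import html

content = """
<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>
"""

tree = html.fromstring(content)
# cssselect() needs the separate cssselect package installed
for data in tree.cssselect(".Price__container"):
    # drop_tree() removes each child element but appends its tail text
    # to the parent's .text (or to the previous sibling's .tail).
    # Iterate over a list copy so removing children while looping is safe.
    for item in list(data):
        item.drop_tree()
    print(data.text.strip())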

Output:

Shop here cheap

Thanks for providing a clear and working solution.
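One caveat with the drop_tree() approach: it modifies the tree in place, so the <span> prices are gone after the loop. If you still need them, one option (an assumption on my part, not part of the answer) is to run the same idea on a copy made with copy.deepcopy; scratch and text_only are hypothetical names.

```python
import copy

from lxml import html

content = """
<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>
"""

tree = html.fromstring(content)

# Work on a deep copy so the original tree keeps its <span> children.
scratch = copy.deepcopy(tree)
for item in list(scratch):
    item.drop_tree()  # tail text is merged into scratch.text

text_only = scratch.text.strip()
print(text_only)   # Shop here cheap
print(len(tree))   # 2: the original still has both <span> elements
```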