Python XPath: get only the content, not the element's own tag

python xpath scrapy

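Given this HTML:

<div id="content">
   foo <br/>
   bar <br/>
</div>

I run:

response.xpath('//div[@id ="content"]').extract()

This gives me the following:

[u'<div id="content"> foo<br/>bar <br/></div>']

How can I get only the inner content, without the div tag itself?

[u'foo<br/>bar<br/>']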

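lxml is awkward in many places, and getting an element's inner HTML is one of them. Adapted from an existing snippet:

from lxml import html

def inner_html(element):
    # element.text is the text before the first child; tostring() emits
    # each child element followed by its tail text.
    return (
        (element.text or '') +
        ''.join(html.tostring(child, encoding='unicode') for child in element)
    )

In use:

>>> from scrapy.selector import Selector
>>> response = Selector(text="""
... <div id="content">
...    foo <br/>
...    bar <br/>
... </div>
... """)
>>> inner_html(response.css('#content')[0].root)
'\n   foo <br>\n   bar <br>\n'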
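The inner_html helper above works because lxml's tostring() includes each child's tail text by default, so element.text plus the serialized children covers everything inside the tag. As a rough illustration of the same text/tail idea, here is a sketch using only the standard library's xml.etree.ElementTree. This is not Scrapy's or lxml's API: `inner_xml` is a hypothetical name, ElementTree only accepts well-formed XML, and it serializes empty tags as `<br />`.

```python
import xml.etree.ElementTree as ET


def inner_xml(element):
    """Return the markup inside `element`, without the element's own tag."""
    # element.text is the text before the first child (None if absent).
    leading_text = element.text or ""
    # ET.tostring() serializes each child together with its tail text,
    # i.e. the text that follows the child's closing tag.
    children = "".join(
        ET.tostring(child, encoding="unicode") for child in element
    )
    return leading_text + children


root = ET.fromstring('<div id="content">foo <br/>bar <br/></div>')
result = inner_xml(root)
print(result)  # foo <br />bar <br />
```

For real-world HTML, which is often not well-formed XML, the lxml-based helper is the safer choice.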

Try this:

response.xpath('normalize-space(//div[@id ="content"])').extract_first()
# output: u'foo bar'

What language are you using to call response.xpath and .extract()?

Updated the question.

This loses the ability to distinguish elements from text (e.g. <br/>).
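To see why the normalize-space() approach returns plain text with no markup, here is a small standard-library sketch. It is only an approximation: xml.etree.ElementTree does not support the XPath normalize-space() function, so the whitespace collapsing is emulated with str.split(), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<div id="content">\n   foo <br/>\n   bar <br/>\n</div>'
)

# itertext() yields only the text nodes, so the <br/> elements vanish.
raw_text = "".join(root.itertext())

# str.split() with no arguments splits on any run of whitespace, which
# approximates what normalize-space() does to the element's string value.
text_only = " ".join(raw_text.split())
print(text_only)  # foo bar
```

If the `<br/>` tags need to survive, a text-only extraction like this is the wrong tool; an inner-HTML style helper keeps them.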