Python: Extracting the HTML from an element/node

Tags: python, xpath, scrapy

Suppose there is an HTML string:

<div class="content">
   This is some test <b>this is bold </b> this is great list of text.
</div>
<div class="content">
   <ul>
      <li>Item 1</li>
      <li>Item 2</li>
      <li>Item 3</li>
   </ul>
</div>

How can I get the entire nested HTML of both elements/nodes as a string in a variable?

If speed is not important, this is easy to do with BeautifulSoup.


You can use /node() to select all child nodes (see the answer to a similar question):

# Returns all child nodes - text as well as elements.
contents = product.select('//div[@class="content"]/node()').extract()
Note that extract() returns a list, which you can join in the usual way to recover the HTML:

html = "\n".join(contents)
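
The same idea (the element's leading text plus each serialized child node) can be sketched with the standard library alone, for readers who do not have Scrapy at hand. This is a minimal sketch and not the Scrapy API: it assumes the snippet is well-formed XML, wraps it in a hypothetical <root> element so that it can be parsed, and inner_html is a made-up helper name used only for illustration.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<div class="content">
   This is some test <b>this is bold </b> this is great list of text.
</div>
<div class="content">
   <ul>
      <li>Item 1</li>
      <li>Item 2</li>
      <li>Item 3</li>
   </ul>
</div>
"""


def inner_html(element):
    """Return the markup inside `element`, without its own tags (hypothetical helper)."""
    # Start with the text before the first child, then add each child
    # serialized together with the text that follows it (its "tail").
    parts = [element.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)


# ElementTree needs a single root element, so the two sibling divs are
# wrapped in <root> (an assumption made for this sketch).
root = ET.fromstring("<root>" + SNIPPET + "</root>")
contents = [inner_html(div) for div in root.findall(".//div[@class='content']")]

for html in contents:
    print(html.strip())
    print("---")
```

Here contents holds one string per div, so the two blocks stay separate instead of being merged into a single list of nodes. Real-world HTML is often not well-formed XML, in which case an HTML-aware parser would be the safer choice.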
The following XPath

//div[@class="content"]/text()|//div[@class="content"]/b/text()|//div[@class="content"]/ul/li

gives the result, since you only need to store the data of the two elements:

contents=product.select('//div[@class="content"]/text()|//div[@class="content"]/b/text()|//div[@class="content"]/ul/li').extract()

Now contents holds the data of both elements.

I would prefer to use native support.