Python: how do I retrieve nested and non-nested children from a parent HTML element with XPath?

I'm building a web crawler in Python. The HTML I'm parsing has some strings that sit directly inside the parent tag, like this:

<div class="chapter-content3">
<noscript>...stuff here filtered successfully</noscript>
<center>...stuff here filtered successfully</center>
<h4>..stuff here shows</h4>
<p>...stuff here shows</p>
<br>
"this stuff here doesnt show"
<br>
"this neither"
<p>..stuff here shows</p>
</div>
My query returns everything else, but not the strings that sit directly inside the div.

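How should I construct the XPath so that it returns everything directly inside the parent, including those strings?

Your expression is almost right: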

//div[@class="chapter-content3"]/*[
   not(self::noscript) and not(self::center) and not(@class="row")
]
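The * selects only actual elements, and the loose strings are text nodes, not elements. You want to select all nodes, which would be: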

//div[@class="chapter-content3"]//node()[
   not(self::noscript) and not(self::center) and not(@class="row")
]
Or, slightly shorter:

//div[@class="chapter-content3"]//node()[
   not(self::noscript or self::center or @class="row")
]
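To see why the * version misses the quoted strings, here is a minimal sketch using only the standard library's xml.etree.ElementTree. It assumes a hand-cleaned, well-formed copy of the question's markup with placeholder text (the real page is HTML, and <br> has to be written as <br/> for the XML parser). ElementTree has no separate text-node objects: iterating an element yields only its child elements, much like *, and each loose string is stored as the .tail of the element just before it.

```python
import xml.etree.ElementTree as ET

# Hand-cleaned, well-formed copy of the question's markup (placeholder text;
# <br> is written as <br/> so the XML parser accepts it).
SAMPLE = """<body><div class="chapter-content3">
<noscript>noscript text</noscript>
<center>center text</center>
<h4>heading text</h4>
<p>first paragraph</p>
<br/>
"this stuff here doesnt show"
<br/>
"this neither"
<p>last paragraph</p>
</div></body>"""

root = ET.fromstring(SAMPLE)
div = root.find(".//div[@class='chapter-content3']")

# Iterating an Element yields child elements only, like the XPath step "*".
child_tags = [child.tag for child in div]

# The loose strings are not elements: each is the .tail of the <br/> before it.
loose_strings = [child.tail.strip() for child in div
                 if child.tail and child.tail.strip()]

print(child_tags)     # ['noscript', 'center', 'h4', 'p', 'br', 'br', 'p']
print(loose_strings)  # ['"this stuff here doesnt show"', '"this neither"']
```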
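Or, thinking about it another way: all text nodes except those with an unwanted ancestor. Unlike the node() versions, whose predicate tests only the node itself, this also drops the text nested inside noscript, center or the class="row" element: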

//div[@class="chapter-content3"]//text()[
   not(ancestor::noscript or ancestor::center or ancestor::*/@class="row")
]
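The ancestor-based filter can be mirrored the same way. The sketch below is a stdlib stand-in, not XPath: ElementTree's find() does not support text() or the ancestor axis, so it walks the tree recursively, skipping the subtrees of noscript, center and any element whose class is exactly "row", while keeping each skipped element's tail, which belongs to the parent. The sample markup and the helper names is_excluded and visible_text are hypothetical, and whitespace-only pieces are dropped. With lxml installed, the XPath above can instead be passed directly to an element's .xpath() method.

```python
import xml.etree.ElementTree as ET

# Hand-cleaned, well-formed sample (placeholder text), including a
# hypothetical <div class="row"> that should be filtered out.
SAMPLE = """<body><div class="chapter-content3">
<noscript>noscript text</noscript>
<center>center text</center>
<h4>heading text</h4>
<p>first paragraph</p>
<br/>
"this stuff here doesnt show"
<br/>
"this neither"
<div class="row">navigation links</div>
<p>last paragraph</p>
</div></body>"""

EXCLUDED_TAGS = {"noscript", "center"}


def is_excluded(el):
    """Mirror of the noscript / center / @class="row" ancestor tests."""
    return el.tag in EXCLUDED_TAGS or el.get("class") == "row"


def visible_text(el):
    """Yield the text under el in document order, skipping excluded subtrees.

    Only ancestors inside el are checked; the XPath version also tests
    ancestors above the chapter div.
    """
    if el.text:
        yield el.text
    for child in el:
        if not is_excluded(child):
            yield from visible_text(child)
        # A tail follows the child's closing tag, so it belongs to el and is
        # kept even when the child itself is excluded.
        if child.tail:
            yield child.tail


root = ET.fromstring(SAMPLE)
div = root.find(".//div[@class='chapter-content3']")
# Drop whitespace-only pieces; the order matches the input document.
pieces = [text.strip() for text in visible_text(div) if text.strip()]
print(pieces)
# → ['heading text', 'first paragraph', '"this stuff here doesnt show"', '"this neither"', 'last paragraph']
```

Because the generator visits .text, then each child, then that child's .tail, the pieces come out in the same order as the input HTML; " ".join(pieces) turns them into a single string.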

Do you want all of it in a single XPath?
@Edwin, any solution works, as long as the result is in the same order as the input HTML.