
HTML XPath: selecting node text and child nodes


I'm using Python Scrapy to scrape some data from a website.

The website content looks like this:

 <html>
  <div class="details">
  <div class="a"> not needed</div>
  content 1
  <p>content 2</p>
  <div>content 2</div>
  <p>content 2</p>
  <div>content 2</div>
  <p>content 2</p>
  <div class="b"> this is also not needed</div>
  </div>
 </html>
I need to get the complete HTML data, excluding the divs with classes 'a' and 'b'.

So my output should look like this:

<div class="details">   
content 1
<p>content 2</p>
<div>content 2</div>
<p>content 2</p>
<div>content 2</div>
<p>content 2</p>
</div>
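How do I write the correct XPath for this? Or should I write XPaths for the divs with classes 'details', 'a' and 'b', and then use string manipulation to remove the divs with classes 'a' and 'b'?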
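If you take the second route (select the 'details' div, then drop the 'a' and 'b' divs), doing the removal on a parsed tree is sturdier than string manipulation. The sketch below is a hedged illustration, not Scrapy code: it uses Python's standard-library xml.etree.ElementTree and assumes the page is well-formed enough to parse as XML, which the sample is but real pages often are not. The names PAGE, strip_unwanted and cleaned_details_html are illustrative only.

```python
import xml.etree.ElementTree as ET

# The sample page from the question; it is well-formed, so ElementTree can parse it.
PAGE = """<html>
  <div class="details">
  <div class="a"> not needed</div>
  content 1
  <p>content 2</p>
  <div>content 2</div>
  <p>content 2</p>
  <div>content 2</div>
  <p>content 2</p>
  <div class="b"> this is also not needed</div>
  </div>
</html>"""

UNWANTED_CLASSES = {"a", "b"}


def strip_unwanted(details: ET.Element) -> None:
    """Remove direct <div class="a"> / <div class="b"> children in place.

    ElementTree stores text that follows an element in that element's .tail,
    so "content 1" belongs to <div class="a">. Each removed element's tail is
    moved to the previous kept sibling (or the parent's .text) first,
    otherwise the text would be deleted along with the element.
    """
    prev = None  # last child that was kept
    for child in list(details):  # iterate over a copy: we mutate details
        if child.tag == "div" and child.get("class") in UNWANTED_CLASSES:
            tail = child.tail or ""
            if prev is None:
                details.text = (details.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            details.remove(child)
        else:
            prev = child


def cleaned_details_html(page: str) -> str:
    root = ET.fromstring(page)
    details = root.find(".//div[@class='details']")
    strip_unwanted(details)
    details.tail = None  # don't serialize the whitespace after </div>
    return ET.tostring(details, encoding="unicode")


print(cleaned_details_html(PAGE))
```

Because the elements are removed from the parsed tree rather than cut out of the raw string, markup nested inside the unwanted divs cannot confuse the match.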


Note that content 1 here is text (a text node), not a child element of the div with class "details".
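In XPath terms, a step like * selects only element children, while node() also selects text nodes, which is why content 1 is only reachable through node() (or text()). The small sketch below makes the distinction visible; it is an illustration using Python's standard-library xml.etree.ElementTree rather than Scrapy, on a trimmed copy of the sample markup.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the question's "details" div.
DETAILS = """<div class="details">
  <div class="a"> not needed</div>
  content 1
  <p>content 2</p>
</div>"""

details = ET.fromstring(DETAILS)

# Element children only, roughly what an XPath step like `*` would select.
element_children = [child.tag for child in details]
print(element_children)  # ['div', 'p']

# Text children: the div's own leading .text plus each child's .tail.
# These correspond to the text nodes that node() (or text()) also matches.
text_children = [
    t.strip()
    for t in [details.text] + [child.tail for child in details]
    if t and t.strip()
]
print(text_children)  # ['content 1']
```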

You can get all child nodes except the divs with class a or b by using node() and the self:: syntax:

Demo using:
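One XPath 1.0 expression that fits this description is //div[@class="details"]/node()[not(self::div[@class="a" or @class="b"])]; in Scrapy it would be passed to response.xpath(...) and the matches collected with .getall() (.extract() in older versions). Since running that requires Scrapy, the following is a hedged, dependency-free sketch that emulates the same selection with Python's standard-library xml.etree.ElementTree, which does not itself support node() or self::. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample page from the question.
PAGE = """<html>
  <div class="details">
  <div class="a"> not needed</div>
  content 1
  <p>content 2</p>
  <div>content 2</div>
  <p>content 2</p>
  <div>content 2</div>
  <p>content 2</p>
  <div class="b"> this is also not needed</div>
  </div>
</html>"""


def is_unwanted(el: ET.Element) -> bool:
    # Mirrors the predicate self::div[@class="a" or @class="b"].
    return el.tag == "div" and el.get("class") in ("a", "b")


def element_html(el: ET.Element) -> str:
    """Serialize one element without its trailing (tail) text."""
    tail, el.tail = el.tail, None
    try:
        return ET.tostring(el, encoding="unicode")
    finally:
        el.tail = tail


def child_nodes_except_ab(details: ET.Element) -> list:
    """Direct child nodes in document order, minus the unwanted divs.

    Roughly node()[not(self::div[@class="a" or @class="b"])]: text nodes
    are returned as plain strings, elements as serialized markup.
    """
    nodes = []
    if details.text:
        nodes.append(details.text)
    for child in details:
        if not is_unwanted(child):
            nodes.append(element_html(child))
        # The tail is a separate text node after the element, so it is
        # kept even when the element itself is filtered out.
        if child.tail:
            nodes.append(child.tail)
    return nodes


root = ET.fromstring(PAGE)
details = root.find(".//div[@class='details']")
nodes = child_nodes_except_ab(details)
print("".join(nodes))
```

Joining the selected nodes gives the contents of the details div without the a and b divs, but not the enclosing <div class="details"> tag itself, since node() selects only the children.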

