
Python scrapy: using XPath to exclude some span elements from a div


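I'm doing some scraping and I want to exclude some elements. For example, in the main div id="Introduction" I only want to scrape the h2 and the 2 paragraphs, excluding span class="section_edit_link" and div class="photo_container". I could of course extract the elements I want and concatenate them, but since every section contains these two elements I want to exclude, is there a way to exclude them in XPath?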

<div id="Introduction"><span class="section_edit_link"><a href="/wiki_edit.cfm?title=Seoul&amp;section=Introduction" title="Edit section: Introduction" rel="nofollow">edit</a> </span>
<h2>Introduction</h2>
<div class="photo_container">
    <a href="https://www.travellerspoint.com/photos/stream/photoID/80/features/countries/South Korea/"><img src="https://photos.travellerspoint.com/8818/thumb_dhessel_seoul.jpg" width="200" height="146" alt="Night time traffic in Seoul" class="photo"></a>
    <h4>Night time traffic in Seoul</h4>
    <p>© All Rights Reserved <a href="https://www.travellerspoint.com/users/Hessell/">Hessell</a></p>
</div>
<p><strong>Seoul</strong> (서울) is the heart of <a href="http://www.travellerspoint.com/guide/South_Korea/">South Korea</a>, hosting about a quarter of the country's population of nearly 50 million. Seoul was also the historic capital of Korea from the 14th century until the nation's partition into <a href="http://www.travellerspoint.com/guide/North_Korea/">North</a> and <a href="http://www.travellerspoint.com/guide/South_Korea/">South</a> in 1948. Located just 50 kilometres south of the North Korean border, Seoul symbolises the division of North and South Korea. </p>
<p>Seoul enjoys a lively nightlife, which has earned it comparisons with <a href="http://www.travellerspoint.com/guide/Tokyo/">Tokyo</a>. Thankfully though, Seoul is much cheaper than the <a href="http://www.travellerspoint.com/guide/Japan/">Japanese</a> capital.</p>



If your Introduction div contains only the elements shown in the question above, the following will give you the desired result:

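    def parse(self, response):
        # ">" matches only direct children of #Introduction, so the <p>
        # nested inside div.photo_container is not picked up.
        yield {
            'heading': response.css('#Introduction > h2').extract_first(),
            'para 1': response.css('#Introduction > p').extract_first(),
            # Each paragraph needs its own key; a repeated key would overwrite 'para 1'.
            'para 2': response.css('#Introduction > p:last-child').extract_first(),
        }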

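For goodness' sake, include some of your code and the HTML source you want to scrape...

Unfortunately, "Introduction" is not the only section I want to scrape; the following sections are basically the same. That's why I'm trying to find a way to exclude certain elements. Something like this, but without using CSS: I also tried many combinations with not(), but none of them seem to work.

Is it? If you have no restriction against using CSS, you can try the CSS selector "#Introduction h2, h2 ~ p", which will give you all the h2 and p elements directly under the main div. See if this helps and let me know.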