How to get all the text data of a node using XPath in Scrapy


xpath web-scraping scrapy


I am trying to scrape user review data from a website. In the end I want two columns of data (rating and review).

Below is a sample XML file that reproduces my scraping problem. I have tried extracting from it and got the output that follows.

<root>
  <div class="user-review">
    <div class="rating"> 5,0 </div>
    <p class="review-content"> Reiew text of item/movie.
      <span class="details">
          <span class="details-header">Detail: </span>
      <span class="details-content">Some details to emphasis</span>
      </span>
      Continue to review
    </p>
  </div>
  <div class="user-review">
    <div class="rating"> 4,0 </div>
    <p class="review-content">Reiew text of item/movie.
    </p>
  </div>
  <div class="user-review">
    <div class="rating"> 4,0 </div>
    <p class="review-content">Reiew text of item/movie.
    </p>
  </div>
</root>

Output:

Text=' 5,0 '
Text=' 4,0 '
Text=' 4,0 '

Text='  Reiew text of item/movie.
        '
Text='
Continue to review
    '
Text='Reiew text of item/movie.
    '
Text='Reiew text of item/movie.
    '

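When I try to get the review part, the text of the first review is split into two pieces. As a result I end up with two lists of different sizes (3 ratings and 4 review texts) and cannot match the reviews to the ratings. This is the XPath I use for the reviews: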

//p[@class="review-content"]/text()


Can anyone help me get the output I want?

Expected output 1:

Text='  Reiew text of item/movie.
    Continue to review
    '
Text='Reiew text of item/movie.
    '
Text='Reiew text of item/movie.
    '

Expected output 2:

Text='  Reiew text of item/movie. Some details to emphasis
    Continue to review
    '
Text='Reiew text of item/movie.
    '
Text='Reiew text of item/movie.
    '

Try this. Here sel is the selector; in your case it is probably response.

tags = sel.xpath('//p[@class="review-content"]')
reviews = []
for tag in tags:
    text = " ".join(tag.xpath('.//text()').extract())
    reviews.append(text)

You have to loop over the div elements with the user-review class and extract the review content from each one. If you want a one-liner, see the following:

import scrapy

text = """
<root>
  <div class="user-review">
    <div class="rating"> 5,0 </div>
    <p class="review-content"> Reiew text of item/movie.
      <span class="details">
          <span class="details-header">Detail: </span>
      <span class="details-content">Some details to emphasis</span>
      </span>
      Continue to review
    </p>
  </div>
  <div class="user-review">
    <div class="rating"> 4,0 </div>
    <p class="review-content">Reiew text of item/movie.
    </p>
  </div>
  <div class="user-review">
    <div class="rating"> 4,0 </div>
    <p class="review-content">Reiew text of item/movie.
    </p>
  </div>
</root>
"""

selector = scrapy.Selector(text=text)
review_content = [review.xpath('normalize-space(.//p[@class="review-content"])').extract_first() for review in selector.xpath('//div[@class="user-review"]')]


This question may help you. In the last line of the code it should be selector rather than sel.