Scrapy: joining array elements inside a div in Python

django web-scraping scrapy

I need to join some text inside a div using XPath in Scrapy. The div has the following structure:
<div class="col-12 e-description" itemprop="description">
  "-Text1"
  <br>
  <br>
  "-Text2"
  <br>
  <br>
  "-Text3"
</div>

If I do this:
item['description'] = response.xpath('//div[@itemprop="description"]/text()').extract()

everything comes out mixed together and separated by commas, like this:

-Text1
,-Text2
,-Text3

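I think this happens because response.xpath('//div[@itemprop="description"]/text()').extract() returns an array, so commas are added to separate the array items.
I tried looping over the array and joining each item into the "description" attribute of the Scrapy Item.
This is what I am trying: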
def parse_item(self, response):
    item = MyScrapyItem()
    item['name'] = response.xpath('normalize-space(//span[@itemprop="name"]/text())').extract()

    for subItem in response.xpath('//div[@itemprop="description"]/text()'):
        item['description'] = " ".join(subItem.extract())

I know it would work if I could do something like this:
for subItem in response.xpath('//div[@itemprop="description"]/text()'):
    item['description'] = " ".join(subItem.xpath('//div[@itemprop="something_here"]/text()').extract())

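But the div that contains the text has no further tags inside it.
Any help would be greatly appreciated; this is my first Scrapy project.

Instead of looping, note that you already used
item['description'] = response.xpath('//div[@itemprop="description"]/text()').extract()
This returns a list.
Join the list directly:
item['description'] = "".join(response.xpath('//div[@itemprop="description"]/text()').extract())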
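To illustrate why the loop in the question does not produce the expected string, here is a minimal sketch in plain Python. The list `text_nodes` is a hypothetical stand-in for what `.extract()` might return for the div, and `item` is an ordinary dict rather than a real Scrapy Item.

```python
# Hypothetical stand-in for the list returned by .extract() on the div.
text_nodes = ["-Text1", "-Text2", "-Text3"]

# The loop from the question: each element is already a string, so
# " ".join() iterates over its characters, and every pass overwrites
# the value stored by the previous pass.
item = {}
for sub_item in text_nodes:
    item["description"] = " ".join(sub_item)
looped = item["description"]

# Joining the whole list once keeps every fragment.
joined = "".join(text_nodes)

print(looped)
print(joined)
```

Only the last fragment survives the loop, with a space between each character, whereas a single join over the list keeps all three fragments.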
I'm not entirely clear what your question is. If it is how to join the items together, you are on the right track:
item['description'] = ''.join(response.xpath('//div[@itemprop="description"]/text()').extract())
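Text nodes that sit between `<br>` tags usually carry surrounding whitespace, so it may be worth cleaning them before joining. The following is a rough sketch, not part of the original answers: `join_text_nodes` is a hypothetical helper, and `sample` is an assumed shape for the extracted list rather than real output from the page.

```python
def join_text_nodes(nodes, sep=" "):
    """Strip each fragment, drop the empty ones, and join the rest with sep."""
    cleaned = (node.strip() for node in nodes)
    return sep.join(fragment for fragment in cleaned if fragment)


# Assumed sample: text nodes with the newlines and indentation that
# typically surround <br> tags (hypothetical values).
sample = ["\n  -Text1\n  ", "\n  ", "\n  -Text2\n  ", "\n  ", "\n  -Text3\n"]

description = join_text_nodes(sample)
print(description)
```

Inside a spider, the same helper could be given the list from `response.xpath('//div[@itemprop="description"]/text()').extract()`, and `sep` could be changed to a newline if the fragments should stay on separate lines.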