Python: Scrapy exports to CSV in the wrong format
I am trying to write scraped items to a CSV file using an item pipeline, but the formatting comes out strange: instead of one entry per row from top to bottom, everything scraped from page 1 is printed at once, then everything from page 2, each crammed into a single column. I have attached my pipelines.py and one (rather large) row from the CSV output. How do I get the values printed column by column, one entry per row, instead of a whole page at once?

pipelines.py:
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
from scrapy import signals
from scrapy.contrib.exporter import CsvItemExporter


class CSVPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['names', 'stars', 'subjects', 'reviews']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
And one row of output.csv:
names stars subjects
Vivek0388,NikhilVashisth,DocSharad,Abhimanyu_swarup,Suresh N,kaushalhkapadia,JyotiMallick,Nitin T,mhdMumbai,SunilTukrel(COLUMN 2) 5 of 5 stars,4 of 5 stars,1 of 5 stars,5 of 5 stars,3 of 5 stars,4 of 5 stars,5 of 5 stars,5 of 5 stars,4 of 5 stars,4 of 5 stars(COLUMN 3) Best Stay,Awesome View... Nice Experience!,Highly mismanaged and dishonest.,A Wonderful Experience,Good place with average front office,Honeymoon,Awesome Resort,Amazing,ooty's beauty!!,Good stay and food
It should look like this:
Vivek0388 5 of 5
NikhilVashisth 4 of 5
DocSharad 1 of 5
...and so on
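Why this happens: each scraped page yields a single item whose fields are parallel lists (all the names on the page, all the ratings, and so on), so the exporter writes an entire page into one row. A minimal sketch, with made-up sample data, of how `zip()` transposes those parallel lists into one tuple per reviewer:

```python
# Fields come back as parallel lists: index i of each list belongs
# to the same reviewer. zip() pairs them up into per-reviewer rows.
names = ['Vivek0388', 'NikhilVashisth', 'DocSharad']
stars = ['5 of 5 stars', '4 of 5 stars', '1 of 5 stars']

rows = list(zip(names, stars))
print(rows)
# [('Vivek0388', '5 of 5 stars'), ('NikhilVashisth', '4 of 5 stars'),
#  ('DocSharad', '1 of 5 stars')]
```

Each tuple in `rows` is exactly one CSV row, which is the idea behind the fix below.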
Edit:
Figured it out: zip the lists together, then loop over the rows and write each one. It's not that complicated once you read the docs.
import csv


class CSVPipeline(object):

    def __init__(self):
        # Python 2: open in binary mode for csv.writer
        self.csvwriter = csv.writer(open('items.csv', 'wb'), delimiter=',')
        self.csvwriter.writerow(['names', 'stars', 'subjects', 'reviews'])

    def process_item(self, item, spider):
        # Each field is a parallel list; zip transposes them into rows
        rows = zip(item['names'], item['stars'], item['subjects'], item['reviews'])
        for row in rows:
            self.csvwriter.writerow(row)
        return item
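Two caveats with this fix: the file handle opened in `__init__` is never closed, and under Python 3 `csv.writer` needs a text-mode file opened with `newline=''` rather than `'wb'`. A sketch that ties the file's lifetime to the spider via Scrapy's `open_spider`/`close_spider` pipeline hooks (assuming Python 3):

```python
import csv


class CSVPipeline:

    def open_spider(self, spider):
        # Python 3: text mode with newline='' so the csv module
        # controls line endings itself
        self.file = open('items.csv', 'w', newline='')
        self.writer = csv.writer(self.file)
        self.writer.writerow(['names', 'stars', 'subjects', 'reviews'])

    def process_item(self, item, spider):
        # Transpose the page's parallel lists into one row per reviewer
        for row in zip(item['names'], item['stars'],
                       item['subjects'], item['reviews']):
            self.writer.writerow(row)
        return item

    def close_spider(self, spider):
        self.file.close()
```

`open_spider` and `close_spider` are the standard pipeline hooks, so no manual signal wiring via `from_crawler` is needed here.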
Comments:

- Forgot to mention that I've already changed my settings.
- You know your scraper stores all the names as a list inside a single item? (I remember your question from yesterday.) Try splitting each entry into a separate item to get the result you want. All your entries have the same problem: one of your items is really a list of entries.
- I tried that, but it didn't work; all I got was a blank document, because whatever I define in the spider gets called again. I think I'll export to JSON instead and have it converted to CSV, since I'm more used to that. Thanks for your help!
- No problem, but as I said, you should handle these results in the spider itself; then it will work like a charm.
- I tried, but I keep getting errors saying I need to return an Item/Field(). I tried returning a dict, but got an error again. Since it's a recursive call, it would redefine the dict and delete it. But I'll try again and do as you say.
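The suggestion in the comments, splitting the page-level lists into separate items inside the spider, can be sketched without Scrapy as a plain generator. The field names here are assumptions matching the item used above:

```python
def split_page_item(page_item):
    """Yield one flat dict per reviewer from a page-level item
    whose fields are parallel lists."""
    for name, star, subject, review in zip(page_item['names'],
                                           page_item['stars'],
                                           page_item['subjects'],
                                           page_item['reviews']):
        yield {'name': name, 'star': star,
               'subject': subject, 'review': review}
```

In a real spider, the parse callback would `yield` each of these dicts (or Items) instead of one list-valued item per page; Scrapy's built-in CSV feed export (`scrapy crawl spider -o items.csv`) then writes one row per item with no custom pipeline at all.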