
Python: incorrect CSV format when exporting from Scrapy


I'm trying to write a CSV file from a pipeline after scraping, but the format is strange: instead of writing one entry per row from top to bottom, it scrapes page 1, then page 2, and then dumps everything at once into a single giant row. I've attached pipelines.py and one (rather large) row of the CSV output. How can I get the data written in columns, one entry per row?

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

from scrapy import signals
from scrapy.exporters import CsvItemExporter  # scrapy.contrib.exporter in Scrapy < 1.0

class CSVPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline


    def spider_opened(self, spider):
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['names','stars','subjects','reviews']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()


    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
And output.csv:

names   stars   subjects
Vivek0388,NikhilVashisth,DocSharad,Abhimanyu_swarup,Suresh N,kaushalhkapadia,JyotiMallick,Nitin T,mhdMumbai,SunilTukrel(COLUMN 2)   5 of 5 stars,4 of 5 stars,1 of 5 stars,5 of 5 stars,3 of 5 stars,4 of 5 stars,5 of 5 stars,5 of 5 stars,4 of 5 stars,4 of 5 stars(COLUMN 3) Best Stay,Awesome View... Nice Experience!,Highly mismanaged and dishonest.,A Wonderful Experience,Good place with average front office,Honeymoon,Awesome Resort,Amazing,ooty's beauty!!,Good stay and food
It should look like this:

Vivek0388      5 of 5
NikhilVashisth 5 of 5
DocSharad      5 of 5
...so on
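The single giant row appears because each field in the scraped item holds the whole page's list, so one item becomes one row whose cells are the comma-joined lists. A minimal reproduction with the plain csv module (names shortened; this mimics how a multivalued field ends up as one cell):

```python
import csv
import io

# One scraped item where each field holds the WHOLE page's list
item = {
    "names": ["Vivek0388", "NikhilVashisth", "DocSharad"],
    "stars": ["5 of 5 stars", "4 of 5 stars", "1 of 5 stars"],
}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["names", "stars"])
# One item -> one row: each cell is the comma-joined list,
# reproducing the single giant row shown above
writer.writerow([",".join(item["names"]), ",".join(item["stars"])])
print(buf.getvalue())
```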
EDIT:


Figured it out: zip the lists, then loop over the rows and write each one. It's not that complicated once you read the docs.

import csv

class CSVPipeline(object):

   def __init__(self):
      # newline='' prevents blank lines between rows; the original 'wb' mode is Python 2 only
      self.csvwriter = csv.writer(open('items.csv', 'w', newline=''), delimiter=',')
      self.csvwriter.writerow(['names', 'stars', 'subjects', 'reviews'])

   def process_item(self, item, spider):
      # zip pairs the i-th name/star/subject/review, giving one row per reviewer
      rows = zip(item['names'], item['stars'], item['subjects'], item['reviews'])
      for row in rows:
         self.csvwriter.writerow(row)
      return item
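To see what the `zip()` call is doing: it pairs the i-th entry of each parallel list, so every reviewer becomes its own row (list values shortened for illustration):

```python
# Parallel lists, exactly as they sit inside one scraped item
item = {
    "names": ["Vivek0388", "NikhilVashisth", "DocSharad"],
    "stars": ["5 of 5 stars", "4 of 5 stars", "1 of 5 stars"],
}

# zip() turns N parallel lists into N-tuples: one tuple per reviewer,
# which is one CSV row per reviewer once written out
rows = list(zip(item["names"], item["stars"]))
for row in rows:
    print(row)
```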

Forgot to mention that I already changed the settings.

You do know your scraper stores all the names as a list inside the item? (I remember yesterday's question.) Try splitting each entry into a separate item to get the result you want. All your entries come out the same way because each of your items is a list of entries.

I tried that, but it didn't work; all I got was a blank document, since whatever I define in the spider gets called. I think I'll export to JSON instead and convert that to CSV, since I'm more used to that. Thanks for the help!

No problem, but as I said, you should handle these results in the spider itself; then it will work like a charm.

I tried, but I keep getting errors saying I need to return an Item/Field(). I tried returning a dict, but got errors again: since it's a recursive call, it redefines the dict and drops it. I'll try again the way you suggested, though.
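The comments suggest splitting the lists in the spider itself so each reviewer becomes its own item. The core transformation can be sketched without Scrapy (field names taken from the question; `split_page_item` is a hypothetical helper for illustration):

```python
def split_page_item(page_item):
    """Turn one page-level item whose fields are parallel lists into
    a list of per-reviewer dicts (one future CSV row each)."""
    keys = ["names", "stars", "subjects", "reviews"]
    # zip(*...) walks the four lists in lockstep; each step is one reviewer
    return [dict(zip(keys, values))
            for values in zip(*(page_item[k] for k in keys))]

page_item = {
    "names": ["Vivek0388", "NikhilVashisth"],
    "stars": ["5 of 5 stars", "4 of 5 stars"],
    "subjects": ["Best Stay", "Awesome View... Nice Experience!"],
    "reviews": ["review text 1", "review text 2"],
}
for row in split_page_item(page_item):
    print(row)
```

In a spider, `parse()` would `yield` each of these dicts instead of one list-valued item, and the built-in CSV feed exporter would then write one row per reviewer with no custom pipeline needed.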