Python: Scrapy exports to CSV in the wrong format
I am trying to write scraped items to a CSV file using an item pipeline, but the formatting comes out strange: instead of one entry per row from top to bottom, everything scraped from page 1 is printed at once, then everything from page 2, each crammed into a single column. I have attached my pipelines.py and one (rather large) row from the CSV output. How do I get the values printed column by column, one entry per row, instead of a whole page at once?

pipelines.py:
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
from scrapy import signals
from scrapy.contrib.exporter import CsvItemExporter


class CSVPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['names', 'stars', 'subjects', 'reviews']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
And one row of output.csv:
names stars subjects
Vivek0388,NikhilVashisth,DocSharad,Abhimanyu_swarup,Suresh N,kaushalhkapadia,JyotiMallick,Nitin T,mhdMumbai,SunilTukrel(COLUMN 2) 5 of 5 stars,4 of 5 stars,1 of 5 stars,5 of 5 stars,3 of 5 stars,4 of 5 stars,5 of 5 stars,5 of 5 stars,4 of 5 stars,4 of 5 stars(COLUMN 3) Best Stay,Awesome View... Nice Experience!,Highly mismanaged and dishonest.,A Wonderful Experience,Good place with average front office,Honeymoon,Awesome Resort,Amazing,ooty's beauty!!,Good stay and food
It should look like this:
Vivek0388 5 of 5
NikhilVashisth 4 of 5
DocSharad 1 of 5
...and so on
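Why this happens: each scraped page yields a single item whose fields are parallel lists (all the names on the page, all the ratings, and so on), so the exporter writes an entire page into one row. A minimal sketch, with made-up sample data, of how `zip()` transposes those parallel lists into one tuple per reviewer:

```python
# Fields come back as parallel lists: index i of each list belongs
# to the same reviewer. zip() pairs them up into per-reviewer rows.
names = ['Vivek0388', 'NikhilVashisth', 'DocSharad']
stars = ['5 of 5 stars', '4 of 5 stars', '1 of 5 stars']

rows = list(zip(names, stars))
print(rows)
# [('Vivek0388', '5 of 5 stars'), ('NikhilVashisth', '4 of 5 stars'),
#  ('DocSharad', '1 of 5 stars')]
```

Each tuple in `rows` is exactly one CSV row, which is the idea behind the fix below.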
Edit:
Figured it out: zip the lists together, then loop over the rows and write each one. It's not that complicated once you read the docs.
import csv


class CSVPipeline(object):

    def __init__(self):
        # Python 2: open in binary mode for csv.writer
        self.csvwriter = csv.writer(open('items.csv', 'wb'), delimiter=',')
        self.csvwriter.writerow(['names', 'stars', 'subjects', 'reviews'])

    def process_item(self, item, spider):
        # Each field is a parallel list; zip transposes them into rows
        rows = zip(item['names'], item['stars'], item['subjects'], item['reviews'])
        for row in rows:
            self.csvwriter.writerow(row)
        return item
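Two caveats with this fix: the file handle opened in `__init__` is never closed, and under Python 3 `csv.writer` needs a text-mode file opened with `newline=''` rather than `'wb'`. A sketch that ties the file's lifetime to the spider via Scrapy's `open_spider`/`close_spider` pipeline hooks (assuming Python 3):

```python
import csv


class CSVPipeline:

    def open_spider(self, spider):
        # Python 3: text mode with newline='' so the csv module
        # controls line endings itself
        self.file = open('items.csv', 'w', newline='')
        self.writer = csv.writer(self.file)
        self.writer.writerow(['names', 'stars', 'subjects', 'reviews'])

    def process_item(self, item, spider):
        # Transpose the page's parallel lists into one row per reviewer
        for row in zip(item['names'], item['stars'],
                       item['subjects'], item['reviews']):
            self.writer.writerow(row)
        return item

    def close_spider(self, spider):
        self.file.close()
```

`open_spider` and `close_spider` are the standard pipeline hooks, so no manual signal wiring via `from_crawler` is needed here.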
Comments:

- Forgot to mention that I've already changed my settings.
- You know your scraper stores all the names as a list inside a single item? (I remember your question from yesterday.) Try splitting each entry into a separate item to get the result you want. All your entries have the same problem: one of your items is really a list of entries.
- I tried that, but it didn't work; all I got was a blank document, because whatever I define in the spider gets called again. I think I'll export to JSON instead and have it converted to CSV, since I'm more used to that. Thanks for your help!
- No problem, but as I said, you should handle these results in the spider itself; then it will work like a charm.
- I tried, but I keep getting errors saying I need to return an Item/Field(). I tried returning a dict, but got an error again. Since it's a recursive call, it would redefine the dict and delete it. But I'll try again and do as you say.
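The suggestion in the comments, splitting the page-level lists into separate items inside the spider, can be sketched without Scrapy as a plain generator. The field names here are assumptions matching the item used above:

```python
def split_page_item(page_item):
    """Yield one flat dict per reviewer from a page-level item
    whose fields are parallel lists."""
    for name, star, subject, review in zip(page_item['names'],
                                           page_item['stars'],
                                           page_item['subjects'],
                                           page_item['reviews']):
        yield {'name': name, 'star': star,
               'subject': subject, 'review': review}
```

In a real spider, the parse callback would `yield` each of these dicts (or Items) instead of one list-valued item per page; Scrapy's built-in CSV feed export (`scrapy crawl spider -o items.csv`) then writes one row per item with no custom pipeline at all.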