
Python Scrapy: output JSON or CSV


I'm trying web scraping with this code in settings.py:

FEED_EXPORT_ENCODING = 'utf-8'

import datetime
now = datetime.datetime.now ()
formatted = now.strftime ("%Y%m%d_%H%M")
FEED_URI = f'\\C:\\Users\\Acer\\Desktop\\{formatted}.csv'
FEED_TYPE = 'csv'
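The question doesn't show the crawl log, so this is only a likely cause. As far as I know, Scrapy has no `FEED_TYPE` setting. The format setting is `FEED_FORMAT`, and its default is `jsonlines`. A `.csv` or `.json` file written with that default would hold one JSON object per line with no commas between them, which matches the samples below.

Here is a minimal sketch of equivalent settings using `FEEDS`, which superseded `FEED_URI`/`FEED_FORMAT` in Scrapy 2.1 (this assumes 2.1 or newer). Two parts are illustrative additions, not part of the original setup: the second `.json` feed, and the `utf-8-sig` encoding (UTF-8 with a byte-order mark, which Excel uses to detect UTF-8).

```python
# settings.py sketch, assuming Scrapy >= 2.1, where FEEDS supersedes
# FEED_URI / FEED_FORMAT (Scrapy does not read a FEED_TYPE setting).
import datetime

formatted = datetime.datetime.now().strftime("%Y%m%d_%H%M")

FEEDS = {
    # Plain drive-letter path, no leading backslash before "C:"
    f"C:\\Users\\Acer\\Desktop\\{formatted}.csv": {
        "format": "csv",
        # Assumption: a BOM helps Excel recognise the file as UTF-8
        "encoding": "utf-8-sig",
    },
    # Illustrative second feed: "json" writes a single array with commas
    # between items, whereas "jsonlines" writes one object per line
    f"C:\\Users\\Acer\\Desktop\\{formatted}.json": {
        "format": "json",
        "encoding": "utf-8",
        "indent": 4,
    },
}
```

Each key is an output URI and each value holds that feed's options. One crawl can then write both files.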
And this spider, special_offers.py:

# -*- coding: utf-8 -*-
import scrapy
import datetime


class SpecialOffersSpider(scrapy.Spider):
    name = 'special_offers'
    allowed_domains = ['www.tinydeal.com']

    def start_requests(self):
        yield scrapy.Request(url='https://www.tinydeal.com/specials.html', callback=self.parse, headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
        })

    def parse(self, response):
        for product in response.xpath("//ul[@class='productlisting-ul']/div/li"):
            yield {
                'title': product.xpath(".//a[@class='p_box_title']/text()").get(),
                'url': response.urljoin(product.xpath(".//a[@class='p_box_title']/@href").get()),
                'discounted_price': product.xpath(".//div[@class='p_box_price']/span[1]/text()").get(),
                'original_price': product.xpath(".//div[@class='p_box_price']/span[2]/text()").get(),
                'User-Agent': response.request.headers['User-Agent'].decode('utf-8'),
                'datetime': datetime.datetime.now().strftime("%Y%m%d %H%M")

            }

        next_page = response.xpath("//a[@class='nextPage']/@href").get()

        if next_page:
            yield scrapy.Request(url=next_page, callback=self.parse, headers={
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
            })
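`scrapy.Request` needs an absolute URL. The question doesn't confirm whether the `nextPage` link's `href` is relative. If it is, it has to be resolved against the page URL first, as the spider already does for product links with `response.urljoin`. That method is roughly the standard-library `urljoin` applied to the response's URL. The sketch below shows the resolution with the stdlib only. The helper name `absolute_next_page` and the `?page=N` hrefs are hypothetical.

```python
from urllib.parse import urljoin

PAGE_URL = "https://www.tinydeal.com/specials.html"

def absolute_next_page(page_url, href):
    """Resolve a possibly relative href against the page it came from."""
    if href is None:  # no nextPage link found: stop paginating
        return None
    return urljoin(page_url, href)  # absolute hrefs are returned unchanged

# Hypothetical relative and absolute hrefs for the "nextPage" link
print(absolute_next_page(PAGE_URL, "specials.html?page=2"))
print(absolute_next_page(PAGE_URL, "https://www.tinydeal.com/specials.html?page=2"))
```

Both calls print the same absolute URL. Inside `parse`, the equivalent would be `scrapy.Request(url=response.urljoin(next_page), ...)`. Another option is `response.follow(next_page, callback=self.parse, headers=...)`, which accepts relative URLs directly.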
Then I open the terminal and run:

scrapy crawl special_offers
The problem is that when I export JSON, there are no commas between the }{ of consecutive items, so Power BI can't read my file.

When I export CSV, the data doesn't look the way I expect when I open it in Excel.

CSV data sample:

{"title": "ABS Plastic Case for Raspberry Pi 3 Model B and Raspberry Pi 2 E-524988", "url": "discounted_price": "R$12.74", "original_price": "R$13.66", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2330"}
{"title": "3M 9001 KN90 Dust Mask Respirator Anti-Dust PM2.5 Industrial Construction Polle RTH-562440", "url": "discounted_price": "R$10.29", "original_price": "R$12.40", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2330"}
{"title": "2-in-1 Vintage Blue Rhinestone Necklace + Earrings Jewelry Set DJA-562974", "url": "discounted_price": "R$11.77", "original_price": "R$30.77", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2330"}
{"title": "64GB USB 2.0 Flash Drive USB Pen Drive U Disk EFM-561923", "url": "discounted_price": "R$34.83", "original_price": "R$99.43", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2330"}
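For comparison, a genuine CSV export of these items would start with a header row of field names. After that comes one comma-separated row per item, rather than `{...}` objects.

Here is a minimal sketch with the standard `csv` module. It uses the keys the spider yields and one hypothetical, shortened item; the URL and User-Agent values are placeholders. Encoding the text as `utf-8-sig` adds the byte-order mark that Excel looks for to recognise UTF-8. That is an assumption about the Excel symptom, not something the question confirms.

```python
import csv
import io

# Field names taken from the dict the spider yields
FIELDS = ["title", "url", "discounted_price", "original_price", "User-Agent", "datetime"]

def items_to_csv(items):
    """Return CSV text: one header row, then one row per item."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()

# Hypothetical, shortened item (url and User-Agent are placeholders)
item = {
    "title": "ABS Plastic Case for Raspberry Pi 3 Model B and Raspberry Pi 2 E-524988",
    "url": "https://example.com/item.html",
    "discounted_price": "R$12.74",
    "original_price": "R$13.66",
    "User-Agent": "Mozilla/5.0",
    "datetime": "20200420 2330",
}
text = items_to_csv([item])
print(text)

# "utf-8-sig" prepends the UTF-8 byte-order mark b"\xef\xbb\xbf"
data = text.encode("utf-8-sig")
print(data[:3])
```

Values that contain commas, such as the full User-Agent string, would be quoted automatically by the `csv` module.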

JSON data sample:

{ "title": "ABS Plastic Case for Raspberry Pi 3 Model B and Raspberry Pi 2 E-524988", "url": ", "discounted_price": "R$12.74", "original_price": "R$13.66", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2329" }
{ "title": "3M 9001 KN90 Dust Mask Respirator Anti-Dust PM2.5 Industrial Construction Polle RTH-562440", "url": ", "discounted_price": "R$10.29", "original_price": "R$12.40", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2329" }
{ "title": "2-in-1 Vintage Blue Rhinestone Necklace + Earrings Jewelry Set DJA-562974", "url": ", "discounted_price": "R$11.77", "original_price": "R$30.77", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36", "datetime": "20200420 2329" }
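In both samples, the objects follow one another with nothing between them, which is what Scrapy's `jsonlines` format produces. A JSON file that a tool like Power BI can load as one document is instead a single array, `[{...}, {...}]`.

If the crawl can't simply be re-run with the `json` format, here is a minimal standard-library sketch that turns back-to-back objects into a list. It works whether the objects are separated by newlines or spaces. The function name `concatenated_json_to_array` is hypothetical.

```python
import json

def concatenated_json_to_array(text):
    """Parse JSON objects written back to back (no commas) into a list."""
    decoder = json.JSONDecoder()
    items, pos = [], 0
    while True:
        # raw_decode rejects leading whitespace, so skip it between objects
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return items
        obj, pos = decoder.raw_decode(text, pos)  # returns (object, end index)
        items.append(obj)

sample = '{"title": "a", "discounted_price": "R$12.74"} {"title": "b", "discounted_price": "R$10.29"}\n'
items = concatenated_json_to_array(sample)
print(json.dumps(items, ensure_ascii=False))
```

To convert a file, pass its contents (`open(path, encoding="utf-8").read()`). Then write the result with `json.dump(items, out_file, ensure_ascii=False)` to get a single valid JSON array.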


Can anyone tell me what's wrong with these outputs?

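How did you get this data? From what you've shown, I suspect you copied it from the terminal. Is that right? If so, you can save it directly to a file with this command: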

scrapy crawl special_offers -o special_offers.json
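With `-o`, Scrapy chooses the exporter from the file extension. `.json` produces one JSON array, while `.jl` or `.jsonlines` produces one object per line.

Note that `-o` appends to an existing file. For `.json`, that leaves two arrays back to back, which is invalid JSON. Scrapy 2.4 and later also accept `-O` to overwrite instead.

Here is a small helper (hypothetical, not part of Scrapy) to check that a saved file is a single JSON array:

```python
import json

def is_json_array_file(text):
    """Return True if text is exactly one valid JSON array."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # e.g. back-to-back objects (jsonlines) or two appended arrays
        return False
    return isinstance(data, list)

print(is_json_array_file('[{"title": "a"},\n{"title": "b"}]'))  # True
print(is_json_array_file('{"title": "a"}\n{"title": "b"}'))     # False
```

To check the crawl output, run `print(is_json_array_file(open("special_offers.json", encoding="utf-8").read()))`.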


I hope this solves your problem. Let me know.

You can use your own pipeline to write the JSON.
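Here is a minimal sketch of such a pipeline, assuming the standard item-pipeline hooks `open_spider`, `process_item` and `close_spider`. The class name `JsonArrayWriterPipeline` and the output file name `special_offers.json` are hypothetical.

The pipeline collects items in memory and writes them as one JSON array when the spider closes, so the file always has commas between objects. It has to be enabled in settings.py, e.g. `ITEM_PIPELINES = {"myproject.pipelines.JsonArrayWriterPipeline": 300}`, where `myproject` stands for the project's package name.

```python
import json

class JsonArrayWriterPipeline:
    """Hypothetical pipeline: buffer items, dump them as one JSON array at the end."""

    def __init__(self, path="special_offers.json"):
        self.path = path
        self.items = []

    def open_spider(self, spider):
        self.items = []  # fresh buffer for each crawl

    def process_item(self, item, spider):
        self.items.append(dict(item))  # works for plain dicts and scrapy.Item
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        with open(self.path, "w", encoding="utf-8") as f:
            # ensure_ascii=False keeps non-ASCII text readable; indent for humans
            json.dump(self.items, f, ensure_ascii=False, indent=4)
```

Buffering everything in memory is the simplest way to guarantee a valid array. For very large crawls, Scrapy's built-in `json` feed format (its `JsonItemExporter`) streams the array to disk instead.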