
Python: How to output Scrapy item fields in a custom order?


The default field order in Scrapy's output is alphabetical. I have read some articles that use OrderedDict to output items in a custom order.
I am writing a spider for the page below.

My items.py

import scrapy
import six
from collections import OrderedDict


class OrderedItem(scrapy.Item):
    def __init__(self, *args, **kwargs):
        # back the item with an OrderedDict so fields keep assignment order
        self._values = OrderedDict()
        if args or kwargs:
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v


class StockinfoItem(OrderedItem):
    name = scrapy.Field()
    phone = scrapy.Field()
    address = scrapy.Field()
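
As a quick sanity check (a hypothetical snippet, not part of the project), iterating such an item yields the fields in assignment order rather than alphabetically:

item = StockinfoItem()
item['name'] = ['浙能电力']
item['phone'] = ['0571-87210223']
item['address'] = ['浙江省杭州市天目山路152号浙能大厦']
print(list(item.keys()))  # ['name', 'phone', 'address'] -- assignment order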
A simple spider file:

import scrapy
from info.items import StockinfoItem


class InfoSpider(scrapy.Spider):
    name = 'Info'
    allowed_domains = ['quotes.money.163.com']
    start_urls = ["http://quotes.money.163.com/f10/gszl_600023.html"]

    def parse(self, response):
        item = StockinfoItem()
        item["name"] = response.xpath('/html/body/div[2]/div[4]/table/tr[2]/td[2]/text()').extract()
        item["phone"] = response.xpath('/html/body/div[2]/div[4]/table/tr[7]/td[4]/text()').extract()
        item["address"] = response.xpath('/html/body/div[2]/div[4]/table/tr[2]/td[4]/text()').extract()
        item.items()  # returns (field, value) pairs; unused here, see the answer below
        yield item
Debug output from running the spider:

2019-04-25 13:45:01 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'address': ['浙江省杭州市天目山路152号浙能大厦'],'name': ['浙能电力'],'phone': ['0571-87210223']}
Thanks to Gallaecio's suggestion, I added the following to settings.py:

FEED_EXPORT_FIELDS = ['name', 'phone', 'address']
Run the spider and export to a CSV file:

scrapy crawl Info -o info.csv
The field order in the file is now my custom order:

cat info.csv
name,phone,address
浙能电力,0571-87210223,浙江省杭州市天目山路152号浙能大厦
But look at Scrapy's debug output:

2019-04-26 00:16:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'address': ['浙江省杭州市天目山路152号浙能大厦'],
 'name': ['浙能电力'],
 'phone': ['0571-87210223']}
How can I generate the debug output in my custom order? That is, how do I get the following debug output:

2019-04-26 00:16:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'name': ['浙能电力'],
 'phone': ['0571-87210223'],
 'address': ['浙江省杭州市天目山路152号浙能大厦']}

You can define a custom string representation for your item:

class StockinfoItem(OrderedItem):
    # name, phone and address are declared as scrapy.Field(), as in items.py above
    def __repr__(self):
        return 'name: {}, phone: {}, address: {}'.format(
            self['name'], self['phone'], self['address'])

In the spider, replace item.items() with self.log(item.items()); the logged message then shows the item's (field, value) tuples in the order they were assigned, as sketched below.
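
A minimal sketch of the resulting parse method (same XPaths as in the spider above; self.log is scrapy.Spider's built-in logging helper):

def parse(self, response):
    item = StockinfoItem()
    item["name"] = response.xpath('/html/body/div[2]/div[4]/table/tr[2]/td[2]/text()').extract()
    item["phone"] = response.xpath('/html/body/div[2]/div[4]/table/tr[7]/td[4]/text()').extract()
    item["address"] = response.xpath('/html/body/div[2]/div[4]/table/tr[2]/td[4]/text()').extract()
    # logs the (field, value) pairs in assignment order,
    # since OrderedItem stores values in an OrderedDict
    self.log(item.items())
    yield item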


Another approach is to combine the answer you mentioned in your post with the following.

The problem is in the __repr__ method of Scrapy's Item class. Originally its code is:

def __repr__(self):
    return pformat(dict(self))
So even if you back your item with an OrderedDict and want the fields kept in that order, this method applies dict() to the item and pformat() sorts the keys, which breaks the order.
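
A quick standalone illustration (plain Python, no Scrapy needed); note that even on Python 3.7+, where dict() itself preserves insertion order, pformat still sorts the keys alphabetically:

from collections import OrderedDict
from pprint import pformat

d = OrderedDict([('name', 1), ('phone', 2), ('address', 3)])
print(pformat(dict(d)))
# {'address': 3, 'name': 1, 'phone': 2}  <- alphabetical, assignment order lost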

So I suggest you override it in whatever way you like, for example:

import json

import scrapy
import six
from collections import OrderedDict


class OrderedItem(scrapy.Item):
    def __init__(self, *args, **kwargs):
        self._values = OrderedDict()
        if args or kwargs:
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v

    def __repr__(self):
        # must return a string; OrderedDict(self) keeps assignment order
        return json.dumps(OrderedDict(self), ensure_ascii=False)
Now you get the following output:

2019-04-30 18:56:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{"name": ["\u6d59\u80fd\u7535\u529b"], "phone": ["0571-87210223"], "address": ["\u6d59\u6c5f\u7701\u676d\u5dde\u5e02\u5929\u76ee\u5c71\u8def152\u53f7\u6d59\u80fd\u5927\u53a6"]}

The complete items.py that produces the custom debug output with CJK characters displayed as-is looks like this:

import scrapy
import json
import six
from collections import OrderedDict


class OrderedItem(scrapy.Item):
    def __init__(self, *args, **kwargs):
        self._values = OrderedDict()
        if args or kwargs:
            for k, v in six.iteritems(dict(*args, **kwargs)):
                self[k] = v

    def __repr__(self):
        # ensure_ascii=False makes CJK characters display as-is
        # instead of as \uXXXX escapes
        return json.dumps(OrderedDict(self), ensure_ascii=False)


class StockinfoItem(OrderedItem):
    name = scrapy.Field()
    phone = scrapy.Field()
    address = scrapy.Field()

You can also customize the field order in the actual output file; see the FEED_EXPORT_FIELDS setting above. Add ensure_ascii=False to json.dumps to display CJK characters as-is.
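
A standalone illustration of what ensure_ascii controls:

import json

print(json.dumps({"name": "浙能电力"}))
# {"name": "\u6d59\u80fd\u7535\u529b"}
print(json.dumps({"name": "浙能电力"}, ensure_ascii=False))
# {"name": "浙能电力"}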