TypeError: Object of type 'bytes' is not JSON serializable (Python 3)

python web-scraping scrapy

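After running the spider I get the error below. I also have a pipeline in which I convert everything to JSON, but the error still appears after the item is returned.

    TypeError: Object of type 'bytes' is not JSON serializable

My code is: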

    import json
    import re
    import types

    SEPARATOR = '-'
    FILING_PROPERTIES = ['state_id', 'types', 'description', 'filing_parties', 'filed_on']
    DOCUMENT_PROPERTIES = ['types', 'title', 'blob_name', 'state_id', 'source_url']


    class AeeiPipeline(object):
        def process_item(self, item, spider):
            import pdb
            #
            if item.get('title', None):
                item['source_title'], item['title'] = self.title_case(item['title'])
            if item.get('description'):
                pdb.set_trace()
                item['description'] = self.title_case(item['description'])
            for filing in item.get("filings", []):
                if filing.get('description'):
                    pdb.set_trace()
                    filing['description'] = self.title_case(filing['description'])
                for _key in ["filing_parties", "types"]:
                    if not (_key in filing and filing[_key]):
                        filing[_key] = []
                    elif isinstance(filing[_key], str):
                        filing[_key] = [filing[_key]]

                for doc in filing.get("documents", []):
                    if doc.get('name'):
                        doc['name'] = doc['name']
                    if doc.get('title'):
                        doc['title'] = self.make_unicode(doc['title'])
                    if "types" in doc and not type(doc["types"]) is list:
                        doc["types"] = [doc["types"]]
            for _key in ["industries", "assignees", "major_parties", "source_assignees", "source_major_parties"]:
                if not (_key in item and item[_key]):
                    item[_key] = []
                elif isinstance(item[_key], str):
                    item[_key] = [item[_key]]

            for key, value in item.items():
                if type(item[key]) is str:
                    item[key] = value.strip()
            pdb.set_trace()
            item = json.dumps(item) + '\n'
            return item

        def title_case(self, title):
            title = self.make_unicode(title)
            return title, re.sub(u"[A-Za-z]+(('|\u2019)[A-Za-z]+)?",
                                 lambda mo: mo.group(0)[0].upper() + mo.group(0)[1:].lower(),
                                 title)

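This means you are using Scrapy's Item class, which json.dumps cannot serialize directly.

The fix is either to convert the item to a dict first:

    item = json.dumps(dict(item))

or, in your spider, to build the item as a plain dict instead of an Item instance, for example:

    item = {}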
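To illustrate why the dict conversion helps, here is a minimal sketch that does not need Scrapy installed. `FakeItem` is a hypothetical stand-in, assumed to behave like a Scrapy Item only in the sense that it is a mapping but not a `dict` subclass; the field values are made up.

```python
import json
from collections.abc import MutableMapping


class FakeItem(MutableMapping):
    """Hypothetical stand-in for a Scrapy Item: mapping-like, but not a dict."""

    def __init__(self, **fields):
        self._values = dict(fields)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


item = FakeItem(title="Docket 42", types=["order"])

try:
    # The json module only serializes dict, list, tuple, str, numbers,
    # bool and None out of the box, so a generic mapping is rejected.
    json.dumps(item)
    direct_error = None
except TypeError as exc:
    direct_error = str(exc)

# Copying the top-level fields into a plain dict makes the object serializable.
as_json = json.dumps(dict(item))
```

Note that `dict(item)` is a shallow copy: any nested value that is itself not serializable (another item object, or `bytes`) would still raise the same `TypeError`.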

This means your dict contains a bytes field. You have a couple of options: write your own JSON encoder, or simply convert the value to str.
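Both options can be sketched with the standard library alone. The record below is invented for illustration, and `BytesEncoder` is a hypothetical name; the sketch assumes the bytes hold UTF-8 text, which may not be true of your data.

```python
import json

# Made-up record with one bytes value, similar in shape to a scraped item.
record = {"title": "Order", "blob_name": b"docs/order.pdf"}

# Option 1: default=str calls str() on anything the encoder cannot handle.
# For bytes this yields the repr text "b'...'", not the decoded content.
quick = json.dumps(record, default=str)


class BytesEncoder(json.JSONEncoder):
    """Custom encoder that decodes bytes as UTF-8 (an assumed encoding)."""

    def default(self, o):
        if isinstance(o, (bytes, bytearray)):
            return o.decode("utf-8")
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


# Option 2: pass the encoder class so bytes are decoded to real text.
clean = json.dumps(record, cls=BytesEncoder)
```

The `default=str` route is the shortest, but the stored value keeps the `b'...'` wrapper; the encoder subclass is a little more code and gives the decoded string.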

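Does json.dumps(item, default=str) work?

I did item = json.dumps(item) + '\n', but got the error TypeError: Object of type 'PucItem' is not JSON serializable.

Please read up on how to write a good question. My advice is to avoid appending characters to the JSON string yourself; producing the string is the job of the JSON encoder (which the dumps method invokes). Without the full traceback of the error (copy-pasted, not a screenshot), it is hard to tell what is going on. Have you tried default=str without appending \n at the end, as I suggested in my earlier comment?

    item = json.dumps(dict(item))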
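One more point that may be relevant to the pipeline in the question: as far as I know, Scrapy expects process_item to return the item object (or raise DropItem), not a JSON string. A rough sketch of a pipeline that writes one JSON document per line and still passes the item on could look like this. The class name `JsonLinesPipeline` and the `path` argument are hypothetical, and `default=str` is used only as a blunt fallback for values such as bytes.

```python
import json


class JsonLinesPipeline:
    """Sketch: write each item as one JSON line, then pass the item on."""

    def __init__(self, path="items.jl"):  # hypothetical default file name
        self.path = path

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # Serialize a plain-dict copy of the item; default=str covers
        # values the encoder cannot handle natively, such as bytes.
        self.file.write(json.dumps(dict(item), default=str) + "\n")
        # Return the item itself so later pipelines still receive it.
        return item
```

Keeping serialization separate from the return value means later pipeline stages and feed exports still see a normal item rather than a string.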