Python Scrapy throws a Unicode error on some items from the same page

python scrapy amazon-dynamodb


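My spider skips some of the items I parse and raises the error below. All of the items are on a single page. There are 20 in total, and usually 3 or 4 of them get skipped. Any suggestions would be appreciated: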

File "/home/ec2-user/project/project/pipelines.py", line 19, in process_item
    'title': str(item['title']),
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 25: ordinal not in range(128)

Spider:

def parse(self, response):
    # Each list entry links to an article page; the timestamp travels along in meta.
    for item in response.xpath("//li[contains(@class, 'river-block')]"):
        url = item.xpath(".//h2/a/@href").extract()[0]
        stamp = item.xpath(".//time/@datetime").extract_first()
        yield scrapy.Request(url, callback=self.get_details, meta={'stamp': stamp})

def get_details(self, response):
    # extract_first() returns a Unicode string (or None if nothing matches).
    article = ArticleItem()
    article['title'] = response.xpath("//header/h1/text()").extract_first()
    article['url'] = format(shortener.short(response.url))
    article['stamp'] = response.meta['stamp']
    yield article

Pipeline:

import boto3

class DynamoDBStorePipeline(object):

    def process_item(self, item, spider):
        dynamodb = boto3.resource('dynamodb', region_name="us-west-2")
        table = dynamodb.Table('db1')

        table.put_item(
            Item={
                'url': str(item['url']),
                # The traceback points here: on Python 2, str() encodes a Unicode title as ASCII.
                'title': str(item['title']),
                'stamp': str(item['stamp']),
            }
        )
        return item

I changed 'title': str(item['title']) to 'title': item['title'].encode('utf-8'), and now everything works.
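
Why the change helps can be shown in isolation. On Python 2, calling str() on a unicode object implicitly encodes it with the ASCII codec, and that is the step that fails on u'\u201c' (a left curly quotation mark). The sketch below makes that implicit step explicit so that it also runs on Python 3. The sample title and the two helper functions are made up for illustration and are not part of the original project.

```python
# -*- coding: utf-8 -*-

# Hypothetical article title containing curly quotes (U+201C and U+201D).
title = u"Startup raises \u201cseed\u201d round"


def to_ascii_bytes(text):
    """Mimic what Python 2's str(unicode_obj) does: encode with the ASCII codec."""
    return text.encode("ascii")


def to_utf8_bytes(text):
    """The change described in the answer: encode explicitly as UTF-8."""
    return text.encode("utf-8")


ascii_failed = False
failing_char = None
try:
    to_ascii_bytes(title)
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded.
    ascii_failed = True
    failing_char = title[exc.start]

# UTF-8 can represent every Unicode character, so this call does not raise.
utf8_title = to_utf8_bytes(title)
```

Only titles that contain a non-ASCII character hit the failing branch, which would explain why just 3 or 4 of the 20 items were skipped.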

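It looks like Scrapy is returning Unicode strings and you are converting them to byte strings. Why do that? Just drop the str().

Yes, I just fixed it myself. I changed 'title': str(item['title']) to 'title': item['title'].encode('utf-8') and everything is fine now.

Yes, but why do you need to encode in the first place? Using the Unicode strings directly would be better, especially if you plan to move to Python 3.

Otherwise some of the data is not saved to DynamoDB, because it raises the encoding error.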
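
The suggestion in the comments, passing the Unicode strings through untouched, might look roughly like the sketch below. It reuses the table name and region from the question. The `table` constructor argument and the `FakeTable` class are assumptions added here so the example can run without AWS credentials, and creating the table handle in `open_spider` rather than on every item is a design choice of this sketch, not something from the original post.

```python
class DynamoDBStorePipeline(object):
    """Sketch: store items without converting text to byte strings."""

    def __init__(self, table=None):
        # `table` is an injection point added for this sketch only.
        self.table = table

    def open_spider(self, spider):
        # Build the table handle once per crawl instead of once per item.
        # boto3 is imported lazily so the class loads even without boto3.
        if self.table is None:
            import boto3
            dynamodb = boto3.resource("dynamodb", region_name="us-west-2")
            self.table = dynamodb.Table("db1")

    def process_item(self, item, spider):
        # Values go through unchanged: no str() and no explicit .encode(),
        # so a title with non-ASCII characters is not a special case.
        self.table.put_item(
            Item={
                "url": item["url"],
                "title": item["title"],
                "stamp": item["stamp"],
            }
        )
        return item


class FakeTable(object):
    """Hypothetical stand-in for a boto3 Table; it only records what it is given."""

    def __init__(self):
        self.saved = []

    def put_item(self, Item):
        self.saved.append(Item)


fake = FakeTable()
pipeline = DynamoDBStorePipeline(table=fake)
pipeline.open_spider(spider=None)
result = pipeline.process_item(
    {
        "url": u"http://example.com/a",
        "title": u"\u201cQuoted\u201d title",
        "stamp": u"2016-01-01T00:00:00",
    },
    spider=None,
)
```

One thing to check before relying on .encode('utf-8') instead: on Python 3 the result is a bytes object, which boto3 is likely to store as a DynamoDB binary attribute rather than a string, so keeping the values as text avoids that difference.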