Python 2.7: pymongo cannot update values containing umlauts


If I try to use pymongo to update a value in MongoDB that contains umlaut characters, it throws:

strings in documents must be valid UTF-8: "ccopy_reg\n_reconstructor\np0\n(ctextblob.cla...

I tried encoding it manually:

        enc='UTF-8'  
        content = request.get_data() # raw encoded content
        u_content = content.decode(enc) # decodes from enc to unicode
        utf8_content = u_content.encode("UTF-8")

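If I use some other encoding instead of enc='UTF-8', it works, but the umlaut characters come out wrong. If I skip the decode and encode steps entirely, I get the same exception.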
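The "works, but the umlauts are wrong" symptom is what one would expect when UTF-8 bytes are decoded with a single-byte codec. A minimal sketch follows; the choice of Latin-1 is an assumption, since the question does not say which other encoding was tried.

```python
# UTF-8 bytes for "ñww", matching the \xc3\xb1 seen in the request body.
raw = b'\xc3\xb1ww'

# Correct codec: the two bytes \xc3\xb1 become one character, U+00F1.
right = raw.decode('utf-8')

# Assumed "other encoding": Latin-1 never fails, but it maps every byte
# to its own character, so \xc3\xb1 becomes the two characters U+00C3 U+00B1.
wrong = raw.decode('latin-1')

print(len(right), len(wrong))
```

Decoding with the wrong codec raises no error here, which is why the mistake only shows up later as garbled text.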

The full code:

try:
    # Load params arriving as json data
    enc='UTF-8'  
    content = request.get_data() # raw encoded content
    print repr(content)
    u_content = content.decode(enc) # decodes from enc to unicode
    utf8_content = u_content.encode('UTF-8')
    params = json.loads(utf8_content)
    # Check all parameters
    customer_id = params.get('customer', '')
    check_credentials(customer_id, params.get('apikey', ''))
    collection_id = params.get('collection', '')
    if not collection_id or not str(collection_id).isdigit():
        raise Exception, "Invalid collection"
    train_records = params.get('train', [])    
    if not train_records:
        raise Exception, "Train records are needed in the 'train' parameter"
    # Store the trained classifier in database for a better performance
    train_records = map(lambda x: x.values(), train_records)
    cl = NaiveBayesClassifier(train_records)
    pk = '%s__%i' % (customer_id, collection_id)
    data = {'_id': pk, 'customer': customer_id, 'collection': collection_id, 'classifier': pickle.dumps(cl), 'train':train_records}
    if db.classifiers.find_one({'_id': pk}):
        db.classifiers.update({'_id': pk}, data)
    else:
        db.classifiers.insert(data)
    print 'ok'
    # Asynchronously increase usage count in order to check rate limits
    gevent.spawn(increase_usage, customer_id)

except Exception as e:
    print e

The exception is raised here: db.classifiers.update({'_id': pk}, data)
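The string quoted in the error message ("ccopy_reg\n_reconstructor...") looks like the start of a text-format (protocol 0) pickle, which suggests that the rejected value is the 'classifier' field rather than the request text itself. The sketch below uses only the standard library to show why such a pickle need not be valid UTF-8. The sample text and the helper `is_valid_utf8` are illustrative and not taken from the question's code.

```python
import pickle

text = u'\xf1ww'  # "ñww", the sample text from the request

# Protocol 0 is the text pickle format (the default in Python 2).
# It writes unicode strings with the raw-unicode-escape codec,
# so U+00F1 is stored as the single byte 0xF1.
blob = pickle.dumps(text, protocol=0)


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xF1 followed by ASCII is not a valid UTF-8 sequence.
print(is_valid_utf8(blob))

# The pickle itself is still intact and round-trips as bytes.
print(pickle.loads(blob) == text)
```

If this is indeed the cause, one common approach is to store the pickle as binary data, for example `Binary(pickle.dumps(cl))` with `Binary` imported from `bson.binary` (which ships with pymongo), instead of a plain string. That is a suggestion based on the error text, not a fix confirmed in this thread.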

After the line params = json.loads(utf8_content), ñ is converted from \xc3\xb1 to \xf1.

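Comments:

Please add print repr(content) after the get_data line and tell us what it says.

'{\n apikey: yt1uy23123123123,\n customer: 111111,\n collection: 111111,\n train: [\n{\n text: \xc3\xb1ww,\n label: pos\n}\n}\n\n]\n}' \xc3\xb1ww = ñww. If I simply send ñ, I get \xc3\xb1.

That's odd: everything should work out of the box, with no re-encoding needed. What happens if you remove it and just do params = json.loads(content)?

I also tried json.loads(request.get_data()), and the result is the same.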
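The change from \xc3\xb1 to \xf1 reported above is consistent with normal JSON parsing: \xc3\xb1 is the UTF-8 byte pair for ñ, while \xf1 is how the single code point U+00F1 appears in a unicode string's repr. A minimal sketch follows; the request body is a hypothetical, cut-down version of the one shown in the comments.

```python
import json

# Hypothetical minimal request body, UTF-8 encoded.
raw = b'{"text": "\xc3\xb1ww"}'

# Decoding and re-encoding with the same codec returns the same bytes,
# so the decode/encode round trip in the question changes nothing.
round_trip = raw.decode('utf-8').encode('utf-8')

# Parsing yields a unicode string containing the code point U+00F1.
params = json.loads(raw.decode('utf-8'))
text = params['text']

# Encoding that string as UTF-8 gives back the original two bytes.
encoded = text.encode('utf-8')

print(round_trip == raw, encoded == b'\xc3\xb1ww')
```

This suggests the request handling is behaving correctly and that the invalid UTF-8 comes from somewhere else in the document being saved.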