Can't parse JSON text with unicode codes using Python 3.7

python json python-3.x character-encoding


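This should be a piece of cake, but I'm new to Python and I can't figure out how to do it:

I have a JSON file that I got by downloading my personal data from Facebook. This is just part of the file: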

[
  {
    "timestamp": 1575826804,
    "attachments": [

    ],
    "data": [
      {
        "post": "This is a test line with character \u00c3\u00ad and \u00c3\u00b3"
      },
      {
        "update_timestamp": 1575826804
      }
    ],
    "title": "My Name"
  },
  {
    "timestamp": 1575826526,
    "attachments": [

    ],
    "data": [
      {
        "update_timestamp": 1575826526
      }
    ],
    "title": "My Name"
  },
  {
    "timestamp": 1575638718,
    "data": [
      {
        "post": "This is another test line with character \u00c3\u00ad and \u00c3\u00b3 and line breaks\n"
      }
    ],
    "title": "My Name escribi\u00c3\u00b3 en la biograf\u00c3\u00ada de Someone."
  },
  {
    "timestamp": 1575561399,
    "attachments": [
      {
        "data": [
          {
            "external_context": {
              "url": "https://youtu.be/lalalalalalaaeeeE"
            }
          }
        ]
      }
    ],
    "data": [
      {
        "update_timestamp": 1575561399
      }
    ],
    "title": "My Name"
  }
]

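The file has many unicode codes such as "\u00c3\u00ad" that I need to convert to their ASCII representation. First, I tried parsing the JSON file and loading it as a Python object with the "json" library:

import json

with open("test.json") as fp: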
    data = json.load(fp)

    print(type(data))
    print(data[0])

    # output:
    # <class 'list'>
    # {'timestamp': 1575826804, 'attachments': [], 'data': [{'post': 'This is a test line with 
    # character Ã\xad and Ã³'}, {'update_timestamp': 1575826804}], 'title': 'My Name'}

The second attempt only works when the JSON string has no newline character "\n" or ":" inside a JSON value. In a case like mine, it throws:

JSONDecodeError: Invalid control character at: line 33 column 82 (char 560)

Character 560 is the trailing "\n" in the "post" value of the third entry above.
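For context, this is the error Python's json module reports when a string value contains a raw control character. The code of the second attempt is not shown here, so the following is only a minimal sketch under the assumption that the text was unescaped before parsing, which turns the two characters "\n" into a real newline. The sample string is made up for illustration:

```python
import json

# A JSON document whose string value contains a REAL newline character.
# (In this Python literal, \n is an actual line feed, not backslash + n.)
raw = '{"post": "line one\nline two"}'

try:
    json.loads(raw)
    error_message = None
except json.JSONDecodeError as exc:
    # The default (strict) decoder rejects control characters in strings.
    error_message = exc.msg

# strict=False tells the decoder to accept control characters in strings.
lenient = json.loads(raw, strict=False)

print(error_message)
print(repr(lenient["post"]))
```

Passing strict=False only hides the symptom. If the file is parsed without unescaping it first, the "\n" stays a valid JSON escape and the error does not occur.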

How should I load this JSON with Unicode properly? Is substituting each unicode string with its ASCII character the only way?

Thanks in advance for your help.

I think you need to use "raw_unicode_escape":

import json
with open("j.json", encoding='raw_unicode_escape') as f:
    data = json.loads(f.read().encode('raw_unicode_escape').decode())
    print(data[0])

OUT: {'timestamp': 1575826804, 'attachments': [], 'data': [{'post': 'This is a test line with character í and ó'}, {'update_timestamp': 1575826804}], 'title': 'My Name'}

Does this help?
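To see why this round trip works, here is a rough step-by-step sketch on an in-memory byte string instead of a file. The sample bytes and the step1/step2/step3 names are only illustrative:

```python
import json

# Bytes as they would sit on disk: every escape is literal ASCII text.
on_disk = b'{"post": "character \\u00c3\\u00ad and \\u00c3\\u00b3\\n"}'

# Step 1: decoding turns each \uXXXX into the code point U+XXXX.
# Other escapes such as \n are left as backslash + letter.
step1 = on_disk.decode("raw_unicode_escape")

# Step 2: encoding writes code points below 256 as single bytes,
# so U+00C3 U+00AD becomes the two bytes C3 AD.
step2 = step1.encode("raw_unicode_escape")

# Step 3: those bytes are valid UTF-8, so decoding gives the real character.
step3 = step2.decode("utf-8")

data = json.loads(step3)
print(data["post"])
```

One caveat, as far as I can tell: step 1 unescapes every \uXXXX sequence, so a file that writes a quote as \u0022 or a newline as \u000a would no longer be valid JSON after it. The excerpt in the question has no such sequences.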

Hm, that's weird: \u00c3\u00b3 is actually Ã³ in Unicode.

@IvánC.: when parsed as UTF-8, it would be the character ó.

This makes it work! The unicode codes are now decoded correctly, thanks!
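The comments above suggest another way to get the same result: parse the JSON normally, then repair each string by encoding it as Latin-1 and decoding the bytes as UTF-8. This is only a sketch of that idea. fix_mojibake is a hypothetical helper name, and it assumes every affected string holds UTF-8 bytes stored one byte per code point:

```python
import json


def fix_mojibake(value):
    """Recursively re-decode strings whose UTF-8 bytes were read as Latin-1."""
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            return value  # leave strings that do not fit the pattern
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# One title from the question, with the escapes as literal text.
sample = '[{"title": "My Name escribi\\u00c3\\u00b3 en la biograf\\u00c3\\u00ada de Someone."}]'

parsed = json.loads(sample)   # escapes such as \n are handled by the parser
fixed = fix_mojibake(parsed)
print(fixed[0]["title"])
```

Because the parser handles the escapes first, "\n" and escaped quotes inside values are not affected by the repair step.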

