Python: Unpacking JSON Lines into a DataFrame

Tags: python, json, pandas, unpack

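I am working with the JSON Lines format and trying to "unpack" the dictionary objects that each line holds inside a single list. Because every line wraps its dictionaries in a list, I have not found an earlier post that covers this case. The data looks like this, with a bunch of nested dictionaries inside each list object: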

0        [{'created_at': 'Sun Jun 14 20:20:28 +0000 202...
1        [{'created_at': 'Sat Jul 25 22:30:14 +0000 202...
2        [{'created_at': 'Sat May 30 02:22:04 +0000 202...
3        [{'created_at': 'Tue May 05 16:54:05 +0000 202...
4        [{'created_at': 'Sat Jun 20 13:50:23 +0000 202...
                               ...                        
17453    [{'created_at': 'Mon Apr 13 01:01:10 +0000 202...
17454    [{'created_at': 'Fri Jul 17 09:00:50 +0000 202...
17455    [{'created_at': 'Sun Jun 21 00:51:54 +0000 202...
17456    [{'created_at': 'Tue Jun 02 18:23:49 +0000 202...
17457    [{'created_at': 'Thu May 28 00:27:01 +0000 202...

What I have tried so far:

import pandas as pd

# Read every line of the file as a raw string, one line per row
with open('data') as file:
    lines = file.read().splitlines()
df_inter = pd.DataFrame(lines)
df_inter.columns = ['json_element']
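
For the additional nested items, I plan to use pd.json_normalize(df_inter['json_element'].apply(json.loads)), as suggested in another post. However, how do I unpack the multiple dictionary objects that sit on a single line?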

Edit

Since the data is huge, here is part of a single line:

[{'created_at': 'Sun Jun 14 20:20:28 +0000 2020', 'id': 1272262651100434433, 'id_str': '1272262651100434433', 'truncated': False, 'display_text_range': [0, 243], 'entities': {'hashtags': [{'text': 'Tenet', 'indices': [82, 88]}], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 1272262640753094656, 'id_str': '1272262640753094656', 'indices': [244, 267], 'media_url': 'http://pbs.twimg.com/media/Eaf8IYsWsAAHVHV.jpg'...}]

If your data file looks like this:

[{"created_at": "Sun Jun 14 20:20:28 +0000 2020", "id": 1272262651100434433, "id_str": "1272262651100434433", "truncated": false, "display_text_range": [0, 243], "entities": {"hashtags": [{"text": "Tenet", "indices": [82, 88]}], "symbols": [], "user_mentions": [], "urls": [], "media": [{"id": 1272262640753094656, "id_str": "1272262640753094656", "indices": [244, 267], "media_url": "http://pbs.twimg.com/media/Eaf8IYsWsAAHVHV.jpg"}]}}]
[{"created_at": "Sun Jun 14 20:20:28 +0000 2020", "id": 1272262651100434433, "id_str": "1272262651100434433", "truncated": false, "display_text_range": [0, 243], "entities": {"hashtags": [{"text": "Tenet", "indices": [82, 88]}], "symbols": [], "user_mentions": [], "urls": [], "media": [{"id": 1272262640753094656, "id_str": "1272262640753094656", "indices": [244, 267], "media_url": "http://pbs.twimg.com/media/Eaf8IYsWsAAHVHV.jpg"}]}}]
[{"created_at": "Sun Jun 14 20:20:28 +0000 2020", "id": 1272262651100434433, "id_str": "1272262651100434433", "truncated": false, "display_text_range": [0, 243], "entities": {"hashtags": [{"text": "Tenet", "indices": [82, 88]}], "symbols": [], "user_mentions": [], "urls": [], "media": [{"id": 1272262640753094656, "id_str": "1272262640753094656", "indices": [244, 267], "media_url": "http://pbs.twimg.com/media/Eaf8IYsWsAAHVHV.jpg"}]}}]
[{"created_at": "Sun Jun 14 20:20:28 +0000 2020", "id": 1272262651100434433, "id_str": "1272262651100434433", "truncated": false, "display_text_range": [0, 243], "entities": {"hashtags": [{"text": "Tenet", "indices": [82, 88]}], "symbols": [], "user_mentions": [], "urls": [], "media": [{"id": 1272262640753094656, "id_str": "1272262640753094656", "indices": [244, 267], "media_url": "http://pbs.twimg.com/media/Eaf8IYsWsAAHVHV.jpg"}]}}]
[{"created_at": "Sun Jun 14 20:20:28 +0000 2020", "id": 1272262651100434433, "id_str": "1272262651100434433", "truncated": false, "display_text_range": [0, 243], "entities": {"hashtags": [{"text": "Tenet", "indices": [82, 88]}], "symbols": [], "user_mentions": [], "urls": [], "media": [{"id": 1272262640753094656, "id_str": "1272262640753094656", "indices": [244, 267], "media_url": "http://pbs.twimg.com/media/Eaf8IYsWsAAHVHV.jpg"}]}}]
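
You can use the following code to get one DataFrame row per line of the JSONL file:

import json
import pandas as pd

with open("data") as f:
    # Each line is a JSON list; [0] takes the first dictionary in it
    df = pd.DataFrame(json.loads(line)[0] for line in f)

Your df will look like this: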

                       created_at                   id               id_str  truncated display_text_range                                           entities
0  Sun Jun 14 20:20:28 +0000 2020  1272262651100434433  1272262651100434433      False           [0, 243]  {'hashtags': [{'text': 'Tenet', 'indices': [82...
1  Sun Jun 14 20:20:28 +0000 2020  1272262651100434433  1272262651100434433      False           [0, 243]  {'hashtags': [{'text': 'Tenet', 'indices': [82...
2  Sun Jun 14 20:20:28 +0000 2020  1272262651100434433  1272262651100434433      False           [0, 243]  {'hashtags': [{'text': 'Tenet', 'indices': [82...
3  Sun Jun 14 20:20:28 +0000 2020  1272262651100434433  1272262651100434433      False           [0, 243]  {'hashtags': [{'text': 'Tenet', 'indices': [82...
4  Sun Jun 14 20:20:28 +0000 2020  1272262651100434433  1272262651100434433      False           [0, 243]  {'hashtags': [{'text': 'Tenet', 'indices': [82...
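
>>> df.info()
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   created_at          5 non-null      object
 1   id                  5 non-null      int64
 2   id_str              5 non-null      object
 3   truncated           5 non-null      bool
 4   display_text_range  5 non-null      object
 5   entities            5 non-null      object
dtypes: bool(1), int64(1), object(4)
memory usage: 333.0+ bytes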
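
The code above keeps only the first dictionary of each line and leaves nested fields such as entities as a single object column. If some lines hold more than one dictionary, or you want the nested keys as their own columns, something along these lines may work. This is only a sketch: the two-line sample is made up for illustration (it is not the asker's data), and the names sample, records and flat are hypothetical.

```python
import io
import json
from itertools import chain

import pandas as pd

# Hypothetical sample standing in for the real "data" file.
# The second line deliberately holds two dictionaries in its list.
sample = io.StringIO(
    '[{"id": 1, "entities": {"hashtags": [{"text": "Tenet"}], "urls": []}}]\n'
    '[{"id": 2, "entities": {"hashtags": [], "urls": []}}, '
    '{"id": 3, "entities": {"hashtags": [], "urls": []}}]\n'
)

# Parse every non-empty line into a list of dicts, then chain the lists
# together so each dictionary becomes its own record.
records = list(
    chain.from_iterable(json.loads(line) for line in sample if line.strip())
)

# json_normalize expands nested dictionaries into dotted column names
# (for example "entities.hashtags"); list values stay as lists.
flat = pd.json_normalize(records)
print(flat.columns.tolist())
print(len(flat))
```

With a real file, replace the io.StringIO object with open("data"). Note that this yields one row per dictionary rather than one row per line, so the row count can exceed the number of lines in the file.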

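Comments:

It depends on the structure of the dictionaries, whether every element in the list has the same structure, and how many nested dictionaries there are. Please provide more complete sample data.

Thanks, I have added sample data to the post.

@Kapocsi You are right. I have edited the post.

Thank you, sir. This is a magical and simple piece of code that turns the lists into a DataFrame without a separate unpacking step.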