Python: Why does pandas.read_json modify the value of long integers?


Tags: python, pandas, json-normalize


I don't understand why the original values of id_1 and id_2 change when they are printed.

I have a JSON file named test_data.json:

    {
      "objects": {
        "value": {
          "1298543947669573634": {
            "timestamp": "Wed Aug 26 08:52:57 +0000 2020",
            "id_1": "1298543947669573634",
            "id_2": "1298519559306190850"
          }
        }
      }
    }
Output:

    python test_data.py
                      id_1                 id_2                 timestamp
    0  1298543947669573632  1298519559306190848 2020-08-26 08:52:57+00:00

My code, named test_data.py:

    import pandas as pd
    import json

    file = "test_data.json"
    with open(file, "r") as f:
        all_data = json.loads(f.read())

    data = pd.read_json(json.dumps(all_data['objects']['value']), orient='index')
    data = data.reset_index(drop=True)
    print(data.head())
How can I fix this so that the numeric values are interpreted correctly?

Using Python 3.8.5 and pandas 1.1.1.

Current implementation

First, the code reads the file and converts it from a str to a dict using json.loads:

    with open(file, "r") as f:
        all_data = json.loads(f.read())

Then the content of 'value' is converted back to a str:

    json.dumps(all_data['objects']['value'])
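A quick standard-library check suggests that neither of the two steps above alters the IDs, which narrows the problem down to the read_json call. This is only a sketch: the JSON is inlined as a string (the variable name raw is an assumption for illustration) instead of being read from test_data.json.

```python
import json

# Same structure as test_data.json, inlined so the snippet runs on its own.
raw = """
{"objects": {"value": {"1298543947669573634": {
    "timestamp": "Wed Aug 26 08:52:57 +0000 2020",
    "id_1": "1298543947669573634",
    "id_2": "1298519559306190850"}}}}
"""

# Step 1: str -> dict. The IDs are quoted in the file, so they stay strings.
all_data = json.loads(raw)
record = all_data["objects"]["value"]["1298543947669573634"]

# Step 2: dict -> str. The IDs are written back out unchanged.
round_tripped = json.dumps(all_data["objects"]["value"])

print(type(record["id_1"]).__name__, record["id_1"])
print('"1298543947669573634"' in round_tripped)
```

Both IDs are still intact strings after json.loads and json.dumps, so the change has to happen later.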

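Finally, pd.read_json is called with orient='index', which treats the outer key as the row index and the inner keys (timestamp, id_1, id_2) as column headers. The data is also converted to int at this point, and the values change. I'm guessing there are some floating-point conversion issues in this step.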
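The floating-point guess can be checked without pandas. The sketch below only shows that a round trip through a 64-bit float reproduces exactly the altered values from the question; it does not inspect what read_json does internally. A float64 has a 53-bit significand, so integers above 2**53 cannot all be represented exactly.

```python
# The two IDs from test_data.json, as they appear in the file.
ids = ["1298543947669573634", "1298519559306190850"]

results = {}
for text in ids:
    exact = int(text)             # Python ints have arbitrary precision
    via_float = int(float(text))  # string -> float64 -> int loses low bits
    results[text] = via_float
    print(text, "->", via_float, "difference:", exact - via_float)

# 1298543947669573634 -> 1298543947669573632 difference: 2
# 1298519559306190850 -> 1298519559306190848 difference: 2

print(2**53)  # 9007199254740992, far smaller than either ID
```

The rounded values match the output printed by test_data.py, which is consistent with a string-to-float-to-int conversion somewhere in the parsing path.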
The call in question:

    pd.read_json(json.dumps(all_data['objects']['value']), orient='index')

Updated code

Option 1

Use pandas.DataFrame.from_dict, then convert the columns to numeric.

    file = "test_data.json"
    with open(file, "r") as f:
        all_data = json.loads(f.read())

    # use .from_dict
    data = pd.DataFrame.from_dict(all_data['objects']['value'], orient='index')

    # convert the columns to numeric
    data[['id_1', 'id_2']] = data[['id_1', 'id_2']].apply(pd.to_numeric, errors='coerce')

    data = data.reset_index(drop=True)

    # display(data)
                            timestamp                 id_1                 id_2
    0  Wed Aug 26 08:52:57 +0000 2020  1298543947669573634  1298519559306190850

    print(data.info())
    [out]:
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1 entries, 0 to 0
    Data columns (total 3 columns):
     #   Column     Non-Null Count  Dtype
    ---  ------     --------------  -----
     0   timestamp  1 non-null      object
     1   id_1       1 non-null      int64
     2   id_2       1 non-null      int64
    dtypes: int64(2), object(1)
    memory usage: 152.0+ bytes

Option 2

Use pandas.json_normalize, then convert the columns to numeric.

    file = "test_data.json"
    with open(file, "r") as f:
        all_data = json.loads(f.read())

    # read all_data into a dataframe
    df = pd.json_normalize(all_data['objects']['value'])

    # rename the columns (drop the "<outer key>." prefix)
    df.columns = [x.split('.')[1] for x in df.columns]

    # convert to numeric
    df[['id_1', 'id_2']] = df[['id_1', 'id_2']].apply(pd.to_numeric, errors='coerce')

    # display(df)
                            timestamp                 id_1                 id_2
    0  Wed Aug 26 08:52:57 +0000 2020  1298543947669573634  1298519559306190850

    print(df.info())
    [out]:
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1 entries, 0 to 0
    Data columns (total 3 columns):
     #   Column     Non-Null Count  Dtype
    ---  ------     --------------  -----
     0   timestamp  1 non-null      object
     1   id_1       1 non-null      int64
     2   id_2       1 non-null      int64
    dtypes: int64(2), object(1)
    memory usage: 152.0+ bytes

This is caused by a known pandas issue (linked in the docstring below) and still occurs in the current 1.2.4 release of pandas. Here is my workaround, which is also slightly faster than read_json on my data:

    import pathlib

    import pandas as pd

    def broken_load_json(path):
        """There's an open issue: https://github.com/pandas-dev/pandas/issues/20608
        about read_json loading large integers incorrectly because it's converting
        from string to float to int, losing precision."""
        df = pd.read_json(pathlib.Path(path), orient='index')
        return df

    def orjson_load_json(path):
        import orjson  # The builtin json module would also work
        with open(path) as f:
            d = orjson.loads(f.read())
        # Builds the index from the dict's keys as strings, sadly
        df = pd.DataFrame.from_dict(d, orient='index')
        # Fix the dtype of the index
        df = df.reset_index()
        df['index'] = df['index'].astype('int64')
        df = df.set_index('index')
        return df
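The index fix in orjson_load_json relies on converting the string keys directly to integers, without an intermediate float. The same idea can be sketched with the standard library only. The helper name keys_to_int64 and the explicit range check are assumptions added for illustration, not part of the answer above.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def keys_to_int64(d):
    """Return a copy of d whose string keys are exact ints within int64 range."""
    converted = {}
    for key, value in d.items():
        number = int(key)  # exact: no float is involved
        if not INT64_MIN <= number <= INT64_MAX:
            raise OverflowError(f"{key!r} does not fit in int64")
        converted[number] = value
    return converted

# Hypothetical record shaped like all_data['objects']['value'].
records = {"1298543947669573634": {"id_2": "1298519559306190850"}}
fixed = keys_to_int64(records)
print(list(fixed))  # [1298543947669573634]
```

Checking the range up front gives a clear error for an ID that would not fit, before any int64 cast is attempted.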
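Note that my answer preserves the values of the IDs, which makes sense in my use case.

Thanks mate. Option 1 using from_dict works perfectly. Even .apply(pd.to_numeric, errors='coerce') seems to be optional for my use case, but I used it anyway.

You saved me! Used Python 3.7.4 and pandas 0.25.1.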