Python: read a JSON file and convert it to a pandas DataFrame
I have a text file containing JSON-formatted data, as shown below:
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
What I want is a pandas DataFrame like this:
accountNumber customerId creditLimit availableMoney transactionDateTime transactionAmount merchantName acqCountry merchantCountryCode posEntryMode ... echoBuffer currentBalance merchantCity merchantState merchantZip cardPresent posOnPremises recurringAuthInd expirationDateKeyInMatch isFraud
0 737265056 737265056 5000 5000.0 2016-08-13T14:27:32 98.55 Uber US US 02 ... NaN 0.0 NaN NaN NaN False NaN NaN False False
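When I run the code below, which simply reads the JSON and converts it to pandas, I do not get the values; each cell contains a dictionary instead. I also tried json_normalize, but then not all of the columns line up. Any help would be much appreciated.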
import json
import pandas as pd

with open('test.txt', 'r') as f:
    data = json.load(f)
df = pd.DataFrame(data)
The problem is that your "JSON" file is slightly invalid. The records need to be separated by commas and wrapped in a list, like this:
[{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false},
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false},
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}]
Assuming that snippet is saved in a file named test.json, the code below should work:
>>> import pandas as pd
>>> df = pd.read_json('test.json')
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 29 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 accountNumber 3 non-null int64
1 customerId 3 non-null int64
2 creditLimit 3 non-null int64
3 availableMoney 3 non-null int64
4 transactionDateTime 3 non-null object
5 transactionAmount 3 non-null float64
6 merchantName 3 non-null object
7 acqCountry 3 non-null object
8 merchantCountryCode 3 non-null object
9 posEntryMode 3 non-null int64
10 posConditionCode 3 non-null int64
11 merchantCategoryCode 3 non-null object
12 currentExpDate 3 non-null object
13 accountOpenDate 3 non-null object
14 dateOfLastAddressChange 3 non-null object
15 cardCVV 3 non-null int64
16 enteredCVV 3 non-null int64
17 cardLast4Digits 3 non-null int64
18 transactionType 3 non-null object
19 echoBuffer 3 non-null object
20 currentBalance 3 non-null int64
21 merchantCity 3 non-null object
22 merchantState 3 non-null object
23 merchantZip 3 non-null object
24 cardPresent 3 non-null bool
25 posOnPremises 3 non-null object
26 recurringAuthInd 3 non-null object
27 expirationDateKeyInMatch 3 non-null bool
28 isFraud 3 non-null bool
dtypes: bool(3), float64(1), int64(10), object(15)
memory usage: 761.0+ bytes
>>>
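If rewriting the file is not practical, the original one-object-per-line layout can probably be read directly. The following is a minimal sketch, assuming the file really does hold exactly one JSON object per line; the two shortened records and the in-memory `raw` string are stand-ins for the real file and are not part of the original answer. It relies on the `lines` and `dtype` parameters of `pd.read_json`.

```python
import io

import pandas as pd

# Two shortened, illustrative records in the question's original layout:
# one JSON object per line, with no commas and no enclosing list.
raw = (
    '{"accountNumber": "737265056", "transactionAmount": 98.55, '
    '"posEntryMode": "02", "isFraud": false}\n'
    '{"accountNumber": "737265056", "transactionAmount": 74.51, '
    '"posEntryMode": "09", "isFraud": false}\n'
)

# lines=True asks pandas to parse each line as a separate JSON object.
# dtype=False is intended to switch off dtype inference, so string fields
# such as "02" are kept as strings instead of being converted to integers.
df = pd.read_json(io.StringIO(raw), lines=True, dtype=False)

print(df)
print(df.dtypes)
```

For the real data, a path such as 'test.txt' would take the place of the `io.StringIO` object. Leaving out `dtype=False` gives the inferred types shown in the `df.info()` output above, where values like accountNumber and posEntryMode come back as int64.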
Now that you have the DataFrame, everything after … is the same.

Your example does not look like a valid JSON file; data = json.load(f) gives me a parse error.

Ah, that is helpful. Now the question is how to add a comma after each line; I have nearly 1 million records.

@user14623412 I have already answered that in my edit. Basically, json.loads each dictionary separately, then create a new DataFrame.
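The approach described in the last comment could look roughly like the sketch below. It assumes one JSON object per line; the helper name `load_json_lines`, the temporary file, and the shortened sample records are illustrative and do not come from the original answer.

```python
import json
import os
import tempfile

import pandas as pd


def load_json_lines(path):
    """Parse a file holding one JSON object per line into a DataFrame."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                # json.loads parses a single string, unlike json.load,
                # which expects the whole file to be one JSON document.
                records.append(json.loads(line))
    return pd.DataFrame(records)


# Shortened, illustrative records written to a temporary file for the demo.
sample = (
    '{"accountNumber": "737265056", "cardLast4Digits": "1803", "cardPresent": false}\n'
    '{"accountNumber": "737265056", "cardLast4Digits": "767", "cardPresent": true}\n'
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

df = load_json_lines(sample_path)
os.remove(sample_path)

print(df)
```

Because each line is parsed on its own, no commas have to be added to the file. String fields such as accountNumber stay strings, since `pd.DataFrame` is given the already-parsed Python values.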