Python 读取JSON文件并转换为熊猫数据帧

Python 读取JSON文件并转换为熊猫数据帧,python,json,pandas,dataframe,Python,Json,Pandas,Dataframe,我有一个包含JSON格式数据的文本文件,如下所示 {"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32

我有一个包含JSON格式数据的文本文件,如下所示

{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}
我想要的是像这样的熊猫数据帧

 accountNumber  customerId  creditLimit availableMoney  transactionDateTime transactionAmount   merchantName    acqCountry  merchantCountryCode posEntryMode    ... echoBuffer  currentBalance  merchantCity    merchantState   merchantZip cardPresent posOnPremises   recurringAuthInd    expirationDateKeyInMatch    isFraud
0   737265056   737265056   5000    5000.0  2016-08-13T14:27:32 98.55   Uber    US  US  02  ... NaN 0.0 NaN NaN NaN False   NaN NaN False   False
当我执行下面的代码时,只需读取JSON并转换为Pandas,就不会得到值;但每个单元格也包括字典。我还尝试了json_规范化,但问题是并非所有列都对齐。任何帮助都将不胜感激

import json
import pandas as pd
with open('test.txt', 'r') as f:
    data = json.load(f)
df = pd.DataFrame(data)

问题是您的“JSON”文件稍微无效。基本上,每个记录都应该用逗号分隔,并且应该像这样包装在一个列表中-

[{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false},
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false},
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}]
假设该代码段保存在名为
test.json
的文件中,下面的代码应该可以工作-

>>> import pandas as pd
>>> df = pd.read_json('test.json')
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 29 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   accountNumber             3 non-null      int64
 1   customerId                3 non-null      int64
 2   creditLimit               3 non-null      int64
 3   availableMoney            3 non-null      int64
 4   transactionDateTime       3 non-null      object
 5   transactionAmount         3 non-null      float64
 6   merchantName              3 non-null      object
 7   acqCountry                3 non-null      object
 8   merchantCountryCode       3 non-null      object
 9   posEntryMode              3 non-null      int64
 10  posConditionCode          3 non-null      int64
 11  merchantCategoryCode      3 non-null      object
 12  currentExpDate            3 non-null      object
 13  accountOpenDate           3 non-null      object
 14  dateOfLastAddressChange   3 non-null      object
 15  cardCVV                   3 non-null      int64
 16  enteredCVV                3 non-null      int64
 17  cardLast4Digits           3 non-null      int64
 18  transactionType           3 non-null      object
 19  echoBuffer                3 non-null      object
 20  currentBalance            3 non-null      int64
 21  merchantCity              3 non-null      object
 22  merchantState             3 non-null      object
 23  merchantZip               3 non-null      object
 24  cardPresent               3 non-null      bool
 25  posOnPremises             3 non-null      object
 26  recurringAuthInd          3 non-null      object
 27  expirationDateKeyInMatch  3 non-null      bool
 28  isFraud                   3 non-null      bool
dtypes: bool(3), float64(1), int64(10), object(15)
memory usage: 761.0+ bytes
>>>

现在您已经有了数据帧,但在…

之后一切都一样,问题是您的“JSON”文件稍微无效。基本上,每个记录都应该用逗号分隔,并且应该像这样包装在一个列表中-

[{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-08-13T14:27:32", "transactionAmount": 98.55, "merchantName": "Uber", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "02", "posConditionCode": "01", "merchantCategoryCode": "rideshare", "currentExpDate": "06/2023", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "414", "enteredCVV": "414", "cardLast4Digits": "1803", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false},
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-10-11T05:05:54", "transactionAmount": 74.51, "merchantName": "AMC #191138", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "entertainment", "cardPresent": true, "currentExpDate": "02/2024", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false},
{"accountNumber": "737265056", "customerId": "737265056", "creditLimit": 5000.0, "availableMoney": 5000.0, "transactionDateTime": "2016-11-08T09:18:39", "transactionAmount": 7.47, "merchantName": "Play Store", "acqCountry": "US", "merchantCountryCode": "US", "posEntryMode": "09", "posConditionCode": "01", "merchantCategoryCode": "mobileapps", "currentExpDate": "08/2025", "accountOpenDate": "2015-03-14", "dateOfLastAddressChange": "2015-03-14", "cardCVV": "486", "enteredCVV": "486", "cardLast4Digits": "767", "transactionType": "PURCHASE", "echoBuffer": "", "currentBalance": 0.0, "merchantCity": "", "merchantState": "", "merchantZip": "", "cardPresent": false, "posOnPremises": "", "recurringAuthInd": "", "expirationDateKeyInMatch": false, "isFraud": false}]
假设该代码段保存在名为
test.json
的文件中,下面的代码应该可以工作-

>>> import pandas as pd
>>> df = pd.read_json('test.json')
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 29 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   accountNumber             3 non-null      int64
 1   customerId                3 non-null      int64
 2   creditLimit               3 non-null      int64
 3   availableMoney            3 non-null      int64
 4   transactionDateTime       3 non-null      object
 5   transactionAmount         3 non-null      float64
 6   merchantName              3 non-null      object
 7   acqCountry                3 non-null      object
 8   merchantCountryCode       3 non-null      object
 9   posEntryMode              3 non-null      int64
 10  posConditionCode          3 non-null      int64
 11  merchantCategoryCode      3 non-null      object
 12  currentExpDate            3 non-null      object
 13  accountOpenDate           3 non-null      object
 14  dateOfLastAddressChange   3 non-null      object
 15  cardCVV                   3 non-null      int64
 16  enteredCVV                3 non-null      int64
 17  cardLast4Digits           3 non-null      int64
 18  transactionType           3 non-null      object
 19  echoBuffer                3 non-null      object
 20  currentBalance            3 non-null      int64
 21  merchantCity              3 non-null      object
 22  merchantState             3 non-null      object
 23  merchantZip               3 non-null      object
 24  cardPresent               3 non-null      bool
 25  posOnPremises             3 non-null      object
 26  recurringAuthInd          3 non-null      object
 27  expirationDateKeyInMatch  3 non-null      bool
 28  isFraud                   3 non-null      bool
dtypes: bool(3), float64(1), int64(10), object(15)
memory usage: 761.0+ bytes
>>>

现在您已经有了数据帧,在…之后一切都一样。

您的示例看起来不像一个有效的JSON文件
data=json.load(f)
给了我一个解析错误。您的示例看起来不是有效的json文件
data=json.load(f)
给了我一个解析错误。啊,这很有帮助。现在的问题是如何在每行后面添加逗号,我有近100万条记录。@user14623412我在编辑中已经回答过了。基本上,
json。分别加载每个字典,然后创建一个新的数据帧。啊,这很有用。现在的问题是如何在每行后面添加逗号,我有近100万条记录。@user14623412我在编辑中已经回答过了。基本上,
json.分别加载每个字典,然后创建一个新的数据帧。