Python: JSON to DataFrame across multiple file paths

python json pandas

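I have a folder containing data files for 40 customers. Each customer has one JSON file that lists their purchases. An example path is ../customer_data/customer_1/transaction.json

I want to load this JSON file into a DataFrame with the columns customer_id, date, instore and rewards. The customer id is the folder name, and I need a new row for each instore/rewards pair.

Goal: the file above should look like this: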

   customer_id| date                     | instore          | rewards
   customer_1 |2018-12-21T12:02:42-08:00 |  0               | 0
   customer_1 |2018-12-24T06:19:03-08:00 |98.25211334228516 | 16.764389038085938
   customer_1 |2018-12-24T06:19:03-08:00 |99.88800811767578 | 18.61212158203125

I tried the following code, but got this error: ValueError: Conflicting metadata name, need distinguishing prefix:

import json
from pathlib import Path

from pandas import json_normalize

# path to file
p = Path('../customer_data/customer_1/transaction.json')

# read json
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())

# create dataframe (this is the call that raises the ValueError)
df = json_normalize(data, record_path='purchase', meta=['instore', 'rewards'], errors='ignore')

Any suggestions would be helpful.

You can try this. There is no customer id in your JSON, so I just made one up:

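import json

import pandas as pd

path = '../customer_data/customer_1/transaction.json'
# open the file that path points to (read-only is enough)
with open(path, 'r', encoding='utf-8') as f:
    data = json.load(f)

# one row per purchase; date and tierLevel are copied from the parent record
df = pd.json_normalize(data, record_path=['purchase'], meta=[['date'], ['tierLevel']])
# the third path component is the customer folder name
df['customer_id'] = path.split('/')[2]
print(df)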


     instore    rewards                       date tierLevel customer_id
0  98.252113  16.764389  2018-12-24T06:19:03-08:00         7  customer_1
1  99.888008  18.612122  2018-12-24T06:19:03-08:00         7  customer_1
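
To see what record_path and meta do here, the following is a minimal sketch on in-memory data. The sample records are an assumption reconstructed from the desired output in the question, because the actual transaction.json is not shown in the thread; in particular, the tierLevel of the first record is made up. It also shows why the first dictionary goes missing: a record with an empty purchase list produces no rows.

```python
import pandas as pd

# Hypothetical sample mirroring the structure implied by the question;
# the real transaction.json is not shown in the thread.
data = [
    {"date": "2018-12-21T12:02:42-08:00", "tierLevel": 7, "purchase": []},
    {
        "date": "2018-12-24T06:19:03-08:00",
        "tierLevel": 7,
        "purchase": [
            {"instore": 98.25211334228516, "rewards": 16.764389038085938},
            {"instore": 99.88800811767578, "rewards": 18.61212158203125},
        ],
    },
]

# One row per item in each 'purchase' list; 'date' and 'tierLevel' are
# copied from the parent record onto every row it produces.
df = pd.json_normalize(data, record_path="purchase", meta=["date", "tierLevel"])

# The record with an empty 'purchase' list contributes no rows at all.
print(df)
```

If the rows for customers without purchases matter, the empty lists need a placeholder entry before normalizing.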

Use .rglob to find all the files.
Fix the data by filling the empty list in the purchase key.
Use .parent & .stem to get the customer id from the path.
Given p = Path('../customer_data/customer_1/transaction.json'), p.parent.stem returns 'customer_1'.

import pandas as pd
import json
from pathlib import Path

file_path = Path('../customer_data')
files = file_path.rglob('transaction.json')

df_list = list()
for file in files:
    # read json
    with file.open('r', encoding='utf-8') as f:
        data = json.loads(f.read())

    # fix purchase when the list is empty
    for x in data:
        if not x['purchase']:  # checks if the list is empty
            x['purchase'] = [{'instore': 0, 'rewards': 0}]

    # create dataframe
    df = pd.json_normalize(data, 'purchase', ['date', 'tierLevel'])

    # add customer
    df['customer_id'] = file.parent.stem

    # add to dataframe list
    df_list.append(df)

df = pd.concat(df_list)
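
For anyone who wants to try the loop above without the real data, here is a self-contained sketch that first writes a small folder tree into a temporary directory and then runs the same steps. The customer names, dates and amounts are invented for illustration, and sorted() and ignore_index=True are my additions to make the row order and index predictable.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical per-customer contents; the real files are not shown in the thread.
sample = {
    "customer_1": [
        {"date": "2018-12-21T12:02:42-08:00", "tierLevel": 7, "purchase": []},
        {"date": "2018-12-24T06:19:03-08:00", "tierLevel": 7,
         "purchase": [{"instore": 98.25, "rewards": 16.76}]},
    ],
    "customer_2": [
        {"date": "2019-01-02T09:00:00-08:00", "tierLevel": 3,
         "purchase": [{"instore": 10.0, "rewards": 1.5}]},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    # Build customer_data/<customer>/transaction.json for each sample customer.
    root = Path(tmp) / "customer_data"
    for customer, records in sample.items():
        folder = root / customer
        folder.mkdir(parents=True)
        (folder / "transaction.json").write_text(json.dumps(records), encoding="utf-8")

    frames = []
    # sorted() only makes the row order deterministic.
    for file in sorted(root.rglob("transaction.json")):
        data = json.loads(file.read_text(encoding="utf-8"))
        # Give records without purchases a zero-valued placeholder row.
        for record in data:
            if not record["purchase"]:
                record["purchase"] = [{"instore": 0, "rewards": 0}]
        frame = pd.json_normalize(data, "purchase", ["date", "tierLevel"])
        # The parent folder name is the customer id.
        frame["customer_id"] = file.parent.stem
        frames.append(frame)

result = pd.concat(frames, ignore_index=True)
print(result[["customer_id", "date", "instore", "rewards"]])
```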

You can use my library anyjsontodf.py

Basically:

import anyjsontodf as jd

df = jd.jsontodf(jsonfile)

GitHub:

Medium article:

Hope this helps

Thanks, the customer id is only in the file path. Is there a way to extract it from the file path itself?

This doesn't get the first dictionary, which is interesting. I don't think json_normalize supports that.

@TrentonMcKinney Yes, is there a way to get the null values as well?
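
On the two questions in these comments, one possible sketch: take the customer id from the path with pathlib, and use a NaN placeholder (instead of 0) so that records without purchases show up as null rows. The sample records are again hypothetical, and the NaN placeholder is my variation, not something proposed in the thread.

```python
from pathlib import Path

import pandas as pd

# Hypothetical records; the first one has an empty 'purchase' list.
data = [
    {"date": "2018-12-21T12:02:42-08:00", "purchase": []},
    {"date": "2018-12-24T06:19:03-08:00",
     "purchase": [{"instore": 98.25, "rewards": 16.76}]},
]

# The path is only parsed here, the file itself is never opened.
p = Path("../customer_data/customer_1/transaction.json")

# Placeholder row so records without purchases still appear, with NaN values.
for record in data:
    if not record["purchase"]:
        record["purchase"] = [{"instore": float("nan"), "rewards": float("nan")}]

df = pd.json_normalize(data, "purchase", ["date"])

# p.parent is the customer folder; .name is its last path component.
df.insert(0, "customer_id", p.parent.name)
print(df)
```

For a folder name without a dot, p.parent.name and p.parent.stem give the same result; .name simply avoids stripping anything that looks like a file suffix.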