Python: exploding and unpacking columns in pandas
Suppose I have data structured as shown below. How can I explode the column that contains a list, and then unpack the exploded column?
Source:
d = {
    "_id": "5f2",
    "connId": 128,
    "hospitalList": [
        {
            "hospitalId": 29,
            "boardId": 1019,
            "siteId": 1
        },
        {
            "hospitalId": 3091,
            "boardId": 2163,
            "siteId": 382
        },
        {
            "hospitalId": 28,
            "boardId": 1017,
            "siteId": 5
        }
    ]
}
Code:
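import pandas as pd

root = pd.json_normalize(d)
# Columns whose first value is a list hold the nested records.
nested_cols = [i for i in root.columns if isinstance(root[i][0], list)]
# Start with the flat columns, then append one frame per nested column.
l = [root.drop(columns=nested_cols)]
for i in nested_cols:
    l.append(pd.json_normalize(d, record_path=i))
output = pd.concat(l, axis=1)
print(output)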
Actual result:
   _id  connId  hospitalId  boardId  siteId
0  5f2   128.0          29     1019       1
1  NaN     NaN        3091     2163     382
2  NaN     NaN          28     1017       5
Expected result:
   _id  connId  hospitalId  boardId  siteId
0  5f2   128.0          29     1019       1
1  5f2   128.0        3091     2163     382
2  5f2   128.0          28     1017       5
This gives the output you want:
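root = pd.json_normalize(d)
nested_cols = [i for i in root.columns if isinstance(root[i][0], list)]
l = [root.drop(columns=nested_cols)]
for i in nested_cols:
    l.append(pd.json_normalize(d, record_path=i))
output = pd.concat(l, axis=1)
# Forward-fill the top-level values into the rows that came from the nested records.
# (fillna(method='ffill') is deprecated in recent pandas; ffill() does the same thing.)
output.ffill(inplace=True)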
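One thing to watch with the forward fill: applied to the whole frame, it also overwrites values that are genuinely missing in the nested records. A minimal sketch of limiting the fill to the top-level columns, assuming a toy frame shaped like the concat result with one missing `siteId` added purely for illustration (the `top_level` list is my own name, not something from the question):

```python
import pandas as pd

# Toy frame shaped like the concat result. The missing "siteId" in row 1
# is an assumption added for illustration; it is not in the original data.
output = pd.DataFrame({
    "_id": ["5f2", None, None],
    "connId": [128.0, None, None],
    "hospitalId": [29, 3091, 28],
    "siteId": [1.0, None, 5.0],
})

# Columns that came from the root record (hypothetical name).
top_level = ["_id", "connId"]

# Forward-fill only those columns; gaps in the nested columns stay as NaN.
output[top_level] = output[top_level].ffill()
print(output)
```

With this, `_id` and `connId` are repeated down the rows, while the missing `siteId` is left alone instead of silently inheriting the value from the row above.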
Unfortunately, I don't know the context in which you will use this code, or whether you will need to adapt it.
Try this:
output.ffill(inplace=True)
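As an alternative that avoids the fill step altogether, `pd.json_normalize` accepts `record_path` together with `meta`, which repeats the listed top-level keys on every record row. A minimal sketch, assuming the same `d` as in the question and with the column names written out by hand:

```python
import pandas as pd

d = {
    "_id": "5f2",
    "connId": 128,
    "hospitalList": [
        {"hospitalId": 29, "boardId": 1019, "siteId": 1},
        {"hospitalId": 3091, "boardId": 2163, "siteId": 382},
        {"hospitalId": 28, "boardId": 1017, "siteId": 5},
    ],
}

# One row per element of "hospitalList"; the keys listed in `meta` are
# copied onto every row, so no NaN values are produced.
flat = pd.json_normalize(d, record_path="hospitalList", meta=["_id", "connId"])

# The meta columns are appended last; reorder to match the expected layout.
flat = flat[["_id", "connId", "hospitalId", "boardId", "siteId"]]
print(flat)
```

Because nothing is filled in after the fact, `connId` is not pushed to float by intermediate NaN values. The trade-off is that the record path and meta keys are named explicitly here.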
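Is there a way to make this dynamic, without explicitly specifying the column names the way I did?
I think r-beginners' comment solves your problem. If they don't plan to post an answer themselves, I'll add it to mine. :)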
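On the question of making it dynamic, one possible approach is to detect the list-valued columns and then literally explode and unpack each one. This is only a sketch: `explode_and_unpack` is a hypothetical helper name, and it assumes every list is non-empty and contains only dicts.

```python
import pandas as pd


def explode_and_unpack(record):
    """Hypothetical helper: flatten every list-of-dicts key of one record."""
    df = pd.DataFrame([record])
    # Detect list-valued columns instead of hard-coding their names.
    list_cols = [c for c in df.columns if isinstance(df[c].iloc[0], list)]
    for col in list_cols:
        # Explode: one row per list element, other columns repeated.
        df = df.explode(col, ignore_index=True)
        # Unpack: turn each dict into its own set of columns.
        unpacked = pd.json_normalize(df[col].tolist())
        df = pd.concat([df.drop(columns=col), unpacked], axis=1)
    return df


d = {
    "_id": "5f2",
    "connId": 128,
    "hospitalList": [
        {"hospitalId": 29, "boardId": 1019, "siteId": 1},
        {"hospitalId": 3091, "boardId": 2163, "siteId": 382},
        {"hospitalId": 28, "boardId": 1017, "siteId": 5},
    ],
}

result = explode_and_unpack(d)
print(result)
```

If a record has more than one list column, exploding them one after another yields every combination of their elements, which may or may not be what you want.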