Pandas nested dictionary
This is my first question on Stack Overflow. I have a dictionary nested three levels deep, and I want to convert it to a DataFrame. The dictionary has the following structure:
dictionary = {'CompanyA': {'Revenue': {date1 : $1}, {date2: $2}},...
{'ProfitLoss': {date1 : $0}, {date2: $1}}},
'CompanyB': {'Revenue': {date1 : $1}, {date2: $2}},...
{'ProfitLoss': {date1 : $0}, {date2: $1}}},
'CompanyC': {'Revenue': {date1 : $1}, {date2: $2}},...
{'ProfitLoss': {date1 : $0}, {date2: $1}}}}
So far I have been able to build a DataFrame with the following:
df = pd.DataFrame.from_dict(dictionary)
But the result is a DataFrame whose cell values are dictionaries, like this:
           CompanyA       CompanyB       CompanyC
Revenue    {date1:$0,..}  {date1:$1,..}  {date1:$0,..}
ProfitLoss {date1:$0,..}  {date1:$0,..}  {date1:$0,..}
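A minimal reproduction of this behaviour might look like the sketch below. It assumes small integers in place of the dollar amounts and the strings "date1" and "date2" in place of real dates; both are stand-ins, not the asker's actual data.

```python
import pandas as pd

# Placeholder data: integers stand in for the $ amounts and
# plain strings stand in for the real dates (assumptions).
dictionary = {
    "CompanyA": {"Revenue": {"date1": 1, "date2": 2},
                 "ProfitLoss": {"date1": 0, "date2": 1}},
    "CompanyB": {"Revenue": {"date1": 1, "date2": 2},
                 "ProfitLoss": {"date1": 0, "date2": 1}},
}

df = pd.DataFrame.from_dict(dictionary)

# Only the two outer levels become axes (columns and index);
# the innermost dicts are left untouched as cell values.
cell = df.loc["Revenue", "CompanyA"]
print(type(cell))
print(df.shape)
```

Here the companies become columns and the metric names become the index, so the third level (the dates) has nowhere to go and stays inside each cell.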
I want the table to look like this:
                  CompanyA  CompanyB  CompanyC
Revenue    Date1  $1        $1        $1
           Date2  $2        $2        $2
ProfitLoss Date1  $0        $0        $0
           Date2  $1        $1        $1
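I tried using pd.MultiIndex.from_dict (.from_product) and changing the index, but without success. Do you know what I should do next? Any hint would be greatly appreciated.
I know you are new here, but similar questions may already have answers. Next time, try searching with keywords to find similar questions. For example, I found one by searching for "pandas nested dict"; the first result was an SO post.
In any case, you need to reshape your input dict. You need a dict structured like this: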
{
    'CompanyA': {
        ('Revenue', 'date1'): 1,
        ('ProfitLoss', 'date1'): 0,
    }
    ...
}
I would do it like this:
import pandas as pd

data = {
    'CompanyA': {
        'Revenue': {
            "date1": 1,
            "date2": 2
        },
        'ProfitLoss': {
            "date1": 0,
            "date2": 1
        }
    },
    'CompanyB': {
        'Revenue': {
            "date1": 4,
            "date2": 5
        },
        'ProfitLoss': {
            "date1": 2,
            "date2": 3
        }
    }
}

# Reshape your data and pass it to `DataFrame.from_dict`
df = pd.DataFrame.from_dict({i: {(j, k): data[i][j][k]
                                 for j in data[i] for k in data[i][j]}
                             for i in data}, orient="columns")
print(df)
Output:
                  CompanyA  CompanyB
ProfitLoss date1         0         2
           date2         1         3
Revenue    date1         1         4
           date2         2         5
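As an alternative to flattening the keys into tuples by hand, the same layout can probably be reached with `pd.concat`. This is only a sketch, not the approach from the answer above; the name `per_company` is made up for illustration, and the data is the same toy data used in the answer.

```python
import pandas as pd

data = {
    "CompanyA": {"Revenue": {"date1": 1, "date2": 2},
                 "ProfitLoss": {"date1": 0, "date2": 1}},
    "CompanyB": {"Revenue": {"date1": 4, "date2": 5},
                 "ProfitLoss": {"date1": 2, "date2": 3}},
}

# For one company, DataFrame(metrics) has the metrics as columns and the
# dates as rows; unstack() turns it into a Series indexed by (metric, date).
per_company = {company: pd.DataFrame(metrics).unstack()
               for company, metrics in data.items()}

# Concatenating the Series side by side makes each company a column.
df = pd.concat(per_company, axis=1)
print(df)
```

The trade-off is one intermediate DataFrame per company, which is fine for a handful of companies but slower than the single comprehension for very large inputs.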
Edit
In reply to your comment, here is the same example using actual datetimes:
import pandas as pd
import datetime as dt

date1 = dt.datetime.now()
date2 = date1 + dt.timedelta(days=365)

data = {
    'CompanyA': {
        'Revenue': {
            date1: 1,
            date2: 2
        },
        'ProfitLoss': {
            date1: 0,
            date2: 1
        }
    },
    'CompanyB': {
        'Revenue': {
            date1: 4,
            date2: 5
        },
        'ProfitLoss': {
            date1: 2,
            date2: 3
        }
    }
}

# Reshape your data and pass it to `DataFrame.from_dict`
df = pd.DataFrame.from_dict({i: {(j, k): data[i][j][k]
                                 for j in data[i] for k in data[i][j]}
                             for i in data}, orient="columns")
print(df)
Output:
                                       CompanyA  CompanyB
ProfitLoss 2018-10-08 11:19:09.006375         0         2
           2019-10-08 11:19:09.006375         1         3
Revenue    2018-10-08 11:19:09.006375         1         4
           2019-10-08 11:19:09.006375         2         5
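Possible duplicate.
Edgar, thank you for taking the time to answer my question. Following your example, I got the same result as before when I used a pandas MultiIndex. In both cases I get the correct table layout, but the table contains no data. Could the date format (datetime.date(2018,3,31):1.0) have something to do with the table not being populated?
@edmond dantes, check the structure of your dictionary. I think dictionary={'CompanyA':{'Revenue':{date1:$1},{date2:$2},{ should be dictionary={'CompanyA':{'Revenue':{date1:$1,date2:$2},{.
The structure of the original dictionary is fine; I misplaced a {} when writing the post, sorry. Moving on, it looks like there is a bug reported on GitHub that is exactly my problem. I will consider my question resolved!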
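For the empty-table symptom with `datetime.date` keys described in these comments, one possible workaround is to avoid tuple keys entirely: flatten the dictionary into long-format records, convert each date to a `pd.Timestamp`, and reshape with `unstack`. The thread does not confirm that this sidesteps the bug, so treat it as an assumption. The data, the column names ("company", "metric", "date", "value") and the variables `records` and `long_df` are all hypothetical.

```python
import datetime as dt
import pandas as pd

# Hypothetical data with datetime.date keys, mirroring the
# `datetime.date(2018, 3, 31): 1.0` format quoted in the comment.
d1, d2 = dt.date(2018, 3, 31), dt.date(2019, 3, 31)
data = {
    "CompanyA": {"Revenue": {d1: 1.0, d2: 2.0},
                 "ProfitLoss": {d1: 0.0, d2: 1.0}},
    "CompanyB": {"Revenue": {d1: 4.0, d2: 5.0},
                 "ProfitLoss": {d1: 2.0, d2: 3.0}},
}

# One record per (company, metric, date) combination;
# dates are converted to pandas Timestamps on the way in.
records = [
    (company, metric, pd.Timestamp(day), value)
    for company, metrics in data.items()
    for metric, by_date in metrics.items()
    for day, value in by_date.items()
]
long_df = pd.DataFrame(records, columns=["company", "metric", "date", "value"])

# Move the company level from the index to the columns.
df = (long_df.set_index(["metric", "date", "company"])["value"]
             .unstack("company"))
print(df)
```

Going through a long-format table also makes it easy to check for missing values before reshaping, for example with `long_df.isna().sum()`.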