Python: how to efficiently add a dimension to a DataFrame created from a complex dictionary
I think melt (as discussed previously) might be useful here, but I don't quite see how to use it to solve my problem. I start with a complex dictionary like this:
orders = [
    {
        "order_id" : 0,
        "lines" : [
            {
                "line_id" : 1,
                "line_amount" : 3.45,
                "line_description" : "first line"
            },
            {
                "line_id" : 2,
                "line_amount" : 6.66,
                "line_description" : "second line"
            },
            {
                "line_id" : 3,
                "line_amount" : 5.43,
                "line_description" : "third line"
            },
        ]
    },
    {
        "order_id" : 1,
        "lines" : [
            ...
    }
]
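I want a DataFrame with one row per order line (not one row per order) that still carries the attributes of the original order (in this example, just the order's id). Currently, the most efficient way I have found to do this is: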
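import pandas

# Orders DataFrame: one row per order, with a "lines" column holding a list of dicts
odf = pandas.DataFrame(orders)
line_dfs = []
for _, order_row in odf.iterrows():
    # Build a small DataFrame from this order's list of line dicts
    line_df = pandas.DataFrame(order_row["lines"])
    line_df["order_id"] = order_row["order_id"]
    line_dfs.append(line_df)
# Line DataFrame: one row per order line
ldf = pandas.concat(line_dfs, sort=False, ignore_index=True)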
Is there a more efficient, "vectorized" way to do this, something like:
ldf = odf.lines.apply(...?...)
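Thanks for any help, including links to answers on SO or elsewhere that already solve this and that I haven't found.
Have you tried read_json?
df=pd.read_json(orders)
Use a list comprehension with pop to extract the lines by key, build a list of dicts, and pass it to the DataFrame constructor: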
orders = [
    {
        "order_id" : 0,
        "lines" : [
            {
                "line_id" : 1,
                "line_amount" : 3.45,
                "line_description" : "first line"
            },
            {
                "line_id" : 2,
                "line_amount" : 6.66,
                "line_description" : "second line"
            },
            {
                "line_id" : 3,
                "line_amount" : 5.43,
                "line_description" : "third line"
            },
        ]
    },
    {
        "order_id" : 1,
        "lines" : [
            {
                "line_id" : 1,
                "line_amount" : 30.45,
                "line_description" : "first line"
            },
            {
                "line_id" : 2,
                "line_amount" : 60.66,
                "line_description" : "second line"
            },
            {
                "line_id" : 3,
                "line_amount" : 50.43,
                "line_description" : "third line"
            },
        ]
    }
]
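The list-comprehension version described above is not actually shown in the thread, only the data. A minimal sketch of what it might look like, assuming every order dict has a "lines" key holding a list of line dicts (the sample below is a trimmed copy of the orders above, two lines per order):

```python
import pandas as pd

# Trimmed sample in the same shape as the orders above.
orders = [
    {"order_id": 0, "lines": [
        {"line_id": 1, "line_amount": 3.45, "line_description": "first line"},
        {"line_id": 2, "line_amount": 6.66, "line_description": "second line"},
    ]},
    {"order_id": 1, "lines": [
        {"line_id": 1, "line_amount": 30.45, "line_description": "first line"},
        {"line_id": 2, "line_amount": 60.66, "line_description": "second line"},
    ]},
]

# For each order, pop "lines" out of the order dict (this mutates `orders`),
# then merge the remaining order-level fields into each line dict.
rows = [{**order, **line} for order in orders for line in order.pop("lines")]

# One row per order line; order_id is repeated on every line of its order.
ldf = pd.DataFrame(rows)
print(ldf)
```

Because pop removes the "lines" key, running this a second time on the same list raises KeyError; iterate over copy.deepcopy(orders) instead if the original data must be kept intact.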
Another solution, using a loop:
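import pandas as pd

L = []
for x in orders:
    # pop removes "lines" from the order dict (mutating orders); x keeps the other order fields
    for y in x.pop('lines'):
        # merge order-level fields with line-level fields into one flat dict
        L.append({**x, **y})
odf = pd.DataFrame(L)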
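Closer to the odf.lines.apply(...) idea in the question, the lines column can be exploded into one row per list element, and the resulting dicts expanded into columns. A sketch, assuming pandas 0.25 or later (for DataFrame.explode) and that every order has at least one line (an empty list would explode to NaN). This still builds the line columns from Python dicts, so it is not strictly vectorized, but it avoids creating one DataFrame per order:

```python
import pandas as pd

# Trimmed sample in the same shape as the orders in the question.
orders = [
    {"order_id": 0, "lines": [
        {"line_id": 1, "line_amount": 3.45, "line_description": "first line"},
        {"line_id": 2, "line_amount": 6.66, "line_description": "second line"},
    ]},
    {"order_id": 1, "lines": [
        {"line_id": 1, "line_amount": 30.45, "line_description": "first line"},
        {"line_id": 2, "line_amount": 60.66, "line_description": "second line"},
    ]},
]

odf = pd.DataFrame(orders)  # one row per order; "lines" holds a list of dicts

# explode: one row per list element, repeating order_id for every line.
exploded = odf.explode("lines").reset_index(drop=True)

# Expand each line dict into its own columns (same 0..n-1 index as `exploded`).
line_cols = pd.DataFrame(exploded["lines"].tolist())

# Put the order-level columns next to the line-level columns.
ldf = pd.concat([exploded.drop(columns="lines"), line_cols], axis=1)
print(ldf)
```

The reset_index(drop=True) matters: explode keeps the original order index (0, 0, 1, 1), and concat aligns on the index, so both pieces need the same fresh 0..n-1 index.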
That gives me a DataFrame of orders, whereas I want a DataFrame of lines. I should probably restate the question with the "blob" above, since it is just a dict (so the basic constructor would work).
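pandas also ships a helper for exactly this shape of nested data, pandas.json_normalize, which is not mentioned in the thread. A sketch, assuming pandas 1.0 or later (older versions expose it as pandas.io.json.json_normalize): record_path names the nested list whose items become rows, and meta lists the order-level fields to copy onto each of those rows.

```python
import pandas as pd

# Trimmed sample in the same shape as the orders in the question.
orders = [
    {"order_id": 0, "lines": [
        {"line_id": 1, "line_amount": 3.45, "line_description": "first line"},
        {"line_id": 2, "line_amount": 6.66, "line_description": "second line"},
    ]},
    {"order_id": 1, "lines": [
        {"line_id": 1, "line_amount": 30.45, "line_description": "first line"},
        {"line_id": 2, "line_amount": 60.66, "line_description": "second line"},
    ]},
]

# record_path: the nested list whose items become rows.
# meta: order-level keys repeated on every row produced from that order.
ldf = pd.json_normalize(orders, record_path="lines", meta=["order_id"])
print(ldf)
```

Unlike the pop-based loop, this leaves orders unchanged. The meta columns are added after the record columns, so order_id ends up last; select the columns in the desired order if that matters.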