Python: melting a DataFrame with two value variables
I have a DataFrame of inventory and purchases across multiple stores and regions. I am trying to use melt to stack the DataFrame, but I need two value columns, Inventory and Purchases, and I can't figure out how to do that. The DataFrame looks like this:
Region | Store | Inventory_Item_1 | Inventory_Item_2 | Purchase_Item_1 | Purchase_Item_2
------------------------------------------------------------------------------------------------------
North A 15 20 5 6
North B 20 25 7 8
North C 18 22 6 10
South D 10 15 9 7
South E 12 12 10 8
I am trying to convert the DataFrame to the following format:
Region | Store | Item | Inventory | Purchases
-----------------------------------------------------------------------------
North A Inventory_Item_1 15 5
North A Inventory_Item_2 20 6
North B Inventory_Item_1 20 7
North B Inventory_Item_2 25 8
North C Inventory_Item_1 18 6
North C Inventory_Item_2 22 10
South D Inventory_Item_1 10 9
South D Inventory_Item_2 15 7
South E Inventory_Item_1 12 10
South E Inventory_Item_2 12 8
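This is what I have written so far, but I don't know how to create the Inventory and Purchases columns. Note that my full DataFrame is much larger (50+ regions, 140+ stores, 15+ items).
Any help or advice would be greatly appreciated.
You can get there with the following steps: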
import pandas as pd

# please always provide minimal working code - otherwise we as helpers and
# answerers have to invest extra time to generate working starting code,
# and that is unfair - we already spend enough time solving the problem:
df = pd.DataFrame([
    ["North","A",15,20,5,6],
    ["North","B",20,25,7,8],
    ["North","C",18,22,6,10],
    ["South","D",10,15,9,7],
    ["South","E",12,12,10,8]], columns=["Region","Store","Inventory_Item_1","Inventory_Item_2","Purchase_Item_1","Purchase_Item_2"])

# melt the dataframe completely first
df_final = pd.melt(df, id_vars=['Region', 'Store'], value_vars=['Inventory_Item_1', 'Inventory_Item_2', 'Purchase_Item_1', 'Purchase_Item_2'])

# extract the inventory and purchase sub data frames;
# they have the "variable" column (the item number!) in common,
# so make it look exactly the same in both data frames by removing
# the unnecessary prefix (.copy() avoids pandas' SettingWithCopyWarning)
df_inventory = df_final.loc[[x.startswith("Inventory") for x in df_final.variable],:].copy()
df_inventory["variable"] = [s.replace("Inventory_", "") for s in df_inventory.variable]
df_purchase = df_final.loc[[x.startswith("Purchase") for x in df_final.variable],:].copy()
df_purchase["variable"] = [s.replace("Purchase_", "") for s in df_purchase.variable]

# copy the data frames (just to keep the old results so that you can inspect them)
df_purchase_ = df_purchase.copy()
df_inventory_ = df_inventory.copy()

# rename the columns to prepare for merging
df_inventory_.columns = ["Region", "Store", "variable", "Inventory"]
df_purchase_.columns = ["Region", "Store", "variable", "Purchase"]

# merge on the three common columns
df_final_1 = pd.merge(df_inventory_, df_purchase_, how="left", on=["Region", "Store", "variable"])

# sort by the three common columns (returns a new, sorted data frame)
df_final_1.sort_values(by=["Region", "Store", "variable"], axis=0)
This returns:
Region Store variable Inventory Purchase
0 North A Item_1 15 5
5 North A Item_2 20 6
1 North B Item_1 20 7
6 North B Item_2 25 8
2 North C Item_1 18 6
7 North C Item_2 22 10
3 South D Item_1 10 9
8 South D Item_2 15 7
4 South E Item_1 12 10
9 South E Item_2 12 8
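The melt-then-merge steps above can also be written more compactly. The following is only a sketch, assuming every value column is named like Measure_Item (for example Inventory_Item_1); the names melted, Measure and result are made up for illustration. It melts once, splits the old column name into a measure and an item, and pivots the measures back into columns:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["North", "A", 15, 20, 5, 6],
        ["North", "B", 20, 25, 7, 8],
        ["North", "C", 18, 22, 6, 10],
        ["South", "D", 10, 15, 9, 7],
        ["South", "E", 12, 12, 10, 8],
    ],
    columns=["Region", "Store", "Inventory_Item_1", "Inventory_Item_2",
             "Purchase_Item_1", "Purchase_Item_2"],
)

# Melt everything except the identifier columns into (variable, value) pairs.
melted = df.melt(id_vars=["Region", "Store"])

# Split "Inventory_Item_1" at the first underscore into
# a measure ("Inventory") and an item ("Item_1").
melted[["Measure", "Item"]] = melted["variable"].str.split("_", n=1, expand=True)

# Spread the measures back out so each one becomes its own column.
result = (
    melted.pivot(index=["Region", "Store", "Item"], columns="Measure", values="value")
    .rename_axis(columns=None)  # drop the leftover "Measure" label on the columns
    .reset_index()
)
print(result)
```

Because nothing is hard-coded except the underscore convention, this should scale to the 15+ items mentioned in the question without listing each column by hand. Passing a list as the index of pivot needs a reasonably recent pandas (1.1 or later).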
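I would do this with hierarchical indexes on both the rows and the columns.
For the rows, you can do that very easily with set_index(['Region', 'Store']).
You have to be a little tricky with the columns, though. Since you need access to the non-index columns that result from setting the index on Region and Store, you need to pipe the frame to a custom function that builds the required tuples and creates a named multi-level column index.
After that, you can stack the columns into the row index and, optionally, reset the full row index so that everything becomes a normal column again.
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'North', 'South', 'South'],
    'Store': ['A', 'B', 'C', 'D', 'E'],
    'Inventory_Item_1': [15, 20, 18, 10, 12],
    'Inventory_Item_2': [20, 25, 22, 15, 12],
    'Purchase_Item_1': [5, 7, 6, 9, 10],
    'Purchase_Item_2': [6, 8, 10, 7, 8]
})
output = (
    df.set_index(['Region', 'Store'])
      # split each column name at the first underscore into a two-level column index
      .pipe(lambda df:
          df.set_axis(df.columns.str.split('_', n=1, expand=True), axis='columns')
      )
      .rename_axis(['Status', 'Product'], axis='columns')
      .stack(level='Product')
      .reset_index()
)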
This gives me:
Region Store Product Inventory Purchase
North A Item_1 15 5
North A Item_2 20 6
North B Item_1 20 7
North B Item_2 25 8
North C Item_1 18 6
North C Item_2 22 10
South D Item_1 10 9
South D Item_2 15 7
South E Item_1 12 10
South E Item_2 12 8
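The key step in this answer is turning the flat column names into a two-level column index. As a small illustration (the variable names cols and multi are made up here), splitting an Index of strings with expand=True yields a MultiIndex of (status, product) tuples:

```python
import pandas as pd

cols = pd.Index([
    "Inventory_Item_1", "Inventory_Item_2",
    "Purchase_Item_1", "Purchase_Item_2",
])

# n=1 splits only at the first underscore, so "Item_1" stays in one piece;
# expand=True returns a MultiIndex instead of an Index of lists.
multi = cols.str.split("_", n=1, expand=True)
print(multi.tolist())
```

Once the columns carry these two levels, stacking the second level moves the item names into the rows and leaves Inventory and Purchase as the remaining columns.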
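There is also a library function you can use for this; at the moment, you have to install the latest development version to get it.
It works by passing a regular expression containing groups to the names_pattern parameter. The ".value" in names_to ensures that Inventory and Purchase are kept as column headers, while the other group (Item_1 and Item_2) is collated into a new column, Item.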
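A related reshaping is available in pandas itself through pd.wide_to_long, which is a different function from the one described in this answer but follows the same idea: one part of each column name stays a column header, and the rest is gathered into a new column. This is only a sketch, assuming the column names follow the Stub_Suffix pattern from the question; long_df is a made-up name:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["North", "A", 15, 20, 5, 6],
        ["North", "B", 20, 25, 7, 8],
        ["North", "C", 18, 22, 6, 10],
        ["South", "D", 10, 15, 9, 7],
        ["South", "E", 12, 12, 10, 8],
    ],
    columns=["Region", "Store", "Inventory_Item_1", "Inventory_Item_2",
             "Purchase_Item_1", "Purchase_Item_2"],
)

long_df = (
    pd.wide_to_long(
        df,
        stubnames=["Inventory", "Purchase"],  # prefixes that stay as column headers
        i=["Region", "Store"],                # identifier columns (must be unique per row)
        j="Item",                             # new column that receives the suffixes
        sep="_",
        suffix=r"\w+",                        # suffixes are text ("Item_1"), not just digits
    )
    .reset_index()
    .sort_values(["Region", "Store", "Item"], ignore_index=True)
)
print(long_df)
```

The suffix argument matters here: the default only matches digits, so without it the "Item_1"-style suffixes would not be picked up.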
So your intention is to lose the purchase information? (The information about Purchase_Item_1 and Purchase_Item_2 gets lost.)
@Gwang JinKim Purchase_Item_1 and Purchase_Item_2 are just the purchases of Items 1 and 2. That data is in the "Purchases" column.
That is actually the point: it should not be named "Inventory_Item_1" ... but just "Item_1", "Item_2", ... Otherwise it gets very confusing. See my solution.