Python: melting a DataFrame with two value variables

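I have a DataFrame of inventory and purchases across multiple stores and regions. I am trying to stack the DataFrame with melt, but I need two value columns, Inventory and Purchases, and I can't figure out how to do that. The DataFrame looks like this: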

Region   |   Store   |  Inventory_Item_1   |  Inventory_Item_2  |  Purchase_Item_1  |  Purchase_Item_2
------------------------------------------------------------------------------------------------------       
 North         A             15                    20                 5                     6
 North         B             20                    25                 7                     8
 North         C             18                    22                 6                     10
 South         D             10                    15                 9                     7
 South         E             12                    12                 10                    8

I am trying to convert the DataFrame into the following format:

  Region   |   Store   |      Item              |  Inventory   |   Purchases      
 -----------------------------------------------------------------------------
   North        A         Inventory_Item_1             15             5
   North        A         Inventory_Item_2             20             6
   North        B         Inventory_Item_1             20             7
   North        B         Inventory_Item_2             25             8    
   North        C         Inventory_Item_1             18             6
   North        C         Inventory_Item_2             22             10
   South        D         Inventory_Item_1             10             9
   South        D         Inventory_Item_2             15             7
   South        E         Inventory_Item_1             12             10
   South        E         Inventory_Item_2             12             8

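This is what I have written, but I don't know how to create the columns for inventory and purchases. Note that my full DataFrame is much larger (50+ regions, 140+ stores, 15+ items).

Any help or suggestions would be much appreciated.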

You can get there with the following steps:

import pandas as pd

# please always provide minimal working code - otherwise we as helpers and
# answerers have to invest extra time to create working starting code,
# and that is unfair - we already spend enough time solving the problem:
df = pd.DataFrame([
    ["North", "A", 15, 20, 5, 6],
    ["North", "B", 20, 25, 7, 8],
    ["North", "C", 18, 22, 6, 10],
    ["South", "D", 10, 15, 9, 7],
    ["South", "E", 12, 12, 10, 8]],
    columns=["Region", "Store", "Inventory_Item_1", "Inventory_Item_2", "Purchase_Item_1", "Purchase_Item_2"])

# melt the dataframe completely first
df_final = pd.melt(df, id_vars=['Region', 'Store'], value_vars=['Inventory_Item_1', 'Inventory_Item_2', 'Purchase_Item_1', 'Purchase_Item_2'])

# extract inventory and purchase sub data frames
# they have in common the "variable" column (the item number!)
# so make it look exactly the same in both data frames by removing
# the unnecessary prefix; .copy() avoids a SettingWithCopyWarning
df_inventory = df_final.loc[df_final["variable"].str.startswith("Inventory"), :].copy()
df_inventory["variable"] = df_inventory["variable"].str.replace("Inventory_", "", regex=False)
df_purchase = df_final.loc[df_final["variable"].str.startswith("Purchase"), :].copy()
df_purchase["variable"] = df_purchase["variable"].str.replace("Purchase_", "", regex=False)

# copy the data frames (just to keep the old results so that you can inspect them)
df_purchase_ = df_purchase.copy()
df_inventory_ = df_inventory.copy()

# rename the columns to prepare for merging
df_inventory_.columns = ["Region", "Store", "variable", "Inventory"]
df_purchase_.columns = ["Region", "Store", "variable", "Purchase"]

# merge on the three common columns
df_final_1 = pd.merge(df_inventory_, df_purchase_, how="left", on=["Region", "Store", "variable"])

# sort by the three common columns (sort_values returns a new frame, so keep the result)
df_final_1 = df_final_1.sort_values(by=["Region", "Store", "variable"], axis=0)
print(df_final_1)

This returns:

  Region Store variable  Inventory  Purchase
0  North     A   Item_1         15         5
5  North     A   Item_2         20         6
1  North     B   Item_1         20         7
6  North     B   Item_2         25         8
2  North     C   Item_1         18         6
7  North     C   Item_2         22        10
3  South     D   Item_1         10         9
8  South     D   Item_2         15         7
4  South     E   Item_1         12        10
9  South     E   Item_2         12         8
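
With 15+ items, listing every column in value_vars and filtering by prefix can get tedious. Below is a minimal sketch of a variant of the steps above, not the answerer's own code. It assumes every non-id column is named `<Kind>_<Item>`, with the first underscore separating the two parts. The names `long`, `Kind` and `Item` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["North", "A", 15, 20, 5, 6],
        ["North", "B", 20, 25, 7, 8],
        ["North", "C", 18, 22, 6, 10],
        ["South", "D", 10, 15, 9, 7],
        ["South", "E", 12, 12, 10, 8],
    ],
    columns=["Region", "Store", "Inventory_Item_1", "Inventory_Item_2",
             "Purchase_Item_1", "Purchase_Item_2"],
)

# Omitting value_vars melts every column that is not an id column.
long = df.melt(id_vars=["Region", "Store"])

# Assumption: names look like "Inventory_Item_1"; split at the first underscore
# into the kind ("Inventory") and the item ("Item_1").
long[["Kind", "Item"]] = long["variable"].str.split("_", n=1, expand=True)

# Move Kind from the rows into the columns, then flatten back to plain columns.
result = (
    long.set_index(["Region", "Store", "Item", "Kind"])["value"]
    .unstack("Kind")
    .rename_axis(columns=None)
    .reset_index()
    .sort_values(["Region", "Store", "Item"], ignore_index=True)
)
print(result)
```

Because the column names are parsed rather than listed, the same code should apply unchanged to a wider frame, as long as the naming assumption holds.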

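I would use hierarchical indexes on both the rows and the columns for this.

For the rows, you can very easily set_index(['Region', 'Store']).

You have to get a little tricky with the columns, though. Since you need access to the non-index columns that remain after setting the index on Region and Store, you need to pipe the DataFrame to a custom function that builds the desired tuples and creates a named multi-level column index.

After that, you can stack the columns into the row index and, optionally, reset the whole row index so that everything is an ordinary column again.

import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'North', 'South', 'South'],
    'Store': ['A', 'B', 'C', 'D', 'E'],
    'Inventory_Item_1': [15, 20, 18, 10, 12],
    'Inventory_Item_2': [20, 25, 22, 15, 12],
    'Purchase_Item_1': [5, 7, 6, 9, 10],
    'Purchase_Item_2': [6, 8, 10, 7, 8]
})

output = (
    df.set_index(['Region', 'Store'])
      # split each column name at the first underscore into a two-level index
      .pipe(lambda df:
          df.set_axis(df.columns.str.split('_', n=1, expand=True), axis='columns')
      )
      .rename_axis(['Status', 'Product'], axis='columns')
      # move the Product level from the columns into the row index
      .stack(level='Product')
      .reset_index()
)

This gives me: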
Region Store Product  Inventory  Purchase
 North     A  Item_1         15         5
 North     A  Item_2         20         6
 North     B  Item_1         20         7
 North     B  Item_2         25         8
 North     C  Item_1         18         6
 North     C  Item_2         22        10
 South     D  Item_1         10         9
 South     D  Item_2         15         7
 South     E  Item_1         12        10
 South     E  Item_2         12         8
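
The column step is probably the least obvious part of this answer. Here is a small sketch of what splitting the column labels produces on its own. The variable names `cols` and `multi` are illustrative and not part of the answer.

```python
import pandas as pd

cols = pd.Index([
    "Inventory_Item_1", "Inventory_Item_2",
    "Purchase_Item_1", "Purchase_Item_2",
])

# n=1 splits only at the first underscore; expand=True makes the result a
# MultiIndex with one level per part instead of an Index of lists.
multi = cols.str.split("_", n=1, expand=True)

print(type(multi).__name__)  # MultiIndex
print(multi.tolist())
# [('Inventory', 'Item_1'), ('Inventory', 'Item_2'),
#  ('Purchase', 'Item_1'), ('Purchase', 'Item_2')]
```

Once the columns have this two-level shape, stacking the second level leaves the first level (Inventory, Purchase) as the remaining column headers.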

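You can use the function from [...]; at the moment, you have to install the latest development version from [...]:

It works by passing a regular expression containing groups to the names_pattern parameter. The ".value" in names_to ensures that Inventory and Purchase are kept as column headers, while the other group (Item_1 and Item_2) is collated into a new group, Item.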
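
The function this answer refers to is not named in the text, so the following does not attempt to reproduce its API. As a loosely related sketch, pandas' own `pd.wide_to_long` expresses a similar idea: the name prefixes (stubnames) are kept as value columns, and the rest of each column name, matched by a regular expression, is gathered into a new column. The choice of `wide_to_long` and the `suffix` pattern are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "North", "South", "South"],
    "Store": ["A", "B", "C", "D", "E"],
    "Inventory_Item_1": [15, 20, 18, 10, 12],
    "Inventory_Item_2": [20, 25, 22, 15, 12],
    "Purchase_Item_1": [5, 7, 6, 9, 10],
    "Purchase_Item_2": [6, 8, 10, 7, 8],
})

result = (
    pd.wide_to_long(
        df,
        stubnames=["Inventory", "Purchase"],  # prefixes that become value columns
        i=["Region", "Store"],                # columns that identify each row
        j="Item",                             # new column holding the rest of the name
        sep="_",                              # separator between prefix and suffix
        suffix=r"Item_\d+",                   # regex for the suffix (assumed naming scheme)
    )
    .sort_index()    # order by Region, Store, Item
    .reset_index()   # turn the index levels back into ordinary columns
)
print(result)
```

Note that `i` must uniquely identify each row, which holds here because every Region/Store pair appears only once.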
So is your intention to lose the purchase information? (The information about Purchase_Item_1 and Purchase_Item_2 is lost.)

@Gwang JinKim Purchase_Item_1 and Purchase_Item_2 are just the purchases of items 1 and 2. That data is in the "Purchases" column.

That is actually the point: the items should not be named "Inventory_Item_1" and so on, but just "Item_1", "Item_2", ... Otherwise it is very confusing. See my solution.