Python pandas: create a table with "dummy variables" from another table

python pandas dataframe

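Suppose I have these dataframes:

DataFrame A (Products)

DataFrame B (Operations)

The idea is to build a "dataframe C" from "dataframe B", with one column per product:

DataFrame C (result)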

id | codoper | Product_18| Product_22| Product_33| Product_55| Product_67 |valor
----------------------------------------------------------------------------------
1  | 00001   | 1         | 0         | 0         | 1         | 0          |45000
2  | 00002   | 0         | 0         | 1         | 0         | 0          |53000

So far, this is all I have achieved, in "dataframe B":

Note: the operations dataframe does not contain every product from dataframe A.

id | codoper | CodProd  | valor
-------------------------------
1  | 00001   | 55       | 45000
2  | 00001   | 18       | 45000
3  | 00002   | 33       | 53000
1  | 00001   | 55       | 45000
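
For anyone who wants to reproduce this, the inputs could be built roughly as follows. This is only a sketch: dataframe A is not shown above, so its single column name (Cod) and its contents are assumptions inferred from the product columns of the desired result; dataframe B is copied from the table above.

```python
import pandas as pd

# Dataframe A (Products): assumed layout, only the product codes are known.
A = pd.DataFrame({"Cod": [18, 22, 33, 55, 67]})

# Dataframe B (Operations): values copied from the table above.
B = pd.DataFrame({
    "id": [1, 2, 3, 1],
    "codoper": ["00001", "00001", "00002", "00001"],
    "CodProd": [55, 18, 33, 55],
    "valor": [45000, 45000, 53000, 45000],
})

# Products that never appear in an operation; these still need a column in C.
missing = sorted(set(A["Cod"]) - set(B["CodProd"]))
print(missing)
```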

Thanks

This is a combination of merge and pivot_table, with a few tweaks:

(Products.merge(Operations, 
                left_on='Cod', 
                right_on='CodProd',
                how='left')
     .pivot_table(index=['codoper','valor'],
                  values='Product',
                  columns='Cod', 
                  fill_value=0,
                  aggfunc='any')
     .reindex(Products.Cod.unique(), 
              fill_value=False,
              axis=1)
     .astype(int)
     .add_prefix('Product_')
     .reset_index()
)

Output:

Cod codoper    valor  Product_18  Product_22  Product_33  Product_55  \
0     00001  45000.0           1           0           0           1   
1     00002  53000.0           0           0           1           0   

Cod  Product_67  
0             0  
1             0
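
A side note on the output above: valor is printed as 45000.0 rather than 45000. The likely reason is the left merge, which leaves NaN in valor for products without any operation and so upcasts the column to float. The sketch below illustrates this with small stand-in frames; the data is assumed, and only the column names Cod, CodProd, codoper and valor come from the code above.

```python
import pandas as pd

# Hypothetical stand-ins for Products and Operations (values assumed).
Products = pd.DataFrame({"Cod": [18, 22, 33, 55, 67]})
Operations = pd.DataFrame({
    "codoper": ["00001", "00001", "00002"],
    "CodProd": [55, 18, 33],
    "valor": [45000, 45000, 53000],
})

merged = Products.merge(Operations, left_on="Cod", right_on="CodProd", how="left")

# Products 22 and 67 have no matching operation, so their valor is NaN
# and the whole column is upcast from int64 to float64.
print(merged["valor"].dtype)

# Dropping the unmatched rows allows casting the column back to integers.
matched = merged.dropna(subset=["valor"]).astype({"valor": "int64"})
print(matched["valor"].tolist())
```

If integer values are wanted in the final table, a similar astype({'valor': int}) call after reset_index() should work, since the rows with NaN keys are dropped by pivot_table.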

You need to combine the dummies from Products with the dummies from Operations. First, define the output columns using the prefix:

columns = ['id', 'codoper'] + [f"Product_{cod}" for cod in A['Cod'].unique()] + ['valor']

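Then use get_dummies, with the same prefix that was used to define the columns. Group by all the columns that are perfectly collinear, i.e. id, codoper and valor. If they are not perfectly collinear, you need to decide how to aggregate them to the codoper level. Finally, reindex with the output columns defined earlier, filling the missing values with zeros: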

(pd.get_dummies(B, columns=['CodProd'], prefix='Product')
   .groupby(['id', 'codoper', 'valor'], as_index=False)
   .sum()
   .reindex(columns=columns, fill_value=0)
)

Output:

  id codoper  Product_18  Product_22  Product_33  Product_55  Product_67  valor
0  1   00001           0           0           0           2           0  45000
1  2   00001           1           0           0           0           0  45000
2  3   00002           0           0           1           0           0  53000
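
The get_dummies output above keeps one row per (id, codoper, valor) and counts repeated rows, whereas the table asked for in the question has one row per codoper with 0/1 flags. One possible way to aggregate to the codoper level is sketched below. It assumes that valor is constant within each codoper, that A holds the product codes in a column named Cod (A is not shown in the question), and that the id in the result is simply a fresh row counter.

```python
import pandas as pd

# Assumed inputs: A is hypothetical, B is copied from the question.
A = pd.DataFrame({"Cod": [18, 22, 33, 55, 67]})
B = pd.DataFrame({
    "id": [1, 2, 3, 1],
    "codoper": ["00001", "00001", "00002", "00001"],
    "CodProd": [55, 18, 33, 55],
    "valor": [45000, 45000, 53000, 45000],
})

# One output column per product in A, even if it never appears in B.
columns = ["codoper"] + [f"Product_{cod}" for cod in A["Cod"].unique()] + ["valor"]

C = (
    pd.get_dummies(B, columns=["CodProd"], prefix="Product", dtype=int)
      .drop(columns="id")                        # the original row id is not a grouping key here
      .groupby(["codoper", "valor"], as_index=False)
      .max()                                     # max gives a 0/1 flag; sum would give counts
      .reindex(columns=columns, fill_value=0)    # add products with no operations
)

# Fresh 1-based row id, as in the desired table (an assumption about its meaning).
C.insert(0, "id", range(1, len(C) + 1))
print(C)
```

Using max instead of sum is the design choice that turns counts into indicators; if valor can differ within a codoper, it would need its own aggregation (for example sum or first) instead of being a grouping key.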