Python: combining crosstab, pivot, and groupby for a DataFrame

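I think this is a very simple problem, but I couldn't find another post that addresses a similar issue.

I have a pandas DataFrame that looks like this: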

         group1       group2   meandiff      lower      upper  reject
0          bacc      dry_sed  2575.1697  2033.6713  3116.6681    True
1          bacc     junc_hal   -81.8513  -555.8132   392.1106   False
2          bacc  other_trees    -1.2333  -512.6246   510.1579   False
3          bacc        phrag   613.2256     0.4309  1226.0204    True
4          bacc        water -1074.4667 -1687.2614  -461.6719    True
5          bacc      wet_sed  -437.1854  -943.2217    68.8508   False
6       dry_sed     junc_hal -2657.0210 -3068.3186 -2245.7234    True
7       dry_sed  other_trees -2576.4030 -3030.3269 -2122.4792    True
8       dry_sed        phrag -1961.9441 -2527.6677 -1396.2204    True
9       dry_sed        water -3649.6364 -4215.3600 -3083.9127    True
10      dry_sed      wet_sed -3012.3551 -3460.2374 -2564.4728    True
11     junc_hal  other_trees    80.6179  -290.1464   451.3823   False
12     junc_hal        phrag   695.0769   193.6165  1196.5373    True
13     junc_hal        water  -992.6154 -1494.0758  -491.1550    True
14     junc_hal      wet_sed  -355.3341  -718.6767     8.0084   False
15  other_trees        phrag   614.4590    77.4825  1151.4354    True
16  other_trees        water -1073.2333 -1610.2098  -536.2569    True
17  other_trees      wet_sed  -435.9521  -846.9253   -24.9788    True
18        phrag        water -1687.6923 -2321.9951 -1053.3895    True
19        phrag      wet_sed -1050.4111 -1582.2901  -518.5320    True
20        water      wet_sed   637.2812   105.4022  1169.1603    True

I want to create a contingency table between group1 and group2, but with the value of the reject column in each cell.

It should look like this:

             bacc  dry_sed  junc_hal  other_trees  phrag  water  wet_sed
bacc           NA        1         0            0      1      1        0
dry_sed         1       NA         1            1      1      1        1
junc_hal        0        1        NA            0      1      1        0
other_trees     0        1         0           NA      1      1        1
phrag           1        1         1            1     NA      1        1
water           1        1         1            1      1     NA        1
wet_sed         0        1         0            1      1      1       NA
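
The NA entries are only placeholders; any value could go there.

Is there a direct way to summarize the data like this? Before I start looping over the table, I want to make sure there is no simple, direct way to do it.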


Thanks in advance.

You can pivot the DataFrame:

df.pivot(index='group1', columns='group2', values='reject')

group2      dry_sed junc_hal other_trees phrag water wet_sed
group1                                                      
bacc           True    False       False  True  True   False
dry_sed        None     True        True  True  True    True
junc_hal       None     None       False  True  True   False
other_trees    None     None        None  True  True    True
phrag          None     None        None  None  True    True
water          None     None        None  None  None    True
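
The pivot above fills only the upper triangle, because each pair appears once in the data. One possible way to get the full symmetric table from the question is to append a copy of the rows with group1 and group2 swapped before pivoting. The sketch below uses a small made-up three-group subset of the data, and the names mirrored, both and table are illustrative only.

```python
import pandas as pd

# Hypothetical three-group subset of the question's data.
df = pd.DataFrame({
    "group1": ["bacc", "bacc", "dry_sed"],
    "group2": ["dry_sed", "junc_hal", "junc_hal"],
    "reject": [True, False, True],
})

# Swap the two group columns so every pair also appears in reverse order.
mirrored = df.rename(columns={"group1": "group2", "group2": "group1"})
both = pd.concat([df, mirrored], ignore_index=True)

# Each (group1, group2) pair is still unique, so pivot does not raise.
# Diagonal cells have no data and stay missing.
table = both.pivot(index="group1", columns="group2", values="reject")
print(table)
```

Because the mirrored rows never duplicate an existing pair, this only works when each pair occurs in one order in the input, as it does in the question.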

Assuming your DataFrame is called df, you can do the following:

df['reject_flag'] = df['reject'].astype(int)

output = df.pivot_table(index='group1', columns='group2', values='reject_flag')
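
This gives you the following (pivot_table aggregates with the mean by default, which leaves the flags unchanged here because each pair occurs only once):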

group2       dry_sed  junc_hal  other_trees  phrag  water  wet_sed
group1                                                            
bacc             1.0       0.0          0.0    1.0    1.0      0.0
dry_sed          NaN       1.0          1.0    1.0    1.0      1.0
junc_hal         NaN       NaN          0.0    1.0    1.0      0.0
other_trees      NaN       NaN          NaN    1.0    1.0      1.0
phrag            NaN       NaN          NaN    NaN    1.0      1.0
water            NaN       NaN          NaN    NaN    NaN      1.0
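
This result is also upper-triangular and lacks the bacc column and the wet_sed row. A minimal sketch of one way to square it up afterwards, assuming the 0/1 flags from above: reindex both axes to the full label set, then fill the empty cells from the transpose. The data is a made-up three-group subset, and labels, square and full are illustrative names.

```python
import pandas as pd

# Hypothetical three-group subset of the question's data.
df = pd.DataFrame({
    "group1": ["bacc", "bacc", "dry_sed"],
    "group2": ["dry_sed", "junc_hal", "junc_hal"],
    "reject": [True, False, True],
})
df["reject_flag"] = df["reject"].astype(int)

# Upper triangle only, as in the answer above.
upper = df.pivot_table(index="group1", columns="group2", values="reject_flag")

# Use the same labels on both axes so the table is square.
labels = sorted(set(df["group1"]) | set(df["group2"]))
square = upper.reindex(index=labels, columns=labels)

# Fill each missing cell from its mirror image; the diagonal stays NaN.
full = square.combine_first(square.T)
print(full)
```

The diagonal stays NaN, which matches the NA placeholders in the question; a call such as full.fillna(0) would replace them if a number is needed there.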