Pandas: group by value when a column contains lists

I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({'type':[[1,3],[1,2,3],[2,3]], 'value':[4,5,6]})

type | value
-------------
1,3  | 4
1,2,3| 5
2,3  | 6

I want to group by the distinct values in the "type" column and, for example, sum the values:

type | sum
------------
1    | 9
2    | 11
3    | 15

Thanks for your help.

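First you need to reshape the DataFrame: expand the type column with the DataFrame constructor, then apply stack and reset_index. Then cast the type column to int, and finally group by type and aggregate with sum: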
df1 = pd.DataFrame(df['type'].values.tolist(), index = df['value']) \
        .stack() \
        .reset_index(name='type')
df1.type = df1.type.astype(int)
print (df1)
   value  level_1  type
0      4        0     1
1      4        1     3
2      5        0     1
3      5        1     2
4      5        2     3
5      6        0     2
6      6        1     3


print (df1.groupby('type', as_index=False)['value'].sum())
   type  value
0     1      9
1     2     11
2     3     15
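
For comparison, here is a minimal sketch of the same reshape done with DataFrame.explode. This is not part of the original answer and assumes pandas 0.25 or later, where explode is available. The names exploded and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'type': [[1, 3], [1, 2, 3], [2, 3]], 'value': [4, 5, 6]})

# explode() emits one row per list element and repeats 'value' next to it.
# The exploded column has object dtype, so cast it to int before grouping.
exploded = df.explode('type').astype({'type': int})

# Sum 'value' for every distinct type
result = exploded.groupby('type', as_index=False)['value'].sum()
print(result)
```

The grouping step is the same as above; only the constructor/stack/reset_index reshape is swapped for a single call.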

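Another solution works on the stacked Series: select the first level of the index with get_level_values, convert it to a Series with to_series, group it by the values of the stacked Series and aggregate with sum. Finally, call reset_index and rename the column index to type: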
df1 = pd.DataFrame(df['type'].values.tolist(), index = df['value']).stack().astype(int)
print (df1)
value   
4      0    1
       1    3
5      0    1
       1    2
       2    3
6      0    2
       1    3
dtype: int32

print (df1.index.get_level_values(0)
          .to_series()
          .groupby(df1.values)
          .sum()
          .reset_index()
          .rename(columns={'index':'type'}))
   type  value
0     1      9
1     2     11
2     3     15
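
To make it easier to see what each step of that chain holds, here is a sketch that splits it into named intermediates. The names stacked, values, keys and totals are mine, and the extra dropna() is a defensive assumption for pandas versions in which stack keeps the missing cells created by lists of unequal length.

```python
import pandas as pd

df = pd.DataFrame({'type': [[1, 3], [1, 2, 3], [2, 3]], 'value': [4, 5, 6]})

# One column per list position, indexed by 'value'; shorter lists are padded with NaN
wide = pd.DataFrame(df['type'].values.tolist(), index=df['value'])

# Long format: index level 0 is 'value', level 1 is the list position
# (dropna() is an assumption, see the note above)
stacked = wide.stack().dropna().astype(int)

# Level 0 repeats each 'value' once per list element
values = stacked.index.get_level_values(0).to_series()

# Plain array of type labels; groupby matches it to `values` by position
keys = stacked.values

totals = values.groupby(keys).sum()
print(totals.reset_index().rename(columns={'index': 'type'}))
```

Because keys is a plain array rather than a Series, the duplicated index labels in values do not matter; the grouping is purely positional.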


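A comment asked: "Thank you very much. What if I want to aggregate several value-like columns (value1, value2, value3, etc.) by type? It seems I would need to create a DataFrame for each column I want to aggregate, but there must be an elegant solution. Thanks again."

Edit by comment: the code below is a slightly modified version of the second solution that uses pop and join. For large datasets it may be better to avoid the join and aggregate the columns one by one, but I am not sure: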
df = pd.DataFrame({'type':[[1,3],[1,2,3],[2,3]], 
                   'value1':[4,5,6], 
                   'value2':[1,2,3], 
                   'value3':[4,6,1]})
print (df)
        type  value1  value2  value3
0     [1, 3]       4       1       4
1  [1, 2, 3]       5       2       6
2     [2, 3]       6       3       1

df1 = pd.DataFrame(df.pop('type').values.tolist()) \
        .stack() \
        .reset_index(level=1, drop=True) \
        .rename('type') \
        .astype(int)
print (df1)
0    1
0    3
1    1
1    2
1    3
2    2
2    3
Name: type, dtype: int32

print (df.join(df1).groupby('type', as_index=False).sum())
   type  value1  value2  value3
0     1       9       3      10
1     2      11       5       7
2     3      15       6      11
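
As a rough sketch of one way to avoid the join, the value columns can be repeated with NumPy so that they line up with the flattened type labels. This is my own illustration, not the answerer's code, and I have not measured whether it is faster on large data. The names keys, counts, value_cols, repeated and result are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type': [[1, 3], [1, 2, 3], [2, 3]],
                   'value1': [4, 5, 6],
                   'value2': [1, 2, 3],
                   'value3': [4, 6, 1]})

# Flatten all lists into a single array of group keys
keys = np.concatenate(df['type'].values)

# Length of each row's list, i.e. how often that row's values must repeat
counts = df['type'].str.len().to_numpy()

value_cols = ['value1', 'value2', 'value3']

# Repeat every value column so each entry lines up with one entry of `keys`
repeated = pd.DataFrame({col: np.repeat(df[col].to_numpy(), counts)
                         for col in value_cols})

# Group positionally by the flattened keys and sum all value columns at once
result = repeated.groupby(keys).sum().rename_axis('type').reset_index()
print(result)
```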