Python: aggregate one column based on another column


python pandas


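Technically this should be simple, but unfortunately I can't figure it out right now.

I'm trying to find the proportion of one column's values based on another column. For example: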

Column 1   |  target_variable
'potato'         1
'potato'         0
'tomato'         1
'brocolli'       1
'tomato'         0

The expected output would be:

Column 1   | target = 1  | target = 0 | total_count
'potato'   |     1       |      1     |     2
'tomato'   |     1       |      1     |     2
'brocolli' |     1       |      0     |     1
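One way to get exactly this table, shown here as a sketch rather than the approach used in the answer below, is `pd.crosstab`, which counts every (fruit, target) pair directly. It assumes the quotes around the fruit names in the example are only string notation, and it reorders the columns so `target = 1` comes first as in the expected output:

```python
import pandas as pd

# Sample data from the question (quotes around the names treated as string notation)
df = pd.DataFrame({
    "Column 1": ["potato", "potato", "tomato", "brocolli", "tomato"],
    "target_variable": [1, 0, 1, 1, 0],
})

# Rows = fruit, columns = target value (0 and 1), cells = number of rows
counts = pd.crosstab(df["Column 1"], df["target_variable"])

# Put target = 1 first, as in the expected output, and label the columns
counts = counts[[1, 0]].add_prefix("target = ")

# Row total of the two count columns = number of rows per fruit
counts["total_count"] = counts.sum(axis=1)
print(counts)
```

Note that `crosstab` returns the fruits sorted alphabetically, not in order of first appearance.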

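However, I think I was using aggregation incorrectly, so I fell back on the following naive implementation:

# Per-row loop over the DataFrame `train` (columns "fruit" and "target"):
# counts the 1s, the 0s and the total rows for each fruit
z = {}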
for i in train.index:
    fruit = train["fruit"][i]
    l = train["target"][i]
    if fruit not in z:
        if l == 1:
            z[fruit] = {1:1,0:0,'count':1}
        else:
            z[fruit] = {1:0,0:1,'count':1}
    else:
        if l == 1:
            z[fruit][1] += 1
        else:
            z[fruit][0] += 1
        z[fruit]['count'] += 1

This gives similar output, but as a dictionary.

Can someone show me the correct pandas syntax for this? :)

Thanks! :)
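For reference, the loop's dictionary can be reproduced with a single `groupby`. This is a sketch that relies on the target being strictly 0/1: the group sum is then the number of 1s, the group count is the total, and their difference is the number of 0s. The `train` frame below is a hypothetical stand-in built from the question's sample data:

```python
import pandas as pd

# Hypothetical stand-in for the question's `train` frame
train = pd.DataFrame({
    "fruit": ["potato", "potato", "tomato", "brocolli", "tomato"],
    "target": [1, 0, 1, 1, 0],
})

# For a 0/1 target: sum = number of 1s, count = rows in the group
agg = train.groupby("fruit")["target"].agg(ones="sum", count="count")
agg["zeros"] = agg["count"] - agg["ones"]

# Same shape as the loop's result: {fruit: {1: n1, 0: n0, 'count': n}}
z = {
    fruit: {1: int(row["ones"]), 0: int(row["zeros"]), "count": int(row["count"])}
    for fruit, row in agg.iterrows()
}
print(z)
```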


Let's use get_dummies, add_prefix and groupby:

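dummies = df['target_variable'].astype(str).str.get_dummies().add_prefix('target = ')
df = df.assign(**dummies)
# sum only the indicator columns; summing the string 'Column 1' fails when numeric_only=False (the pandas 2.x default)
df['total_count'] = dummies.sum(axis=1)
df.groupby('Column 1').sum()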

Output:

            target_variable  target = 0  target = 1  total_count
Column 1                                                        
'brocolli'                1           0           1            1
'potato'                  1           1           1            2
'tomato'                  1           1           1            2
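Since the question asks about proportions, the counts can also be turned into shares. A minimal sketch, assuming the same 0/1 `target_variable`: the group mean is the share of 1s, and `crosstab(..., normalize="index")` gives the share of each target value per fruit, so every row sums to 1:

```python
import pandas as pd

df = pd.DataFrame({
    "Column 1": ["potato", "potato", "tomato", "brocolli", "tomato"],
    "target_variable": [1, 0, 1, 1, 0],
})

# For a 0/1 target, the mean of each group is the proportion of 1s
p1 = df.groupby("Column 1")["target_variable"].mean()

# Row-normalised counts: share of target 0 and target 1 within each fruit
shares = pd.crosstab(df["Column 1"], df["target_variable"], normalize="index")
print(p1)
print(shares)
```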

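Is the output correct?
@jezrael Oops, sorry, fixed! :) Thanks for pointing that out!
If another 'potato' row is added, the output changes: 1?
Sorry, what do you mean, @jezrael?
Potato should then be 1: 2 and 0: 1.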
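The case raised in the comments can be checked directly. This sketch assumes the extra row is ("potato", 1), which is what "1: 2 and 0: 1" implies. It reuses the get_dummies approach with the total computed from the indicator columns only. It also shows that the summed `target_variable` column just repeats the `target = 1` count:

```python
import pandas as pd

# Question's data plus one extra ("potato", 1) row (assumed from the comments)
df = pd.DataFrame({
    "Column 1": ["potato", "potato", "tomato", "brocolli", "tomato", "potato"],
    "target_variable": [1, 0, 1, 1, 0, 1],
})

# One indicator column per target value: "target = 0", "target = 1"
dummies = df["target_variable"].astype(str).str.get_dummies().add_prefix("target = ")
df = df.assign(**dummies)
df["total_count"] = dummies.sum(axis=1)

result = df.groupby("Column 1").sum()
print(result.loc["potato"])
```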
