Python：替换groupby操作_Python_Pandas

Python：替换groupby操作

python pandas

Python：替换groupby操作,python,pandas,Python,Pandas,我将下表作为数据帧： | ID | Name | Sales | Source | |----|------|-------|----------| | 1 | a | 34 | Source A | | 2 | b | 3423 | Source A | | 3 | c | 2 | Source A | | 4 | d | 342 | Source A | | 3 | c | 34 | Source A | | 5 | e

我将下表作为数据帧：

| ID | Name | Sales | Source   |
|----|------|-------|----------|
| 1  | a    | 34    | Source A |
| 2  | b    | 3423  | Source A |
| 3  | c    | 2     | Source A |
| 4  | d    | 342   | Source A |
| 3  | c    | 34    | Source A |
| 5  | e    | 234   | Source A |
| 6  | f    | 234   | Source A |
| 7  | g    | 23    | Source A |
| 1  | a    | 12    | Source B |
| 2  | b    | 42    | Source B |
| 3  | c    | 9     | Source B |
| 2  | b    | 22    | Source B |
| 1  | a    | 1     | Source B |
| 8  | h    | 56    | Source B |

（i）为每个资源的每个ID聚合销售额和（ii）将结果放在两个新列“源A”和“源B”中的最佳方法是什么，以便生成的数据帧如下所示：

| ID | Name | Source A | Source B |
|----|------|----------|----------|
| 1  | a    | 34       | 13       |
| 2  | b    | 3423     | 64       |
| 3  | c    | 36       | 9        |
| 4  | d    | 342      | 0        |
| 5  | e    | 234      | 0        |
| 6  | f    | 234      | 0        |
| 7  | g    | 23       | 0        |
| 8  | h    | 0        | 56       |

data = {"ID":[1,2,3,4,3,5,6,7,1,2,3,2,1,8], 
      "Name":list("abcdcefgabcbah"), 
      "Sales":[34,3423,2,342,34,234,234,23,12,42,9,22,1,56],
      "Source":["Source A"]*8 + ["Source B"]*6
     }
df = pd.DataFrame(data)

df.groupby(["ID","Name","Source"])["Sales"].sum().unstack()

我最初的做法如下：

| ID | Name | Source A | Source B |
|----|------|----------|----------|
| 1  | a    | 34       | 13       |
| 2  | b    | 3423     | 64       |
| 3  | c    | 36       | 9        |
| 4  | d    | 342      | 0        |
| 5  | e    | 234      | 0        |
| 6  | f    | 234      | 0        |
| 7  | g    | 23       | 0        |
| 8  | h    | 0        | 56       |

data = {"ID":[1,2,3,4,3,5,6,7,1,2,3,2,1,8], 
      "Name":list("abcdcefgabcbah"), 
      "Sales":[34,3423,2,342,34,234,234,23,12,42,9,22,1,56],
      "Source":["Source A"]*8 + ["Source B"]*6
     }
df = pd.DataFrame(data)

df.groupby(["ID","Name","Source"])["Sales"].sum().unstack()

问题：我的初始表是使用不同的文件构建的，而不是应用

pd.concat

。因此，我觉得我可以通过首先以不同的方式连接（或合并）来实现最终的表。有没有更好的方法来实现这一点？作为侧节点：实际数据表由6个不同的源组成

谢谢你的帮助

请尝试以下代码：

df.groupby(['Name', 'Source'])['Sales'].sum()\
    .unstack(1).fillna(0).reset_index()

请尝试以下代码：

df.groupby(['Name', 'Source'])['Sales'].sum()\
    .unstack(1).fillna(0).reset_index()

您可以使用

pd.crosstab

：输出：

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a             34        13
b           3423        64
c             36         9
d            342         0
e            234         0
f            234         0
g             23         0
h              0        56

或者，透视表输出：

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a             34        13
b           3423        64
c             36         9
d            342         0
e            234         0
f            234         0
g             23         0
h              0        56

或者使用

set_index

和

sum

和

level

参数，然后

unstack

：输出：

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a             34        13
b           3423        64
c             36         9
d            342         0
e            234         0
f            234         0
g             23         0
h              0        56

您可以使用

pd.crosstab

：输出：

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a             34        13
b           3423        64
c             36         9
d            342         0
e            234         0
f            234         0
g             23         0
h              0        56

或者，透视表输出：

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a             34        13
b           3423        64
c             36         9
d            342         0
e            234         0
f            234         0
g             23         0
h              0        56

或者使用

set_index

和

sum

和

level

参数，然后

unstack

：输出：

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a           34.0      13.0
b         3423.0      64.0
c           36.0       9.0
d          342.0       0.0
e          234.0       0.0
f          234.0       0.0
g           23.0       0.0
h            0.0      56.0

Source  Source A  Source B
Name                      
a             34        13
b           3423        64
c             36         9
d            342         0
e            234         0
f            234         0
g             23         0
h              0        56

谢谢同样有效，但其他响应有更多选项。谢谢。同样有效，但另一种反应有更多的选择。