Grouping data in a Python DataFrame
I have a DataFrame that looks like this:
| Year | Cause of Death |Gender| Total Case |
| 2016 | Killed | M | 3 |
| 2016 | Suicide | M | 5 |
| 2016 | Killed | F | 7 |
| 2017 | Killed | F | 12 |
| 2017 | Killed | M | 2 |
| 2017 | Suicide | F | 5 |
| 2017 | Suicide | M | 6 |
From this DataFrame, I want to create a new DataFrame like this:
|Year|Cause of Death|Total Case|
|2016| Killed | 10 |
| | Suicide | 5 |
|2017| Killed | 14 |
| | Suicide | 11 |
Is there a simple way to do this?
Thanks
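Start with this; from there it is only a matter of formatting:
# Select 'Total Case' so that the text column Gender is not aggregated
df.groupby(['Year', 'Cause of Death'])[['Total Case']].sum()
                     Total Case
Year Cause of Death
2016 Killed                  10
     Suicide                  5
2017 Killed                  14
     Suicide                 11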
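In the result above, Year and Cause of Death end up as index levels rather than ordinary columns. A minimal sketch of one way to keep them as columns, assuming the question's table is rebuilt by hand as `df` (the construction below is illustrative and not part of the original post); it passes `as_index=False` to `groupby`:

```python
import pandas as pd

# Hand-built copy of the table from the question (an assumption for illustration).
df = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    "Cause of Death": ["Killed", "Suicide", "Killed", "Killed",
                       "Killed", "Suicide", "Suicide"],
    "Gender": ["M", "M", "F", "F", "M", "F", "M"],
    "Total Case": [3, 5, 7, 12, 2, 5, 6],
})

# as_index=False keeps the grouping keys as regular columns, so the
# result is a flat DataFrame instead of one with a MultiIndex.
flat = df.groupby(["Year", "Cause of Death"], as_index=False)["Total Case"].sum()
print(flat)
```

Whether a flat frame or a MultiIndexed one is preferable depends on what happens next: the flat form is easier to export or merge, the indexed form is closer to the layout asked for in the question.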
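Or
Pandas DataFrames come with a function for this. It looks like you do not care about the Gender column and only want to group by Year and Cause of Death.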
g = df[['Year', 'Cause of Death', 'Total Case']].groupby(['Year', 'Cause of Death'])
g.sum()
#                      Total Case
# Year Cause of Death
# 2016 Killed                  10
#      Suicide                  5
# 2017 Killed                  14
#      Suicide                 11
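The first line selects only the columns you are interested in, then calls groupby on the columns you want to group by. This returns a new object that has a method named sum, which adds up the values within each group.
You can try using groupby and reset_index: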
df is:
   Year  Cause of Death  Gender  Total Case
0  2016          Killed       M           3
1  2016         Suicide       M           5
2  2016          Killed       F           7
3  2017          Killed       F          12
4  2017          Killed       M           2
5  2017         Suicide       F           5
6  2017         Suicide       M           6
Then apply the following:
new_df = df['Total Case'].groupby([df['Year'], df['Cause of Death']]).sum()
new_df = new_df.reset_index()
new_df
new_df will be:
   Year  Cause of Death  Total Case
0  2016          Killed          10
1  2016         Suicide           5
2  2017          Killed          14
3  2017         Suicide          11
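For anyone who wants to run this answer end to end, here is a small self-contained sketch. It assumes the question's table is typed in by hand as `df`, and the intermediate name `summed` is introduced only for illustration:

```python
import pandas as pd

# Hand-built copy of the table from the question (an assumption for illustration).
df = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    "Cause of Death": ["Killed", "Suicide", "Killed", "Killed",
                       "Killed", "Suicide", "Suicide"],
    "Gender": ["M", "M", "F", "F", "M", "F", "M"],
    "Total Case": [3, 5, 7, 12, 2, 5, 6],
})

# Group the 'Total Case' Series by the two key columns. The result is a
# Series whose index is a (Year, Cause of Death) MultiIndex.
summed = df["Total Case"].groupby([df["Year"], df["Cause of Death"]]).sum()

# reset_index() moves both index levels back into columns, giving a
# flat DataFrame with a fresh 0..n-1 index.
new_df = summed.reset_index()
print(new_df)
```

Because only the Total Case column is grouped, the Gender column never enters the aggregation.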
Use the groupby method in Pandas:
grouped = df.groupby(['Year', 'Cause of Death'])
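Then, to get the sum of the total cases, use the following (selecting 'Total Case' keeps the text column Gender out of the sum):
grouped[['Total Case']].sum()
This gives you the desired output: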
|Year|Cause of Death|Total Case|
|2016| Killed | 10 |
| | Suicide | 5 |
|2017| Killed | 14 |
| | Suicide | 11 |
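The empty cells under Year in this output are only how pandas prints repeated labels of a MultiIndex; each row still carries its year. A short sketch, assuming the question's table is rebuilt by hand as `df` and using the illustrative name `totals`, shows the full index and a lookup of one group's total:

```python
import pandas as pd

# Hand-built copy of the table from the question (an assumption for illustration).
df = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    "Cause of Death": ["Killed", "Suicide", "Killed", "Killed",
                       "Killed", "Suicide", "Suicide"],
    "Gender": ["M", "M", "F", "F", "M", "F", "M"],
    "Total Case": [3, 5, 7, 12, 2, 5, 6],
})

totals = df.groupby(["Year", "Cause of Death"])[["Total Case"]].sum()

# Every row has a complete (Year, Cause of Death) key, even though the
# printed table leaves repeated years blank.
print(totals.index.tolist())

# Look up a single group by its full key.
suicides_2017 = totals.loc[(2017, "Suicide"), "Total Case"]
print(suicides_2017)
```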