Python: Groupby for selecting multiple columns
I have a DataFrame df with 3 columns, for example:
[IN]:df
[OUT]:
Tree Name  Planted by Govt  Planted by College
A          Yes              No
B          Yes              No
C          Yes              No
C          Yes              No
A          No               No
B          No               Yes
B          Yes              Yes
B          Yes              No
B          Yes              No
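To reproduce the example, the frame above can be built by hand. This is a minimal sketch: the column names and values are copied from the table, and nothing beyond that is assumed about the original data.

```python
import pandas as pd

# Sample data copied row by row from the table above
df = pd.DataFrame({
    "Tree Name": ["A", "B", "C", "C", "A", "B", "B", "B", "B"],
    "Planted by Govt": ["Yes", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes"],
    "Planted by College": ["No", "No", "No", "No", "No", "Yes", "Yes", "No", "No"],
})
print(df)
```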
Query:
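How many trees of each kind were planted by the government and not by the college, that is, Planted by Govt is Yes and Planted by College is No?
Required output: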
1 Tree(s) 'A' were planted by govt and not by college
3 Tree(s) 'B' were planted by govt and not by college
2 Tree(s) 'C' were planted by govt and not by college
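Can anyone help?
First create a boolean mask by comparing both columns and chaining the conditions with bitwise AND (&), then convert the mask to numbers and aggregate with sum: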
s = df['Planted by Govt'].eq('Yes') & df['Planted by College'].eq('No')
out = s.astype('int8').groupby(df['Tree Name']).sum()
# alternative (Series.view is deprecated in recent pandas releases):
# out = s.view('i1').groupby(df['Tree Name']).sum()
print (out)
Tree Name
A 1
B 3
C 2
dtype: int8
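Since True counts as 1 when summed, the integer conversion is arguably optional. The sketch below is self-contained and rebuilds the sample frame from the question; the variable name mask is only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Tree Name": ["A", "B", "C", "C", "A", "B", "B", "B", "B"],
    "Planted by Govt": ["Yes", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes"],
    "Planted by College": ["No", "No", "No", "No", "No", "Yes", "Yes", "No", "No"],
})

# True where the tree was planted by the government and not by the college
mask = df["Planted by Govt"].eq("Yes") & df["Planted by College"].eq("No")

# Summing booleans per group counts the True values
out = mask.groupby(df["Tree Name"]).sum()
print(out)
```

Grouping by the df['Tree Name'] Series works here because the mask shares the same index as df.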
Finally, build the custom output with f-strings:
for k, v in out.items():
    print (f"{v} Tree(s) {k} were planted by govt and not by college")
1 Tree(s) A were planted by govt and not by college
3 Tree(s) B were planted by govt and not by college
2 Tree(s) C were planted by govt and not by college
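The required output in the question puts the tree name in single quotes. A small variation of the loop reproduces that exactly; here out is a hand-built stand-in for the grouped result, and lines is a hypothetical name used only so the strings can be inspected.

```python
import pandas as pd

# Stand-in for the grouped counts computed earlier
out = pd.Series({"A": 1, "B": 3, "C": 2})

# Single quotes around {k} match the format requested in the question
lines = [
    f"{v} Tree(s) '{k}' were planted by govt and not by college"
    for k, v in out.items()
]
print("\n".join(lines))
```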
Another idea is to create a new helper column:
df['new'] = (df['Planted by Govt'].eq('Yes') & df['Planted by College'].eq('No')).astype('int8')
print (df)
  Tree Name Planted by Govt Planted by College  new
0         A             Yes                 No    1
1         B             Yes                 No    1
2         C             Yes                 No    1
3         C             Yes                 No    1
4         A              No                 No    0
5         B              No                Yes    0
6         B             Yes                Yes    0
7         B             Yes                 No    1
8         B             Yes                 No    1
out = df.groupby('Tree Name')['new'].sum()
print (out)
Tree Name
A 1
B 3
C 2
Name: new, dtype: int8
Or we can filter first and then use count:
result = (df[df['Planted by Govt'].eq('Yes') & df['Planted by College'].eq('No')]
          .groupby('Tree Name')['Planted by Govt']
          .count()
          .rename('Planted only by Govt'))
print(result)
Tree Name
A 1
B 3
C 2
Name: Planted only by Govt, dtype: int64
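One difference between the approaches may matter on other data: filtering before grouping drops any tree that has no matching rows, while summing a mask keeps it with a count of 0. The sample data does not show this because every tree has at least one match, so the sketch below uses a small made-up frame with a hypothetical tree "D".

```python
import pandas as pd

# Made-up data for illustration; "D" is a hypothetical tree with no matching rows
df = pd.DataFrame({
    "Tree Name": ["A", "A", "B", "D"],
    "Planted by Govt": ["Yes", "No", "Yes", "No"],
    "Planted by College": ["No", "No", "No", "Yes"],
})
mask = df["Planted by Govt"].eq("Yes") & df["Planted by College"].eq("No")

# Filter first, then count: "D" disappears because none of its rows pass the filter
filtered = df[mask].groupby("Tree Name").size()

# Sum the mask per group: "D" is kept with a count of 0
summed = mask.groupby(df["Tree Name"]).sum()

print(filtered.to_dict())
print(summed.to_dict())
```

Which behavior is preferable depends on whether zero counts should appear in the report.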