Python pandas: aggregating data in a DataFrame


I have a DataFrame:

ID,"url","app_name","used_at","active_seconds","device_connection","device_os","device_type","device_usage"
1ca9bb884462c3ba2391bf669c22d4bd,"",VK Client,2016-01-01 00:00:13,5,3g,ios,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:01:45,107,wifi,android,smartphone,home
1ca9bb884462c3ba2391bf669c22d4bd,"",Twitter,2016-01-01 00:02:48,20,3g,ios,smartphone,home
1ca9bb884462c3ba2391bf669c22d4bd,"",VK Client,2016-01-01 00:03:08,796,3g,ios,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",WhatsApp Messenger,2016-01-01 00:03:32,70,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:04:42,27,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:05:30,5,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",WhatsApp Messenger,2016-01-01 00:05:36,47,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:06:23,20,wifi,android,smartphone,home
a703114aa8a03495c3e042647212fa63,"",Instagram,2016-01-01 00:06:41,118,3g,android,smartphone,home
1637ce5a4c4868e694004528c642d0ac,"",Camera,2016-01-01 00:06:43,16,wifi,android,smartphone,home
1637ce5a4c4868e694004528c642d0ac,"",VKontakte,2016-01-01 00:07:00,45,wifi,android,smartphone,home
a703114aa8a03495c3e042647212fa63,"",VKontakte,2016-01-01 00:08:40,99,3g,android,smartphone,home
1637ce5a4c4868e694004528c642d0ac,"",VKontakte,2016-01-01 00:10:05,1,wifi,android,smartphone,home
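I need to compute the share of each app_name for each ID, but I am stuck on the next step: the sum for each app per ID should be divided by the sum of all apps for that ID and then multiplied by 100 (to get a percentage).

My current aggregation only returns the count of each app, and when I try to add the percentage like this: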

short = df.groupby(['ID', 'app_name']).agg({'app_name': len, 'active_seconds': sum / df.ID.app_name.sum() * 100}).rename(columns={'active_seconds': 'count_sec', 'app_name': 'sum_app'}).reset_index()
It returns an error.

How can I fix this?
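To reproduce the answers, the sample above can be loaded straight from text. This is a minimal sketch, assuming the data is pasted into a string rather than read from a file (csv_text is a hypothetical name): pandas.read_csv accepts any file-like object, so io.StringIO wraps the string, and parse_dates converts used_at to datetimes.

```python
import io

import pandas as pd

# The question's sample, pasted verbatim into a string (hypothetical variable name)
csv_text = """ID,"url","app_name","used_at","active_seconds","device_connection","device_os","device_type","device_usage"
1ca9bb884462c3ba2391bf669c22d4bd,"",VK Client,2016-01-01 00:00:13,5,3g,ios,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:01:45,107,wifi,android,smartphone,home
1ca9bb884462c3ba2391bf669c22d4bd,"",Twitter,2016-01-01 00:02:48,20,3g,ios,smartphone,home
1ca9bb884462c3ba2391bf669c22d4bd,"",VK Client,2016-01-01 00:03:08,796,3g,ios,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",WhatsApp Messenger,2016-01-01 00:03:32,70,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:04:42,27,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:05:30,5,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",WhatsApp Messenger,2016-01-01 00:05:36,47,wifi,android,smartphone,home
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:06:23,20,wifi,android,smartphone,home
a703114aa8a03495c3e042647212fa63,"",Instagram,2016-01-01 00:06:41,118,3g,android,smartphone,home
1637ce5a4c4868e694004528c642d0ac,"",Camera,2016-01-01 00:06:43,16,wifi,android,smartphone,home
1637ce5a4c4868e694004528c642d0ac,"",VKontakte,2016-01-01 00:07:00,45,wifi,android,smartphone,home
a703114aa8a03495c3e042647212fa63,"",VKontakte,2016-01-01 00:08:40,99,3g,android,smartphone,home
1637ce5a4c4868e694004528c642d0ac,"",VKontakte,2016-01-01 00:10:05,1,wifi,android,smartphone,home
"""

# StringIO makes the string look like a file; parse_dates turns used_at into datetimes
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['used_at'])
print(df[['ID', 'app_name', 'active_seconds']].head())
```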

IIUC, you need:

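# parentheses let the method chain span several lines;
# named aggregation (pandas >= 0.25) names the output columns directly, so no rename is needed
short = (df.groupby(['ID', 'app_name'])
           .agg(count_sec=('active_seconds',
                           lambda x: 100 * x.sum() / df.active_seconds.sum()),
                sum_app=('active_seconds', 'size'))
           .reset_index())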

print (short)

                                 ID            app_name  count_sec  sum_app
0  1637ce5a4c4868e694004528c642d0ac              Camera   1.162791        1
1  1637ce5a4c4868e694004528c642d0ac           VKontakte   3.343023        2
2  1ca9bb884462c3ba2391bf669c22d4bd             Twitter   1.453488        1
3  1ca9bb884462c3ba2391bf669c22d4bd           VK Client  58.212209        2
4  a703114aa8a03495c3e042647212fa63           Instagram   8.575581        1
5  a703114aa8a03495c3e042647212fa63           VKontakte   7.194767        1
6  b8f4df3f99ad786a77897c583d98f615           VKontakte  11.555233        4
7  b8f4df3f99ad786a77897c583d98f615  WhatsApp Messenger   8.502907        2
Another solution:

# use a name other than df for the result, e.g. short1
# parentheses let the method chain span several lines (named aggregation needs pandas >= 0.25)
short1 = (df.groupby(['ID', 'app_name'])
            .agg(count_sec=('active_seconds', 'sum'),
                 sum_app=('active_seconds', 'size'))
            .reset_index())
# turn the summed seconds into a percentage of the overall total
short1['count_sec'] = 100 * short1['count_sec'] / df.active_seconds.sum()
print (short1)
                                 ID            app_name  count_sec  sum_app
0  1637ce5a4c4868e694004528c642d0ac              Camera   1.162791        1
1  1637ce5a4c4868e694004528c642d0ac           VKontakte   3.343023        2
2  1ca9bb884462c3ba2391bf669c22d4bd             Twitter   1.453488        1
3  1ca9bb884462c3ba2391bf669c22d4bd           VK Client  58.212209        2
4  a703114aa8a03495c3e042647212fa63           Instagram   8.575581        1
5  a703114aa8a03495c3e042647212fa63           VKontakte   7.194767        1
6  b8f4df3f99ad786a77897c583d98f615           VKontakte  11.555233        4
7  b8f4df3f99ad786a77897c583d98f615  WhatsApp Messenger   8.502907        2
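Both solutions divide by the grand total of active_seconds over all IDs, so count_sec sums to 100 across the whole table. The question, however, describes dividing each app's sum by the total for its own ID. If that is what is wanted, here is a sketch under that assumption (short2 and pct_of_id are hypothetical names; named aggregation needs pandas 0.25 or later): aggregate first, then divide by a per-ID total that transform('sum') broadcasts back to every row.

```python
import pandas as pd

# The question's sample, reduced to the three columns needed here
rows = [
    ('1ca9bb884462c3ba2391bf669c22d4bd', 'VK Client', 5),
    ('b8f4df3f99ad786a77897c583d98f615', 'VKontakte', 107),
    ('1ca9bb884462c3ba2391bf669c22d4bd', 'Twitter', 20),
    ('1ca9bb884462c3ba2391bf669c22d4bd', 'VK Client', 796),
    ('b8f4df3f99ad786a77897c583d98f615', 'WhatsApp Messenger', 70),
    ('b8f4df3f99ad786a77897c583d98f615', 'VKontakte', 27),
    ('b8f4df3f99ad786a77897c583d98f615', 'VKontakte', 5),
    ('b8f4df3f99ad786a77897c583d98f615', 'WhatsApp Messenger', 47),
    ('b8f4df3f99ad786a77897c583d98f615', 'VKontakte', 20),
    ('a703114aa8a03495c3e042647212fa63', 'Instagram', 118),
    ('1637ce5a4c4868e694004528c642d0ac', 'Camera', 16),
    ('1637ce5a4c4868e694004528c642d0ac', 'VKontakte', 45),
    ('a703114aa8a03495c3e042647212fa63', 'VKontakte', 99),
    ('1637ce5a4c4868e694004528c642d0ac', 'VKontakte', 1),
]
df = pd.DataFrame(rows, columns=['ID', 'app_name', 'active_seconds'])

# Total seconds and number of rows per (ID, app_name) pair
short2 = (df.groupby(['ID', 'app_name'])
            .agg(count_sec=('active_seconds', 'sum'),
                 sum_app=('active_seconds', 'size'))
            .reset_index())

# Each ID's own total, repeated on every row of that ID
id_total = short2.groupby('ID')['count_sec'].transform('sum')

# Share of each app within its ID, in percent
short2['pct_of_id'] = 100 * short2['count_sec'] / id_total
print(short2)
```

With this sample, the Camera row for ID 1637ce5a4c4868e694004528c642d0ac becomes 100 * 16 / 62 (about 25.8) instead of about 1.16 of the grand total, and pct_of_id sums to 100 within each ID.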

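Can you show the expected output?

My df is bigger, and it returns all 0 in the count_sec column. I tried multiplying by 10000, but that does not change anything; I think it returns an int. How can I convert it to float?

Use .astype(float).

Where should I use it, like this?
100*x.sum()/df.active_seconds.sum().astype(float)

Yes, or try
100*x.sum().astype(float)/df.active_seconds.sum()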
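A column of zeros is what integer (floor) division produces. In Python 2, / between two integers floors the result, so any share below 1 becomes 0. In Python 3, / on integer Series is already true division. The following small sketch uses hypothetical toy data and contrasts floor division (//, standing in for Python 2's int / int) with casting the sums to float first, as suggested above. The names g, floored and as_float are illustrative only.

```python
import pandas as pd

# Hypothetical toy data: two of the shares are well below 1 percent of the total
df = pd.DataFrame({
    'ID': ['id1', 'id1', 'id2', 'id3'],
    'app_name': ['A', 'B', 'A', 'A'],
    'active_seconds': [5, 20, 796, 2000],
})

total = df['active_seconds'].sum()  # an integer (2821)
g = df.groupby(['ID', 'app_name'])['active_seconds']

# Floor division mimics Python 2's int / int: every share below 1 becomes 0
floored = g.sum() * 100 // total

# Casting the sums to float first guarantees true division on any Python version
as_float = g.sum().astype(float) * 100 / total

print(floored)
print(as_float)
```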