Python: counting the values in a column with pandas
I have this data:
member_id device_id
19404 dfbc9d3230304cdfb0316cc32c41b67f [2016-04-28, 2016-04-27, 2016-04-26, 2016-04-22]
19555 176e307bd8714a00ac2b99276123f0a7 [2016-04-29, 2016-04-28, 2016-04-27, 2016-04-23]
19632 a6d4b631e09a4b31afef4c93472c7da3 [2016-04-29, 2016-04-28, 2016-04-27]
19792 0146b09048ce4c47af4bbc69e7999137 [2016-04-23, 2016-04-22, 2016-04-21, 2016-04-20]
20258 1510f9b4efc14183ad412eb54c9e058f [2016-04-09]
5f42f4d02d38456689e58d6a1b9a3e16 [2016-04-29, 2016-04-28, 2016-04-25, 2016-04-22]
I need to count the values in the list in the third column.
I tried len(), thinking it would return the length of the list, but it didn't.
new = data.groupby(['member_id', 'device_id'])['event_date'].unique()
Calling count() on that returns a total over all the values.
Assuming you have a list in the last column l:
In [113]: df.l.map(len)
Out[113]:
0 4
1 4
2 3
3 4
4 1
5 4
Name: l, dtype: int64
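A self-contained version of the map(len) approach, assuming the column really holds Python lists. The frame below is a hypothetical three-row subset of the question's data. It also shows Series.str.len(), which pandas documents as accepting list-like elements, as an equivalent spelling.

```python
import pandas as pd

# Hypothetical frame whose column "l" holds Python lists of date strings.
df = pd.DataFrame({
    "l": [
        ["2016-04-28", "2016-04-27", "2016-04-26", "2016-04-22"],
        ["2016-04-29", "2016-04-28", "2016-04-27"],
        ["2016-04-09"],
    ]
})

# map(len) calls Python's built-in len() on each element (each element is a list).
counts_map = df.l.map(len)

# Series.str.len() also works on list-like elements and gives the same lengths.
counts_str = df.l.str.len()

print(counts_map.tolist())  # [4, 3, 1]
print(counts_str.tolist())  # [4, 3, 1]
```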
If the last column holds strings instead, you can convert them to lists first:
df.l.str.replace(r'[\[\]]', '', regex=True).str.split(r'\s*,\s*').map(len)
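A runnable sketch of the string case, assuming the dates arrive as text such as "[2016-04-28, 2016-04-27]" (for example after a round trip through CSV; this storage format is an assumption, not something the question confirms). regex=True is passed to str.replace explicitly because pandas 2.0 and later treat the pattern as a literal string by default. str.split treats a multi-character pattern as a regular expression.

```python
import pandas as pd

# Hypothetical sample: each date list stored as a single string.
df = pd.DataFrame({
    "l": [
        "[2016-04-28, 2016-04-27, 2016-04-26, 2016-04-22]",
        "[2016-04-29, 2016-04-28, 2016-04-27]",
        "[2016-04-09]",
    ]
})

# Remove the square brackets, then split on commas with optional whitespace.
as_lists = df.l.str.replace(r"[\[\]]", "", regex=True).str.split(r"\s*,\s*")

# Each element is now a list of date strings; count them.
counts = as_lists.map(len)

print(counts.tolist())  # [4, 3, 1]
```

One caveat of this approach: an empty "[]" becomes the empty string, which splits into [''] and is therefore counted as 1.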
You can apply the len function to the grouped column. .iat[0] gets the first item in each group, which in this case is your list:
>>> df.groupby(['member_id', 'device_id'])['event_date'].agg(
...     event_count=lambda group: len(group.iat[0]))
event_count
member_id device_id
19404 dfbc9d3230304cdfb0316cc32c41b67f 4
19555 176e307bd8714a00ac2b99276123f0a7 4
19632 a6d4b631e09a4b31afef4c93472c7da3 3
19792 0146b09048ce4c47af4bbc69e7999137 4
20258 1510f9b4efc14183ad412eb54c9e058f 1
5f42f4d02d38456689e58d6a1b9a3e16 4
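A self-contained sketch of the grouped len approach using named aggregation (keyword=function, available from pandas 0.25), where the keyword becomes the name of the output column. The rows rebuild the question's data, with the dates kept as plain strings rather than datetime objects (an assumption).

```python
import pandas as pd

# The question's rows; each event_date cell is a list of date strings.
df = pd.DataFrame(
    {
        "member_id": [19404, 19555, 19632, 19792, 20258, 20258],
        "device_id": [
            "dfbc9d3230304cdfb0316cc32c41b67f",
            "176e307bd8714a00ac2b99276123f0a7",
            "a6d4b631e09a4b31afef4c93472c7da3",
            "0146b09048ce4c47af4bbc69e7999137",
            "1510f9b4efc14183ad412eb54c9e058f",
            "5f42f4d02d38456689e58d6a1b9a3e16",
        ],
        "event_date": [
            ["2016-04-28", "2016-04-27", "2016-04-26", "2016-04-22"],
            ["2016-04-29", "2016-04-28", "2016-04-27", "2016-04-23"],
            ["2016-04-29", "2016-04-28", "2016-04-27"],
            ["2016-04-23", "2016-04-22", "2016-04-21", "2016-04-20"],
            ["2016-04-09"],
            ["2016-04-29", "2016-04-28", "2016-04-25", "2016-04-22"],
        ],
    }
)

# Named aggregation: the keyword "event_count" becomes the result column.
# Each group here holds one row, so .iat[0] is that row's list.
result = df.groupby(["member_id", "device_id"])["event_date"].agg(
    event_count=lambda group: len(group.iat[0])
)
print(result)
```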
Is this what you want?
import pandas as pd
df = pd.DataFrame(columns=('member_id','device_id','event_date'),data=[
[19404,'dfbc9d3230304cdfb0316cc32c41b67f',['2016-04-28', '2016-04-27', '2016-04-26', '2016-04-22']],
[19555,'176e307bd8714a00ac2b99276123f0a7',['2016-04-29', '2016-04-28', '2016-04-27', '2016-04-23']],
[19632,'a6d4b631e09a4b31afef4c93472c7da3',['2016-04-29', '2016-04-28', '2016-04-27']],
[19792,'0146b09048ce4c47af4bbc69e7999137',['2016-04-23', '2016-04-22', '2016-04-21', '2016-04-20']],
[20258,'1510f9b4efc14183ad412eb54c9e058f',['2016-04-09']],
[20258,'5f42f4d02d38456689e58d6a1b9a3e16',['2016-04-29', '2016-04-28', '2016-04-25', '2016-04-22']]
])
new = df.groupby(['member_id', 'device_id'])['event_date']
# Iterating yields (key, Series) pairs; .values[0] is the group's first list
for key, group in new:
    print(key, len(group.values[0]))
Output:
(19404, 'dfbc9d3230304cdfb0316cc32c41b67f') 4
(19555, '176e307bd8714a00ac2b99276123f0a7') 4
(19632, 'a6d4b631e09a4b31afef4c93472c7da3') 3
(19792, '0146b09048ce4c47af4bbc69e7999137') 4
(20258, '1510f9b4efc14183ad412eb54c9e058f') 1
(20258, '5f42f4d02d38456689e58d6a1b9a3e16') 4
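Since every (member_id, device_id) pair occurs in exactly one row of this sample, a sketch of a loop-free alternative is to index by the two keys and map len over the lists. The second Series is a hypothetical variant for data where a pair might span several rows: it sums the list lengths of all rows in a group, whereas .values[0] reads only the first row. Whether the real data has repeated pairs is not shown in the question.

```python
import pandas as pd

# The question's rows; each event_date cell is a list of date strings.
df = pd.DataFrame(
    {
        "member_id": [19404, 19555, 19632, 19792, 20258, 20258],
        "device_id": [
            "dfbc9d3230304cdfb0316cc32c41b67f",
            "176e307bd8714a00ac2b99276123f0a7",
            "a6d4b631e09a4b31afef4c93472c7da3",
            "0146b09048ce4c47af4bbc69e7999137",
            "1510f9b4efc14183ad412eb54c9e058f",
            "5f42f4d02d38456689e58d6a1b9a3e16",
        ],
        "event_date": [
            ["2016-04-28", "2016-04-27", "2016-04-26", "2016-04-22"],
            ["2016-04-29", "2016-04-28", "2016-04-27", "2016-04-23"],
            ["2016-04-29", "2016-04-28", "2016-04-27"],
            ["2016-04-23", "2016-04-22", "2016-04-21", "2016-04-20"],
            ["2016-04-09"],
            ["2016-04-29", "2016-04-28", "2016-04-25", "2016-04-22"],
        ],
    }
)
keys = ["member_id", "device_id"]

# One row per key pair: no grouping needed, just the length of each list.
per_row = df.set_index(keys)["event_date"].map(len)

# Hypothetical variant: total list length over all rows sharing a key pair.
per_group = df.groupby(keys)["event_date"].apply(lambda s: s.map(len).sum())

print(per_row.tolist())    # [4, 4, 3, 4, 1, 4]
print(per_group.tolist())  # [4, 4, 3, 4, 1, 4]
```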
Strangely, it returns incorrect results on my data.
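The question's code calls .unique() on event_date, which suggests the raw data might hold one date per row. Assuming that layout (the question does not confirm it), the per-pair count can be taken directly with nunique(), skipping the intermediate lists. size() is shown alongside for comparison because it counts rows, duplicates included. The rows and the shortened device IDs below are hypothetical.

```python
import pandas as pd

# Hypothetical raw data: one event per row, with one repeated date for 19404.
data = pd.DataFrame({
    "member_id": [19404, 19404, 19404, 20258, 20258],
    "device_id": ["dfbc", "dfbc", "dfbc", "1510", "5f42"],
    "event_date": ["2016-04-28", "2016-04-28", "2016-04-27", "2016-04-09", "2016-04-29"],
})
keys = ["member_id", "device_id"]

# nunique() counts distinct event_date values within each group.
distinct_dates = data.groupby(keys)["event_date"].nunique()

# size() counts rows per group, so repeated dates are counted each time.
all_events = data.groupby(keys).size()

print(distinct_dates.tolist())  # [2, 1, 1]
print(all_events.tolist())      # [3, 1, 1]
```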