Python: counting frequencies in groupby results


python pandas


I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'id': ['one', 'one', 'two', 'two', 'three', 'three', 'three'],
                   'type': ['current', 'saving', 'current', 'current', 'current', 'saving', 'credit']})

I want to find the ids whose only type is 'current'. The result should look like this:

only_current_id_list = ['two']

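I think you need:

# keep only the groups in which every row has type 'current'
L = (df.groupby('id')
       .filter(lambda x: (x['type'] == 'current').all())['id']
       .unique()
       .tolist())
print(L)

['two']

Edit:

If you only want ids that have exactly one 'current' row and nothing else, add a second condition to the lambda: (x['type'] == 'current').sum() == 1. With the sample data above that returns [], because 'two' has two 'current' rows.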

This is not pure pandas, but you can simply take the set difference between all ids and the ids that have a type other than 'current':

>>> set(df["id"]) - set(df["id"][df["type"] != "current"])
{'two'}
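The same set-difference idea can also be written with a boolean mask so that it stays inside pandas. This is a minimal sketch, assuming the sample DataFrame from the question; the names `not_only_current` and `only_current_ids` are illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['one', 'one', 'two', 'two', 'three', 'three', 'three'],
    'type': ['current', 'saving', 'current', 'current', 'current', 'saving', 'credit'],
})

# ids that have at least one row whose type is not 'current'
not_only_current = df.loc[df['type'] != 'current', 'id']

# keep the ids that never appear in that selection, without duplicates
only_current_ids = df.loc[~df['id'].isin(not_only_current), 'id'].unique().tolist()
print(only_current_ids)
```

Unlike a Python set, `unique()` preserves the order in which the ids first appear in the DataFrame.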

Using pd.crosstab:

# use a new name so the original df is not overwritten
ct = pd.crosstab(df['id'], df['type'])
ct.loc[ct.sum(axis=1) == ct['current']].index.values[0]

Out[1065]: 'two'
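To see why comparing the row total with the `current` column works, it may help to look at the intermediate table. The sketch below assumes the sample DataFrame from the question; `ct` and `only_current` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['one', 'one', 'two', 'two', 'three', 'three', 'three'],
    'type': ['current', 'saving', 'current', 'current', 'current', 'saving', 'credit'],
})

# One row per id, one column per type, each cell holding the number of matching rows
ct = pd.crosstab(df['id'], df['type'])
print(ct)

# An id has only 'current' rows when its 'current' count equals its row total
only_current = ct.loc[ct['current'] == ct.sum(axis=1)].index.tolist()
print(only_current)
```

Note that `ct['current']` raises a `KeyError` if no row in the data has type 'current', so this approach assumes the value occurs at least once.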

Or you can use groupby with nunique:

# number of distinct types per id, broadcast back to every row
df['unique'] = df.groupby('id')['type'].transform('nunique')

df.loc[(df['unique'] == 1) & (df['type'] == 'current'), 'id'].unique().tolist()


Out[1085]: ['two']
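Since the question asks for a count, the check can also be done without adding a helper column. This is a sketch under the assumption that "only current" means every row of the id has type 'current'; `all_current`, `only_current_ids` and `count` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['one', 'one', 'two', 'two', 'three', 'three', 'three'],
    'type': ['current', 'saving', 'current', 'current', 'current', 'saving', 'credit'],
})

# Boolean Series indexed by id: True when every row of that id is 'current'
all_current = df['type'].eq('current').groupby(df['id']).all()

# The ids themselves, and how many of them there are
only_current_ids = all_current[all_current].index.tolist()
count = int(all_current.sum())
print(only_current_ids, count)
```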

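Why does it produce two? Because only user 'two' has nothing but the 'current' type.

What is the (x['type'] == 'current').sum() == 1 part for?

Hi jezrael, thanks for your answer, it does work. What if user 'two' has multiple 'current' types?

Sorry, that condition is for filtering only the ids that have a single 'current' value. I added a sample to explain it better, and I also added a solution for the case where you need the id values for which all values are 'current'.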
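The difference between the two conditions discussed in these comments can be sketched as follows. The data is the sample from the question plus a hypothetical id 'four' with a single 'current' row, added here only to make the two results differ; all variable names are illustrative.

```python
import pandas as pd

# Sample data from the question plus a hypothetical id 'four' (one 'current' row)
df = pd.DataFrame({
    'id': ['one', 'one', 'two', 'two', 'three', 'three', 'three', 'four'],
    'type': ['current', 'saving', 'current', 'current',
             'current', 'saving', 'credit', 'current'],
})

grouped = df['type'].eq('current').groupby(df['id'])

all_current = grouped.all()  # True when every row of the id is 'current'
n_current = grouped.sum()    # number of 'current' rows per id

# ids with only 'current' rows, however many there are
any_number = all_current[all_current].index.tolist()

# ids with exactly one row, and that row is 'current'
exactly_one = all_current[all_current & (n_current == 1)].index.tolist()

print(any_number)
print(exactly_one)
```

Here 'two' passes the first check but not the second, because it has two 'current' rows.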
