Python: find the most frequent string in a DataFrame
I am new to Python programming. I have a pandas DataFrame with two string columns. The DataFrame looks like this:
Case    Action
Create  Create New Account
Create  Create New Account
Create  Create New Account
Create  Create New Account
Create  Create Old Account
Delete  Delete New Account
Delete  Delete New Account
Delete  Delete Old Account
Delete  Delete Old Account
Delete  Delete Old Account
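To follow along, here is a minimal sketch that rebuilds this table as a DataFrame. It assumes every row carries its Case value (the display leaves it blank after the first row of each group); the variable names df and counts are only for illustration. pd.crosstab then gives the raw count of each Action within each Case.

```python
import pandas as pd

# The question's table, with the Case value repeated on every row
# (assumption: blank Case cells mean "same as the row above").
df = pd.DataFrame({
    "Case": ["Create"] * 5 + ["Delete"] * 5,
    "Action": ["Create New Account"] * 4 + ["Create Old Account"]
              + ["Delete New Account"] * 2 + ["Delete Old Account"] * 3,
})

# Raw counts: one row per Case, one column per Action (missing pairs are 0).
counts = pd.crosstab(df["Case"], df["Action"])
print(counts)
```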
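Here we can see that the Create case has 5 actions, 4 of which are Create New Account, i.e. 4/5 (= 80%). Similarly, in the Delete case the most frequent action is Delete Old Account (3/5 = 60%). So my requirement is: the next time a case such as Create comes up, I should get Create New Account as the output, together with its frequency score.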
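One way to sketch that lookup, assuming the data fits in memory and "score" means the share of the Case's rows in percent: compute each Action's share within its Case with value_counts(normalize=True), keep the top Action per Case, and look it up by Case. The helper predict_action is hypothetical, not a pandas API.

```python
import pandas as pd

df = pd.DataFrame({
    "Case": ["Create"] * 5 + ["Delete"] * 5,
    "Action": ["Create New Account"] * 4 + ["Create Old Account"]
              + ["Delete New Account"] * 2 + ["Delete Old Account"] * 3,
})

# Share of each Action within its Case, e.g. 4/5 = 0.8 for Create New Account.
shares = df.groupby("Case")["Action"].value_counts(normalize=True)

# Keep the Action with the largest share for each Case.
top = shares.sort_values(ascending=False).groupby(level=0).head(1)
best = {case: (action, share) for (case, action), share in top.items()}


def predict_action(case):
    """Hypothetical helper: most frequent Action for a Case and its score in percent."""
    action, share = best[case]
    return action, int(round(share * 100))


print(predict_action("Create"))  # ('Create New Account', 80)
```

A plain dict keeps each lookup cheap once the shares are computed; if new rows arrive, the shares would need to be recomputed.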
Expected output:
Case Action Score
Create Create New Account 80
Delete Delete Old Account 60
Use crosstab with normalize='index', then take the largest share per Case with groupby + tail:
import pandas as pd
pd.crosstab(df.Case, df.Action, normalize='index').stack().sort_values().groupby(level=0).tail(1)
Out[769]:
Case Action
Delete DeleteOldAccount 0.6
Create CreateNewAccount 0.8
dtype: float64
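The one-liner above returns a Series of fractions. A minimal sketch of turning it into the question's expected layout (Case, Action, Score as an integer percent) might look like this; the column name Score and the rounding to whole percent are assumptions taken from the expected output, and the data uses the question's spaced Action names.

```python
import pandas as pd

df = pd.DataFrame({
    "Case": ["Create"] * 5 + ["Delete"] * 5,
    "Action": ["Create New Account"] * 4 + ["Create Old Account"]
              + ["Delete New Account"] * 2 + ["Delete Old Account"] * 3,
})

# Row-normalised crosstab -> long Series -> largest share per Case.
top = (
    pd.crosstab(df.Case, df.Action, normalize="index")
    .stack()
    .sort_values()
    .groupby(level=0)
    .tail(1)
)

# Reshape to the expected layout: Case, Action, Score (percent, rounded).
result = top.mul(100).round().astype(int).rename("Score").reset_index()
result = result.sort_values("Case", ignore_index=True)
print(result)
```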
Or use where, which keeps every Action that ties for the row maximum:
pdf = pd.crosstab(df.Case, df.Action, normalize='index')
pdf.where(pdf.eq(pdf.max(axis=1), axis=0)).stack().dropna()
Out[781]:
Case Action
Create CreateNewAccount 0.8
Delete DeleteOldAccount 0.6
dtype: float64
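The two approaches only differ when a Case has a tie. The sketch below uses hypothetical data in which Delete has two New and two Old actions: the where version keeps both tied Actions, while groupby + tail keeps exactly one. The explicit dropna() is there because newer pandas stack implementations no longer drop the NaN cells that where leaves behind.

```python
import pandas as pd

# Hypothetical data with a tie: "Delete" has 2 x New and 2 x Old.
df = pd.DataFrame({
    "Case": ["Create"] * 5 + ["Delete"] * 4,
    "Action": ["Create New Account"] * 4 + ["Create Old Account"]
              + ["Delete New Account"] * 2 + ["Delete Old Account"] * 2,
})

pdf = pd.crosstab(df.Case, df.Action, normalize="index")

# where() masks every cell not equal to its row maximum (ties survive);
# dropna() removes the masked cells after stacking.
ties_kept = pdf.where(pdf.eq(pdf.max(axis=1), axis=0)).stack().dropna()

# groupby().tail(1) keeps exactly one Action per Case, even when tied.
one_per_case = pdf.stack().sort_values().groupby(level=0).tail(1)

print(ties_kept)
print(one_per_case)
```

Which one is preferable depends on whether a single prediction per Case is required or all equally frequent Actions should be reported.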