Python filter and groupby

python csv pandas

I have a CSV that I am working with in pandas. Here are the first ten rows:

print(frame1.head(10))

      alert         Subject    filetype type      country   status
0  33965790    44676 aba     Attachment  doc  RU,RU,RU,RU  deleted
1  33965786    44676 rcrump  Attachment  zip          NaN  deleted
2  33965771            3aba  Attachment  zip          NaN  deleted
3  33965770             NaN  Attachment   js           ,,  deleted
4  33965766             NaN  Attachment   js           ,,  deleted
5  33965761             NaN  Attachment  zip          NaN  deleted
6  33965760             NaN  Attachment  zip          NaN  deleted
7  33965757             NaN  Attachment  zip          NaN  deleted
8  33965751  35200     3aba  Attachment  doc     RU,RU,RU  deleted
9  33965747  35200   INVaba  Attachment  zip          NaN  deleted

I need to take the Subject column and count all rows that contain "aba" as a substring:

Occurrences of aba- 512

Or even a result like this:

aba    12
3aba   5
INVaba 2

Here is my code:

targeted = frame1[frame1['Subject'].str.contains('aba', case=False , na=False)].groupby('Subject')
print (targeted.to_string(header=False))

I get this error: AttributeError: Cannot access callable attribute 'to_string' of 'DataFrameGroupBy' objects, try using the 'apply' method

*****Note: earlier I counted the different file types, and that worked:

filetype = frame1.groupby('filetype').size()
### clean up the printing
print("Delivered in Email")
print (filetype.to_string(header=False))

which gave me:

Delivered in Email
Attachment    32647
Header          131
URL            9236

For the first output you asked for, you can do the following:

# na=False treats missing subjects as non-matches so the mask is purely boolean
contains_aba = frame1[frame1['Subject'].str.contains('aba', case=False, na=False)]
print("Occurrences of aba-", len(contains_aba))

This creates another DataFrame based on your condition. The length of that DataFrame is the number of occurrences, so you can print it directly.
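As a minimal sketch of the filter-then-len approach described above, here is a self-contained version. The sample frame is hypothetical and only stands in for the real frame1 loaded from the CSV; the Subject values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for frame1; only Subject matters here.
frame1 = pd.DataFrame({
    "Subject": ["aba", "rcrump", "3aba", np.nan, "3aba", "INVaba", "ABA report"],
})

# case=False matches "ABA" as well; na=False turns missing subjects into False
# so the result can be used directly as a boolean indexer.
mask = frame1["Subject"].str.contains("aba", case=False, na=False)
contains_aba = frame1[mask]

# The number of rows in the filtered frame is the number of occurrences.
print("Occurrences of aba-", len(contains_aba))
```

Without na=False the mask would contain NaN for the missing subjects, and pandas refuses to index with a mask that is not purely boolean.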

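To get the total count, just use str.contains followed by sum.

Then, to get the counts of the unique strings that contain 'aba', you can select those values with loc and then use value_counts, which gives: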

>>> df.loc[df.Subject.str.contains('aba', case=False, na=False), 'Subject'].value_counts()

3aba      1
INVaba    1
aba       1
Name: Subject, dtype: int64
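The total count can also be obtained without building a filtered frame, because summing the boolean mask returned by str.contains counts its True values. The sketch below assumes a small hypothetical df rather than the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df would come from the CSV.
df = pd.DataFrame({"Subject": ["aba", "rcrump", "3aba", np.nan, "INVaba"]})

# Boolean Series: True where Subject contains "aba" (any case), False otherwise.
mask = df["Subject"].str.contains("aba", case=False, na=False)

# True counts as 1 and False as 0, so the sum is the number of matching rows.
total = int(mask.sum())
print("Occurrences of aba-", total)
```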

targeted = frame1[frame1['Subject'].str.contains('aba', case=False , na=False)].groupby('Subject').size()
print (targeted.to_string(header=False))

3aba      1
INVaba    1
aba       1
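To illustrate why adding size() resolves the error from the question: groupby alone returns a lazy DataFrameGroupBy object, and only an aggregation such as size() turns it into a Series that has to_string. This is a sketch on hypothetical sample data, not the original CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for frame1.
frame1 = pd.DataFrame(
    {"Subject": ["aba", "3aba", np.nan, "aba", "INVaba", "rcrump"]}
)

matches = frame1[frame1["Subject"].str.contains("aba", case=False, na=False)]

# groupby only describes the grouping; nothing is computed yet.
grouped = matches.groupby("Subject")

# size() aggregates each group to its row count and returns a Series
# indexed by Subject, sorted by the group keys.
targeted = grouped.size()
print(targeted.to_string(header=False))

# value_counts gives the same counts, ordered by frequency instead of by key.
same_counts = matches["Subject"].value_counts().sort_index()
```

Either form works for the per-subject breakdown; groupby(...).size() keeps the keys in sorted order, while value_counts puts the most frequent subject first.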