Python: find a string in a subgroup column and flag the main group based on its occurrence
I have data like the following:
Group string
A Hello
A SearchListing
A GoSearch
A pen
A Hello
B Real-Estate
B Access
B Denied
B Group
B Group
C Glance
C NoSearch
C Home
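and so on.
I want to find every group whose strings contain the phrase "search" and flag it as 0/1. I also want to aggregate per-group results, such as the unique and total string counts, plus the number of times "search" occurs in that group. The final result I want looks like this: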
Group containsSearch TotalStrings UniqueStrings NoOfTimesSearch
A 1 5 4 2
B 0 5 4 0
C 1 3 3 1
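To make the question reproducible, the sample data can be built as a DataFrame. This is a minimal sketch: the name `df` matches what the answers use, while `mask` is a hypothetical helper name introduced here only to show the case-insensitive match the question is after.

```python
import pandas as pd

# Sample data from the question: one row per (Group, string) pair.
df = pd.DataFrame({
    "Group": list("AAAAA") + list("BBBBB") + list("CCC"),
    "string": [
        "Hello", "SearchListing", "GoSearch", "pen", "Hello",
        "Real-Estate", "Access", "Denied", "Group", "Group",
        "Glance", "NoSearch", "Home",
    ],
})

# Row-level, case-insensitive test for the substring "search".
mask = df["string"].str.contains("search", case=False)

# Number of matching rows per group.
print(mask.groupby(df["Group"]).sum())
```

The per-group sum of `mask` is the NoOfTimesSearch column, and whether that sum is greater than zero gives the 0/1 flag.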
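I can do the aggregation with a simple groupby clause, but I'm stuck on how to flag a group as 0/1 based on the presence of "search" and how to count its occurrences.
Let's try: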
# int(...) works whether .any() returns a Python bool or a numpy.bool_
l1 = lambda x: int(x.str.lower().str.contains('search').any())
l1.__name__ = 'containsSearch'    # agg uses __name__ as the column label
l2 = lambda x: int(x.str.lower().str.contains('search').sum())
l2.__name__ = 'NoOfTimesSearch'
df.groupby('Group')['string'].agg(['count', 'nunique', l1, l2]).reset_index()
Output:
  Group  count  nunique  containsSearch  NoOfTimesSearch
0     A      5        4               1                2
1     B      5        4               0                0
2     C      3        3               1                1
Or, using named functions (thanks, @W-B):
def containsSearch(x):
    # 1 if any string in the group contains "search" (case-insensitive), else 0
    return int(x.str.lower().str.contains('search').any())

def NoOfTimesSearch(x):
    # number of strings in the group that contain "search"
    return int(x.str.lower().str.contains('search').sum())

df.groupby('Group')['string'].agg(['count', 'nunique',
                                   containsSearch, NoOfTimesSearch]).reset_index()
Output:
  Group  count  nunique  containsSearch  NoOfTimesSearch
0     A      5        4               1                2
1     B      5        4               0                0
2     C      3        3               1                1
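As an alternative sketch (not part of the original answer), the match can be computed once per row and then aggregated with pandas named aggregation, which also produces the column names requested in the question. The column name `hit` and the variables `flagged` and `result` are hypothetical names chosen for this example, and named aggregation assumes pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": list("AAAAA") + list("BBBBB") + list("CCC"),
    "string": [
        "Hello", "SearchListing", "GoSearch", "pen", "Hello",
        "Real-Estate", "Access", "Denied", "Group", "Group",
        "Glance", "NoSearch", "Home",
    ],
})

# Flag each row once (1 if it contains "search", ignoring case).
flagged = df.assign(
    hit=df["string"].str.contains("search", case=False).astype(int)
)

# Named aggregation: output_column=(input_column, aggregation).
result = (
    flagged.groupby("Group")
    .agg(
        containsSearch=("hit", "max"),      # 1 if any row matched
        TotalStrings=("string", "count"),
        UniqueStrings=("string", "nunique"),
        NoOfTimesSearch=("hit", "sum"),     # number of matching rows
    )
    .reset_index()
)
print(result)
```

Because the flag is a plain integer column, `max` and `sum` are built-in aggregations and no custom lambda is needed.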
If you want to write a single function, do the following:
import pandas as pd

def my_agg(x):
    # Boolean mask of the rows in this group that contain "search"
    hits = x['string'].str.lower().str.contains('search')
    names = {
        'containsSearch': int(hits.any()),
        'TotalStrings': x['string'].count(),
        'UniqueStrings': x['string'].nunique(),
        'NoOfTimesSearch': int(hits.sum())
    }
    return pd.Series(names)

df.groupby('Group').apply(my_agg)
containsSearch TotalStrings UniqueStrings NoOfTimesSearch
Group
A 1 5 4 2
B 0 5 4 0
C 1 3 3 1
No particular reason. I thought it would be better this way.
I get an error: "'bool' object has no attribute 'astype'"
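The error in the last comment is consistent with `.any()` returning a plain Python `bool`, which has no `astype` method. Whether that happens may depend on the pandas version and the column dtype, so treat that as an assumption. A minimal sketch of a conversion that does not rely on `astype`, using a hypothetical two-row Series `s`:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Hello", "GoSearch"])
matches = s.str.lower().str.contains("search")

# int() accepts both Python bool and numpy.bool_, so it is safe either way.
flag = int(matches.any())
count = int(matches.sum())
print(flag, count)

# A plain Python bool has no .astype, while a NumPy boolean scalar does.
print(hasattr(True, "astype"), hasattr(np.bool_(True), "astype"))
```

Wrapping the reduction in `int(...)` keeps the aggregation functions independent of which scalar type the reduction returns.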