Map the count of matched words on a column using pandas in Python
I have a df:
Name Step Description
Ram 1 Ram is oNe of the good cricketer
Ram 2 gopal one
Sri 1 Sri is one of the member
Sri 2 ravi good
Kumar 1 Kumar is a keeper
Madhu 1 good boy
Vignesh 1 oNe little
Pechi 1 one book
mario 1 good randokm
Roger 1 one milita good
bala 1 looks good
raj 1 more one
venk 1 likes good
and a list:
my_list=["one","good"]
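For reproducibility, the table and the list above could be built as follows. This is only a minimal sketch: the post does not show how df was created, so the dtypes (plain strings and integers) are an assumption.

```python
import pandas as pd

# Rows copied from the table above: (Name, Step, Description)
rows = [
    ("Ram", 1, "Ram is oNe of the good cricketer"),
    ("Ram", 2, "gopal one"),
    ("Sri", 1, "Sri is one of the member"),
    ("Sri", 2, "ravi good"),
    ("Kumar", 1, "Kumar is a keeper"),
    ("Madhu", 1, "good boy"),
    ("Vignesh", 1, "oNe little"),
    ("Pechi", 1, "one book"),
    ("mario", 1, "good randokm good"),
    ("Roger", 1, "one milita good"),
    ("bala", 1, "looks good"),
    ("raj", 1, "more one"),
    ("venk", 1, "likes good"),
]
df = pd.DataFrame(rows, columns=["Name", "Step", "Description"])
my_list = ["one", "good"]
print(df)
```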
I am trying to get the rows that contain at least one keyword from my_list.
I tried:
mask=df["Description"].str.contains("|".join(my_list))
and got output_df:
Name Description
Ram Ram is one of the good cricketer
Sri Sri is one of the member
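I would also like to add the keywords found in "Description", and their count, in separate columns.
When a df["Name"] value is not the first occurrence, the keywords should not be copied into the keys column, even if "Description" contains a keyword. My desired output is: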
Name Step Description keys count
Ram 1 Ram is one of the good cricketer one,good 2
Ram 2 gopal one
Sri 1 Sri is one of the member one 1
Sri 2 ravi good
Kumar 1 Kumar is a keeper
Madhu 1 good boy good 1
Vignesh 1 oNe little oNe 1
Pechi 1 one book one 1
mario 1 good randokm good good 1
Roger 1 one milita good one,good 2
bala 1 looks good good 1
raj 1 more one one 1
venk 1 likes good good 1
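Create a new mask that also requires the first occurrence of each Name, and apply it:
import re

my_list=["one","good"]
# keyword match (case-insensitive) AND first occurrence of each Name
mask=df["Description"].str.contains("|".join(my_list),na=False,flags=re.IGNORECASE ) & \
     (df.groupby('Name').cumcount() == 0)
print (mask)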
0 True
1 False
2 True
3 False
4 False
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
dtype: bool
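With this mask, the keys and count columns could then be filled from Description itself. The following is a minimal sketch rather than the definitive approach; it rebuilds a shortened copy of the question's data so that it runs on its own, and it sorts each set so the joined string is deterministic. Note that set() is case-sensitive, so "oNe" and "one" would count as different keys.

```python
import re
import pandas as pd

# Shortened copy of the question's data (assumption: same column names)
df = pd.DataFrame({
    "Name": ["Ram", "Ram", "Sri", "Sri", "Kumar", "Madhu"],
    "Step": [1, 2, 1, 2, 1, 1],
    "Description": [
        "Ram is oNe of the good cricketer", "gopal one",
        "Sri is one of the member", "ravi good",
        "Kumar is a keeper", "good boy",
    ],
})
my_list = ["one", "good"]
pattern = "|".join(my_list)

# Keyword present (case-insensitive) AND first occurrence of the Name
mask = (df["Description"].str.contains(pattern, na=False, flags=re.IGNORECASE)
        & (df.groupby("Name").cumcount() == 0))

# All keyword hits per row, de-duplicated with set()
extracted = df["Description"].str.findall(pattern, flags=re.IGNORECASE).apply(set)

# Fill the new columns only where the mask is True; other rows stay NaN
df.loc[mask, "keys"] = extracted[mask].apply(lambda hits: ",".join(sorted(hits)))
df.loc[mask, "count"] = extracted[mask].apply(len)
print(df)
```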
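Edit:
I do not want to consider the Step column; I want to apply the logic on the "Name" column, when a Name value occurs for the first time. As you can see at index=1, Ram occurs a second time, so the keywords in the row with index 1 should not be considered.
OK, so you think 0? `0 0 1 1 2 0 3 1 4 0`
OK, but I did fillna("") and then s=df.groupby('Name')['Description'].transform(','.join). Does it need to be changed to s=df.groupby('Name')['Description'].transform(lambda x:','.join(x.astype(str)))?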
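On the fillna("") question just above: ','.join needs every item to be a string, so missing values have to be dealt with before the join. A minimal sketch, assuming a Description column with one missing value (the data here is made up for illustration); filling first keeps the simple ','.join form. Casting with astype(str) inside a lambda is the other option raised above, but it may turn missing values into the literal text "nan" rather than an empty string.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Description
df = pd.DataFrame({
    "Name": ["Ram", "Ram", "Sri"],
    "Description": ["Ram is oNe of the good cricketer", np.nan,
                    "Sri is one of the member"],
})

# Fill missing values first so every item passed to ','.join is a string
filled = df["Description"].fillna("")

# Join all descriptions of a Name and repeat the result on each of its rows
s = filled.groupby(df["Name"]).transform(",".join)
print(s)
```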
#use transform so the joined string is repeated on every row of the group (same size as the original df)
s = df.groupby('Name')['Description'].transform(','.join)
print (s)
0 Ram is oNe of the good cricketer,gopal one
1 Ram is oNe of the good cricketer,gopal one
2 Sri is one of the member,ravi good
3 Sri is one of the member,ravi good
4 Kumar is a keeper
5 good boy
6 oNe little
7 one book
8 good randokm good
9 one milita good
10 looks good
11 more one
12 likes good
Name: Description, dtype: object
#build the mask from the new Series s
mask=s.str.contains("|".join(my_list),na=False,flags=re.IGNORECASE ) & \
(df.groupby('Name').cumcount() == 0)
print (mask)
0 True
1 False
2 True
3 False
4 False
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
dtype: bool
#extract from new Series s
extracted = s.str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE).apply(set)
df.loc[mask, 'keys'] = extracted.str.join(',')
df.loc[mask, 'count'] = extracted.str.len()
print (df)
Name Step Description keys count
0 Ram 1 Ram is oNe of the good cricketer good,oNe,one 3.0
1 Ram 2 gopal one NaN NaN
2 Sri 1 Sri is one of the member good,one 2.0
3 Sri 2 ravi good NaN NaN
4 Kumar 1 Kumar is a keeper NaN NaN
5 Madhu 1 good boy good 1.0
6 Vignesh 1 oNe little oNe 1.0
7 Pechi 1 one book one 1.0
8 mario 1 good randokm good good 1.0
9 Roger 1 one milita good good,one 2.0
10 bala 1 looks good good 1.0
11 raj 1 more one one 1.0
12 venk 1 likes good good 1.0
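The output above differs from the desired one in two ways: keywords are collected from all rows of a Name (so Sri also gets "good"), and set() treats "oNe" and "one" as different keys (so Ram gets a count of 3). One possible adjustment is sketched below, under the assumption that only the first row's own Description should be searched and that keywords should be de-duplicated case-insensitively, keeping the first spelling found. unique_keys is a hypothetical helper, not part of the original answer, and the data is a shortened copy of the question's table.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Ram", "Sri", "Sri", "Kumar", "mario"],
    "Step": [1, 2, 1, 2, 1, 1],
    "Description": [
        "Ram is oNe of the good cricketer", "gopal one",
        "Sri is one of the member", "ravi good",
        "Kumar is a keeper", "good randokm good",
    ],
})
my_list = ["one", "good"]
pattern = re.compile("|".join(map(re.escape, my_list)), flags=re.IGNORECASE)

def unique_keys(text):
    """Keyword hits in order of appearance, de-duplicated ignoring case."""
    seen = {}
    for hit in pattern.findall(text):
        seen.setdefault(hit.lower(), hit)  # keep the first spelling found
    return list(seen.values())

# Only the first occurrence of each Name is searched
first = df.groupby("Name").cumcount() == 0
hits = df.loc[first, "Description"].fillna("").apply(unique_keys)
hits = hits[hits.apply(len) > 0]  # drop rows without any keyword

df.loc[hits.index, "keys"] = hits.apply(",".join)
df.loc[hits.index, "count"] = hits.apply(len)
print(df)
```

With this variant the Ram row gets the keys oNe,good with a count of 2, and the second rows of Ram and Sri stay empty.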