Pandas: filter a DataFrame to show only the rows that contain all strings in a list of strings
If we have a DataFrame:
   Column1  Column2
0  Alpha    This is bananas
1  Bravo    This is not
2  Charlie  This is not bananas
3  Delta    This is not a banana
4  Echo     This is not a Banana
5  Foxtrot  This is not a banananananana
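We want to select only the rows that contain every string in a list of strings. How can we write a function that filters for this? The match should be case-insensitive.
For example, if I wanted to filter specifically for ['not', 'banana'], I could pass that list to the function, and it should return the following DataFrame: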
   Column1  Column2
0  Delta    'This is not a banana'
1  Echo     'This is not a Banana'
One approach is to use sets. Casefold each string, split it into a list of words, and then convert that list into a set:
>>> df.Column2.str.casefold().str.split().map(set)
0 {bananas, this, is}
1 {not, this, is}
2 {not, bananas, this, is}
3 {is, this, not, banana, a}
4 {is, this, not, banana, a}
5 {is, this, banananananana, not, a}
Name: Column2, dtype: object
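You can then check whether every word in your list appears in each row's set, that is, whether your words form a subset of it.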
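The subset check described above can be wrapped in a function, which is what the question asks for. This is a minimal sketch rather than the answerer's exact code: the function name `filter_all_words` and its parameters are hypothetical, and it assumes the column contains no missing values and that words are separated by whitespace.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"],
    "Column2": [
        "This is bananas",
        "This is not",
        "This is not bananas",
        "This is not a banana",
        "This is not a Banana",
        "This is not a banananananana",
    ],
})


def filter_all_words(frame, column, words):
    """Keep rows whose `column` contains every word in `words` (hypothetical helper)."""
    # Casefold the search words so the comparison is case-insensitive.
    wanted = {word.casefold() for word in words}
    # Build one set of casefolded, whitespace-separated words per row.
    word_sets = frame[column].str.casefold().str.split().map(set)
    # wanted.issubset(row_set) is True only if every wanted word is in the row.
    mask = word_sets.map(wanted.issubset)
    return frame[mask]


result = filter_all_words(df, "Column2", ["not", "banana"])
print(result)
```

Because the comparison is between whole words, "bananas" and "banananananana" do not count as "banana". A word followed directly by punctuation (for example "banana,") would not match either, since the split is on whitespace only.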
Alternatively, you can build a regular expression with one lookahead per word and use it with pandas.Series.str.contains().
>>> import re
>>> words = ['not', 'banana']
>>> # Raw f-string keeps \s intact; non-capturing groups avoid pandas' match-group warning
>>> pattern = '(?i)' + ''.join(rf'(?=.*(?:^|\s){re.escape(word)}(?:\s|$))' for word in words)
>>> pattern
'(?i)(?=.*(?:^|\\s)not(?:\\s|$))(?=.*(?:^|\\s)banana(?:\\s|$))'
>>> df[df.Column2.str.contains(pattern)]
   Column1               Column2
3    Delta  This is not a banana
4     Echo  This is not a Banana
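The regular-expression approach can be packaged as a function in the same way. The sketch below is one possible wrapper, not the answerer's code: the name `filter_all_words_regex` is hypothetical, it passes `case=False` to `str.contains` instead of prefixing the pattern with `(?i)`, and it assumes the column holds strings with no missing values.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Column1": ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"],
    "Column2": [
        "This is bananas",
        "This is not",
        "This is not bananas",
        "This is not a banana",
        "This is not a Banana",
        "This is not a banananananana",
    ],
})


def filter_all_words_regex(frame, column, words):
    """Keep rows where every word appears as a whitespace-delimited token (hypothetical helper)."""
    # One lookahead per word: each must find its word somewhere in the string,
    # preceded by start-of-string or whitespace and followed by whitespace or end.
    pattern = "".join(
        rf"(?=.*(?:^|\s){re.escape(word)}(?:\s|$))" for word in words
    )
    # case=False makes the regex match case-insensitively.
    mask = frame[column].str.contains(pattern, case=False, regex=True)
    return frame[mask]


result = filter_all_words_regex(df, "Column2", ["not", "banana"])
print(result)
```

Each lookahead is zero-width, so all of them are tested from the same position and the row matches only if every word is found. `re.escape` keeps search words that contain regex metacharacters from being interpreted as pattern syntax.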