Pandas: filter a DataFrame to show only rows that contain all strings in a list of strings


If we have a dataframe:

   Column1  Column2
0  Alpha    This is bananas
1  Bravo    This is not
2  Charlie  This is not bananas
3  Delta    This is not a banana
4  Echo     This is not a Banana
5  Foxtrot  This is not a banananananana
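We only want to select the rows that contain every string in a list of strings. How can we write a function to filter for that? Matching should be case-insensitive.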

For example, if I want to filter specifically for
['not','banana']
I can pass that list to the function, and it should return the following dataframe:

   Column1  Column2
0  Delta    'This is not a banana'
1  Echo     'This is not a Banana'
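Basic requirements:

  • Column2 must contain all of the strings in the given list, which can be of any length. I want to be able to search with a list of 1, 2, 3, 5 or 10 strings, however many I need
  • Case-insensitive (which is why filtering for 'banana' returns the rows with both 'banana' and 'Banana')
  • Ignore results with extra letters. Filtering for 'banana' must not select rows with 'bananas' or 'banananananana'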

  • One way is to use sets

    Casefold the strings, split each one into a list of words, then convert each list into a set

    >>> df.Column2.str.casefold().str.split().map(set)
    0                   {bananas, this, is}
    1                       {not, this, is}
    2              {not, bananas, this, is}
    3            {is, this, not, banana, a}
    4            {is, this, not, banana, a}
    5    {is, this, banananananana, not, a}
    Name: Column2, dtype: object
    
    You can then check whether your words are a subset of each row's set
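The subset check could look like the sketch below. The helper name `filter_all_words` and its parameters are made up for illustration and are not part of the original answer. It assumes words are separated by whitespace and that the column has no missing values.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"],
    "Column2": [
        "This is bananas",
        "This is not",
        "This is not bananas",
        "This is not a banana",
        "This is not a Banana",
        "This is not a banananananana",
    ],
})


def filter_all_words(frame, column, words):
    """Hypothetical helper: keep rows whose `column` contains every word in `words`."""
    # Casefold the search words so the comparison is case-insensitive.
    wanted = {w.casefold() for w in words}
    # Casefold each cell, split on whitespace, and turn the words into a set.
    word_sets = frame[column].str.casefold().str.split().map(set)
    # A row qualifies when every wanted word is in its set.
    mask = word_sets.map(wanted.issubset)
    return frame[mask]


result = filter_all_words(df, "Column2", ["not", "banana"])
print(result)
```

Because whole words are compared, 'bananas' and 'banananananana' do not count as 'banana'. Punctuation attached to a word (for example 'banana.') would also prevent a match under this whitespace-splitting assumption.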

    Alternatively, you can build a regular expression with one lookahead per word and use it with pandas.Series.str.contains()


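    >>> import re
    >>> words = ['not', 'banana']
    >>> # one lookahead per word; (?:^|\s) and (?:\s|$) force whole-word matches
    >>> pattern = '(?i)' + ''.join(rf'(?=.*(?:^|\s){re.escape(word)}(?:\s|$))' for word in words)
    >>> pattern
    '(?i)(?=.*(?:^|\\s)not(?:\\s|$))(?=.*(?:^|\\s)banana(?:\\s|$))'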
    
    >>> df[ df.Column2.str.contains(pattern) ]
      Column1               Column2
    3   Delta  This is not a banana
    4    Echo  This is not a Banana
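Since the question asks for a function, the regex approach could be wrapped up roughly as follows. The name `filter_all_words_regex` is hypothetical. The sketch passes `case=False` to `str.contains` instead of an inline `(?i)` flag, and `na=False` so missing values are treated as non-matching; both are choices made here, not something the answer specifies.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Column1": ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"],
    "Column2": [
        "This is bananas",
        "This is not",
        "This is not bananas",
        "This is not a banana",
        "This is not a Banana",
        "This is not a banananananana",
    ],
})


def filter_all_words_regex(frame, column, words):
    """Hypothetical helper: keep rows whose `column` contains every word in `words`."""
    # One lookahead per word; each word must be delimited by whitespace
    # or by the start/end of the string, so 'bananas' does not match 'banana'.
    pattern = "".join(
        rf"(?=.*(?:^|\s){re.escape(word)}(?:\s|$))" for word in words
    )
    mask = frame[column].str.contains(pattern, case=False, regex=True, na=False)
    return frame[mask]


two_words = filter_all_words_regex(df, "Column2", ["not", "banana"])
three_words = filter_all_words_regex(df, "Column2", ["this", "not", "a"])
print(two_words)
print(three_words)
```

The list can be of any length because each word only adds another lookahead to the pattern.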