Python: How to filter rows by a Unix-style regular expression passed as an input parameter for a DataFrame column

python pandas

I have the following DataFrame:

import numpy as np
import pandas as pd
import os

csvFile = "csv.csv"
csvDelim = '@@@'
df = pd.read_csv(csvFile, engine="python", index_col=False, delimiter= csvDelim)
df.head()


ID  col_1   
0   ACLKB
1   CLKAA
2   AACLK
3   BBBCLK

The regular expression to pass is CLK, and the column name is "col_1".

text = '*CLK*'
findtext = 'r'+text+".*"
colName = 'Signal'

df[colName].str.match(text)

I get the following incorrect result:

 0     False
 1     False
 2     False
 3     False
 4     False
The expected output is  
 0     True
 1     True
 2     True
 3     True
 4     True

 Can someone help me filter rows based on a regular expression passed in as shown above? The call raises the following error:
         error                                     Traceback (most recent call last)
        <ipython-input-110-8d1c1b6b2d15> in <module>()
     ----> 1 df['Signal'].str.match(findtext)

              ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\strings.py in match(self, pat, case, flags, na, as_indexer)
              1571     def match(self, pat, case=True, flags=0, na=np.nan, as_indexer=None):
              1572         result = str_match(self._data, pat, case=case, flags=flags, na=na,
        ->    1573                            as_indexer=as_indexer)
              1574         return self._wrap_result(result)
               1575 

            ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\strings.py in str_match(arr, pat, case, flags, na, as_indexer)
       495         flags |= re.IGNORECASE
       496 
    --> 497     regex = re.compile(pat, flags=flags)
      498 
      499     if (as_indexer is False) and (regex.groups > 0):

     ~\AppData\Local\Continuum\anaconda3\lib\re.py in compile(pattern, flags)
     231 def compile(pattern, flags=0):
     232     "Compile a regular expression pattern, returning a pattern object."
  --> 233     return _compile(pattern, flags)
     234 
     235 def purge():

  ~\AppData\Local\Continuum\anaconda3\lib\re.py in _compile(pattern, flags)
   299     if not sre_compile.isstring(pattern):
   300         raise TypeError("first argument must be string or compiled pattern")

--> 562         p = sre_parse.parse(p, flags)
    563     else:
    564         pattern = None

    ~\AppData\Local\Continuum\anaconda3\lib\sre_parse.py in parse(str, flags, pattern)
    853 
    854     try:

--> 855         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    856     except Verbose:
    857         # the VERBOSE flag was switched on inside the pattern.  to be

  ~\AppData\Local\Continuum\anaconda3\lib\sre_parse.py in _parse_sub(source, state, verbose, nested)
     414     while True:
     415         itemsappend(_parse(source, state, verbose, nested + 1,

--> 416                            not nested and not items))
    417         if not sourcematch("|"):
    418             break

    ~\AppData\Local\Continuum\anaconda3\lib\sre_parse.py in _parse(source, state, verbose, nested, first)
     614             if not item or (_len(item) == 1 and item[0][0] is AT):
     615                 raise source.error("nothing to repeat",

--> 616                                    source.tell() - here + len(this))
    617             if item[0][0] in _REPEATCODES:
    618                 raise source.error("multiple repeat",

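In addition, the pattern could also be ^CLK or ?CLK or any other expression. The filter should work whenever any string or regular expression is passed in.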

I believe you need to remove the 'r' in the second line of the code below:

text = '*CLK*'
findtext = 'r'+text+".*"
colName = 'Signal'

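It looks like you are trying to build a Python raw string, which is unnecessary here. The r prefix only has meaning when it is written directly in front of a string literal in source code; concatenating the character 'r' simply prepends a literal letter r to the pattern.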

Also, the pattern you are using does not express what you want. Try building the regex you actually need, for example findtext = '.*CLK'
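To make the two points above concrete, here is a minimal sketch using only the standard re module. The list named values is a stand-in for the column contents shown in the question, not the asker's real data.

```python
import re

text = '*CLK*'

# Concatenating 'r' only prepends the letter r; it does not create a raw string.
findtext = 'r' + text + '.*'
print(findtext)  # r*CLK*.*

# A leading * has nothing to repeat, so the bare pattern does not compile.
try:
    re.compile(text)
    compiled = True
except re.error as exc:
    compiled = False
    print('re.error:', exc)

# '.*CLK' is a valid regex; re.match anchors at the start of each value,
# and the leading .* lets CLK appear anywhere after that.
values = ['ACLKB', 'CLKAA', 'AACLK', 'BBBCLK']  # sample values (assumed)
matches = [bool(re.match('.*CLK', v)) for v in values]
print(matches)  # [True, True, True, True]
```

The same '.*CLK' string can then be handed to df[colName].str.match, which applies re.match to each value.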

Remove the asterisks (*) and use the .contains method instead of the .match method. Use case=False to match both upper-case and lower-case letters.

See this code:

text = 'CLK'        # plain substring, no shell-style asterisks
colName = 'Signal'

# contains() searches anywhere in each value; case=False ignores case
df[colName].str.contains(text, case=False)
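As a rough illustration of why .contains behaves differently from .match, the sketch below rebuilds the sample column in memory. The column name 'Signal' and the four values are taken from the question, but the DataFrame construction itself is an assumption, since the original data comes from a CSV file.

```python
import pandas as pd

# Stand-in for the CSV data in the question (assumed values)
df = pd.DataFrame({'Signal': ['ACLKB', 'CLKAA', 'AACLK', 'BBBCLK']})

# contains() looks for the pattern anywhere in the value
mask_contains = df['Signal'].str.contains('CLK', case=False)

# match() only succeeds when the pattern matches at the start of the value
mask_match = df['Signal'].str.match('CLK')

print(mask_contains.tolist())  # [True, True, True, True]
print(mask_match.tolist())     # [False, True, False, False]

# Use the boolean mask to keep only the matching rows
filtered = df[mask_contains]
```

Indexing the DataFrame with the boolean Series is what actually filters the rows; the .str call on its own only returns True/False per row.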

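It doesn't work, I get an error. Please try it and let me know.

If you get an error, you need to post it so that we can respond. Remember, this is not a personal help desk.

How can we get a general solution for a string containing any regular expression, for example CLK, ?CLK*, ^CLK? The text is passed in as a function argument, so I can't just remove the * from it.

Use text = text.strip("*") to remove the *.

I have edited the question. If the string is a regular expression, please suggest a general solution, for example ?CLK, ^CLK.

Any comment that helps toward a general solution for any regex would be appreciated.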
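One possible direction for the general case asked about above, offered as a sketch rather than a definitive answer: patterns such as *CLK* and ?CLK look like shell wildcards, which the standard library's fnmatch.translate can convert into a Python regex, while ^CLK is already a valid regex. The helper name filter_rows and its syntax and ignore_case parameters are hypothetical, introduced only for this illustration, and the caller is assumed to know which of the two syntaxes is being passed.

```python
import fnmatch
import re

import pandas as pd


def filter_rows(df, column, pattern, syntax='glob', ignore_case=False):
    """Hypothetical helper: keep rows of df whose column value fits pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    if syntax == 'glob':
        # Shell wildcards (*, ?, [...]) become a regex covering the whole value
        compiled = re.compile(fnmatch.translate(pattern), flags)
        test = compiled.match
    else:
        # Pattern is already a Python regex; search anywhere in the value
        compiled = re.compile(pattern, flags)
        test = compiled.search
    # Non-string entries (for example NaN) are treated as non-matching
    mask = df[column].map(lambda v: isinstance(v, str) and test(v) is not None)
    return df[mask.astype(bool)]


# Stand-in for the data in the question (assumed values)
df = pd.DataFrame({'Signal': ['ACLKB', 'CLKAA', 'AACLK', 'BBBCLK']})

all_rows = filter_rows(df, 'Signal', '*CLK*')['Signal'].tolist()
one_char = filter_rows(df, 'Signal', '?CLK*')['Signal'].tolist()
starts = filter_rows(df, 'Signal', '^CLK', syntax='regex')['Signal'].tolist()

print(all_rows)  # ['ACLKB', 'CLKAA', 'AACLK', 'BBBCLK']
print(one_char)  # ['ACLKB']
print(starts)    # ['CLKAA']
```

Note that in wildcard syntax ^ has no special meaning, so ^CLK has to go through the 'regex' branch. Compiling with re directly (instead of passing the translated string to .str.match) is a design choice made here so that an invalid pattern fails in one obvious place.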
