Python: apply a regex to a dataframe column based on the value in another column

python regex pandas dataframe

Below is a regex_func helper function. Combined with map and a lambda, it works well for extracting matches from a df column:

import re

def regex_func(regex_compile, x, item=0, return_list=False):
    """Handle the list returned by re.findall().
        Returns the match at position `item` (the first by default).
        If there is no such match, returns an empty string.
        If return_list is True, returns the full list of matches."""
    match_list = regex_compile.findall(x)
    if return_list:
        match = match_list
    elif match_list:
        try:
            match = match_list[item]
        except IndexError:
            # Fewer matches than `item` requires
            match = ""
    else:
        match = ""
    return match

# Working example: grab the token right after an opening parenthesis
regex_1 = re.compile(r'(?i)(?<=\()[^ ()]+')
df['colB'] = df['colA'].map(lambda x: regex_func(regex_1, x))
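To make the working example concrete, here is a minimal self-contained sketch. The sample DataFrame is hypothetical (the question shows no data), and the helper is a condensed but behavior-equivalent version of regex_func with `except IndexError` instead of a bare `except`.

```python
import re
import pandas as pd

def regex_func(regex_compile, x, item=0, return_list=False):
    # Condensed equivalent of the helper above
    match_list = regex_compile.findall(x)
    if return_list:
        return match_list
    try:
        return match_list[item]
    except IndexError:
        # No match, or fewer matches than `item` requires
        return ""

# Hypothetical sample data, not from the original question
df = pd.DataFrame({'colA': ['foo (bar) baz', 'no parens here', '(one) (two)']})

# One fixed pattern, reused for every value of colA
regex_1 = re.compile(r'(?i)(?<=\()[^ ()]+')
df['colB'] = df['colA'].map(lambda x: regex_func(regex_1, x))
print(df['colB'].tolist())  # ['bar', '', 'one']
```

Because the pattern is the same for every row, `map` only needs the single column value, which is why this case works and the per-row case below does not.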

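I'm having trouble with a similar task. I want to build the regex from the value in another column and then apply it. One approach I tried does not work: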

# Regex should be based on value in col1
# Extracting that value and prepping to input into my regex_func()
value_list = df['col1'].tolist()
value_list = ['(?i)(?<=' + d + ' )[^ ]+' for d in value_list]
value_list = [re.compile(d) for d in value_list]
# Adding prepped list back into df as col2
df.insert(1, 'col2', value_list)
# Trying to create col4 by applying the compiled pattern in col2 to the value in col3.
# Does not work: map passes the lambda only the col3 value x, and df['col2']
# is the whole column, not the pattern belonging to x's row.
df.insert(2, 'col4', df['col3'].map(lambda x: df['col2'], x))

I understand why the approach above doesn't work, but I haven't found a solution yet.
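One way around the problem, different from the zip-based answer, is `DataFrame.apply` with `axis=1`, which passes the function a whole row so it can read col1 and col3 together. This is a sketch under assumptions: the sample data is hypothetical, `extract_after` is a hypothetical helper name, and wrapping the col1 value in `re.escape` is an addition not in the question. Escaping matters because Python's `re` requires fixed-width lookbehind, so a col1 value such as `a+` would otherwise fail to compile.

```python
import re
import pandas as pd

def regex_func(regex_compile, x, item=0):
    # Condensed version of the question's helper: nth match or ""
    match_list = regex_compile.findall(x)
    try:
        return match_list[item]
    except IndexError:
        return ""

# Hypothetical sample data: col1 holds the word that precedes the target token in col3
df = pd.DataFrame({
    'col1': ['price', 'size'],
    'col3': ['the price 42 today', 'shoe size 9 please'],
})

def extract_after(row):
    # Build this row's pattern from col1 (escaped, so the lookbehind stays literal)
    pattern = re.compile(r'(?i)(?<=' + re.escape(row['col1']) + r' )[^ ]+')
    # Apply it to the col3 value of the same row
    return regex_func(pattern, row['col3'])

df['col4'] = df.apply(extract_after, axis=1)
print(df['col4'].tolist())  # ['42', '9']
```

Row-wise `apply` is convenient but tends to be slower than iterating over zipped columns, since it builds a Series object for every row.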

You can zip the columns together and build the regex dynamically for each row:

# regex_func calls .findall() on its first argument, so pass a compiled pattern
df['colB'] = [regex_func(re.compile('(?i)(?<=' + y + ' )[^ ]+'), x)
              for x, y in zip(df['colA'], df['col1'])]
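Below is a runnable sketch of the zip approach on hypothetical sample data. Two additions are assumptions of this sketch rather than part of the answer: `re.escape` around the col1 value, and a hypothetical `pattern_for` helper decorated with `functools.lru_cache`, so that each distinct col1 value is compiled once even when it repeats across rows.

```python
import re
from functools import lru_cache
import pandas as pd

def regex_func(regex_compile, x, item=0):
    # Condensed version of the question's helper: nth match or ""
    match_list = regex_compile.findall(x)
    try:
        return match_list[item]
    except IndexError:
        return ""

@lru_cache(maxsize=None)
def pattern_for(word):
    # Hypothetical helper: compile the "token after <word>" pattern once per distinct word
    return re.compile(r'(?i)(?<=' + re.escape(word) + r' )[^ ]+')

# Hypothetical sample data; colA is searched, col1 supplies the word for each row
df = pd.DataFrame({
    'colA': ['Price 42 today', 'shoe size 9', 'no match here'],
    'col1': ['price', 'size', 'price'],
})

# Pair each colA value with the col1 value from the same row
df['colB'] = [regex_func(pattern_for(y), x) for x, y in zip(df['colA'], df['col1'])]
print(df['colB'].tolist())  # ['42', '9', '']
```

The `(?i)` flag makes the lookbehind case-insensitive, which is why `price` in col1 matches `Price` in colA.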

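Could you provide some sample data? Otherwise it's hard to see why it doesn't work...

Looks like guest's answer works for me! The only thing I added was wrapping what I pass to regex_func in re.compile. Thanks a lot.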