Python: how to match a string variable against a regular expression defined in another variable?

python regex pandas

Consider this simple example:

import pandas as pd

mydata = pd.DataFrame({'mystring' : ['heLLohelloy1', 'hAllohallo'],
                       'myregex' : ['hello.[0-9]', 'ulla']})

mydata
Out[3]: 
       myregex      mystring
0  hello.[0-9]  heLLohelloy1
1         ulla    hAllohallo

I want to create a variable flag that identifies the rows where mystring matches the regular expression stored in myregex on the same row.

That is, in the example only the first row matches: heLLohelloy1 matches the regex hello.[0-9], whereas hAllohallo does not match the regex ulla.

How can I do this as efficiently as possible in pandas? We are talking about millions of observations here (the data still fits in RAM).

I came up with this solution; could you check whether it meets your requirements?

[pd.Series(y).str.contains(x)[0] for x,y in zip(mydata.myregex,mydata.mystring)]

Out[54]: [True, False]

Alternatively, using map:

list(map(lambda x: pd.Series(x[1]).str.contains(x[0])[0], zip(mydata.myregex,mydata.mystring)))
Out[56]: [True, False]
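
Both variants above build a one-element pd.Series for every row, which can add noticeable overhead on millions of rows. As a minimal sketch, not part of the original answer, the same row-wise test can be written with re.search directly, since str.contains with a regex is also a search rather than a full match. The helper name match_rows is hypothetical, and the sketch assumes neither column contains missing values.

```python
import re

def match_rows(patterns, strings):
    """Return one bool per row: does strings[i] contain a match for patterns[i]?"""
    return [bool(re.search(p, s)) for p, s in zip(patterns, strings)]

# Same values as the example frame; DataFrame columns can be passed the same way.
myregex = ['hello.[0-9]', 'ulla']
mystring = ['heLLohelloy1', 'hAllohallo']

flags = match_rows(myregex, mystring)
print(flags)  # [True, False]
```

With the frame from the question, the result could be stored with mydata['flag'] = match_rows(mydata.myregex, mydata.mystring).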

You can use the re library and the apply function to do the following:

import re

# apply function
mydata['flag'] = mydata.apply(lambda row: bool(re.search(row['myregex'], row['mystring'])), axis=1)

### to convert bool to int - optional
### mydata['flag'] = mydata['flag'].astype(int)

       myregex      mystring    flag
0   hello.[0-9] heLLohelloy1    True
1   ulla        hAllohallo      False
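
apply with axis=1 calls the lambda once per row, which tends to be slow on large frames. If many rows share the same pattern, one possible alternative (a sketch under stated assumptions, not something the answer above proposes) is to group rows by pattern and run one vectorized str.contains per distinct pattern. It assumes a unique index and no missing values, and it only pays off when the number of distinct patterns is small relative to the number of rows.

```python
import pandas as pd

mydata = pd.DataFrame({'mystring': ['heLLohelloy1', 'hAllohallo'],
                       'myregex': ['hello.[0-9]', 'ulla']})

# Start with False everywhere, then fill in one group of rows per distinct pattern.
flag = pd.Series(False, index=mydata.index)
for pattern, strings in mydata.groupby('myregex')['mystring']:
    # One vectorized str.contains call handles every row that shares this pattern.
    flag.loc[strings.index] = strings.str.contains(pattern, regex=True)

mydata['flag'] = flag
print(mydata['flag'].tolist())  # [True, False]
```

When almost every row has its own pattern, this degenerates into one str.contains call per row, and a plain row-wise re.search is likely the better choice.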

Thank you, but how would you create the flag variable (True, False) in the dataframe based on this?

mydata['flag'] = list(map(lambda x: pd.Series(x[1]).str.contains(x[0])[0], zip(mydata.myregex, mydata.mystring)))

Just assign it back.

@ℕʘʘḆḽḘ Great.

I wonder which regex package would be faster here. Is re the best? :)

I don't know about the best, but re is certainly the most widely used and most mature Python library for regular expressions.
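
A speed question like this is usually best settled by measuring on your own data. The sketch below uses only the standard library and compares two ways of calling re; the repeated sample rows and both function names are made up for illustration, and the timings will vary by machine and Python version. A third-party engine could be timed the same way by adding another function.

```python
import re
import timeit

# Hypothetical workload: the two rows from the question, repeated many times.
patterns = ['hello.[0-9]', 'ulla'] * 50_000
strings = ['heLLohelloy1', 'hAllohallo'] * 50_000

def with_module_function():
    # re.search looks the pattern up in re's internal cache on every call.
    return [bool(re.search(p, s)) for p, s in zip(patterns, strings)]

def with_precompiled():
    # Compile each distinct pattern once, then reuse the compiled objects.
    compiled = {p: re.compile(p) for p in set(patterns)}
    return [bool(compiled[p].search(s)) for p, s in zip(patterns, strings)]

# Both variants must agree before their timings are worth comparing.
assert with_module_function() == with_precompiled()

for fn in (with_module_function, with_precompiled):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```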