Python: unable to find the first occurrence of a substring from a set of values in a table using regex

python python-3.x pandas

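I have a DataFrame as shown below, and I need to find only the first match from a set of values in each string.

I am unable to use the "find" function with a regex and a dictionary. If I use the "findall" function, it of course finds all occurrences, which I don't need.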

Text

51000/1-PLASTIC 150 Prange
51034/2-RUBBER KL 100 AA
51556/3-PAPER BD+CM 1 BOXT2
52345/1-FLOW IJ 10place 500 plastic
54975/1-DIVIDER PQR 100 BC
54975/1-SCALE DEF 555 AB Apple 
54975/1-PLASTIC ABC 4.6 BB plastic

Code:

Expected result:

Text                                                   Result

51000/1-PLASTIC 150 Prange                            Plastic
51034/2-RUBBER KL 100 AA                              Rubber
51556/3-PAPER BD+CM 1 BOXT2                           Paper
52345/1-FLOW IJ 10place 500 plastic                   Flow
54975/1-DIVIDER PQR 100 BC                            Not known
54975/1-SCALE DEF 555 AB Apple                        Not known
54975/1-PLASTIC ABC 4.6 BB plastic                    Plastic

Error:

TypeError: find() got an unexpected keyword argument 'flags'
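
For context, here is a minimal sketch of how this error can arise. It assumes the failing call passed flags to Series.str.find (the original code is not shown, so this is a guess). Series.str.find looks for a literal substring and returns its position; it does not accept a regex or a flags argument. The one-row Series s is only an illustration.

```python
import re
import pandas as pd

# Hypothetical one-row sample taken from the question's data.
s = pd.Series(["51000/1-PLASTIC 150 Prange"])

# str.find searches for a literal substring and returns its index (-1 if absent).
pos = s.str.find("PLASTIC")

# The pattern is NOT treated as a regex, so the literal "PLASTIC|RUBBER" is not found.
literal = s.str.find("PLASTIC|RUBBER")

# Passing flags raises TypeError because str.find has no such parameter.
try:
    s.str.find("PLASTIC|RUBBER", flags=re.I)
    error = None
except TypeError as exc:
    error = str(exc)

print(pos.tolist(), literal.tolist(), error)
```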

Change find to findall and select the first value of each returned list by indexing with str[0]:

import re

L = ['PLASTIC','RUBBER','PAPER','FLOW']
pat = '|'.join(r"\b{}\b".format(x) for x in L)

df['Result'] = df['Text'].str.findall(pat, flags=re.I).str[0]
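
To make the snippet above easier to try, here is a self-contained sketch that rebuilds the question's data and shows what findall returns before and after taking the first element. The names keywords, matches and first are illustrative, and re.escape is an added precaution (an assumption, not part of the original answer) in case a keyword contains regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame({"Text": [
    "51000/1-PLASTIC 150 Prange",
    "51034/2-RUBBER KL 100 AA",
    "51556/3-PAPER BD+CM 1 BOXT2",
    "52345/1-FLOW IJ 10place 500 plastic",
    "54975/1-DIVIDER PQR 100 BC",
    "54975/1-SCALE DEF 555 AB Apple",
    "54975/1-PLASTIC ABC 4.6 BB plastic",
]})

keywords = ["PLASTIC", "RUBBER", "PAPER", "FLOW"]
# One alternation of whole-word patterns; re.escape guards special characters.
pat = "|".join(r"\b{}\b".format(re.escape(k)) for k in keywords)

# findall returns a list of every match per row, in left-to-right order.
matches = df["Text"].str.findall(pat, flags=re.I)

# .str[0] picks the first list element; rows with an empty list become NaN.
first = matches.str[0]

print(matches[3])  # both 'FLOW' and 'plastic' are found in row 3
print(first[3])    # only the first one is kept
```

Rows with no keyword at all end up as NaN here, which is why a fillna step is still needed afterwards.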

Or use Series.str.extract:

df['Result'] = df['Text'].str.extract('(' + pat + ')', flags=re.I)

Then convert the missing values to "Not known":

df['Result'] = df['Result'].fillna("Not known")

Finally, if necessary, use Series.str.capitalize:

df['Result'] = df['Result'].str.capitalize()
print(df)
                                   Text     Result
0            51000/1-PLASTIC 150 Prange    Plastic
1              51034/2-RUBBER KL 100 AA     Rubber
2           51556/3-PAPER BD+CM 1 BOXT2      Paper
3   52345/1-FLOW IJ 10place 500 plastic       Flow
4            54975/1-DIVIDER PQR 100 BC  Not known
5        54975/1-SCALE DEF 555 AB Apple  Not known
6  54975/1-PLASTIC ABC 4.6 BB plastic      Plastic

Could you try assigning the result of findall to a variable and then taking variable[0]? As for findall(regex, string)[0], adding [0] selects the first element among the matched values. Simple.
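
The extract-based approach can also be wrapped into one small helper, sketched below under a few assumptions. The function name first_keyword and its default parameter are hypothetical, expand=False is used so that extract returns a Series rather than a one-column DataFrame, and capitalize is applied before fillna so the default label is left untouched.

```python
import re
import pandas as pd

def first_keyword(texts, keywords, default="Not known"):
    """Return the first keyword found in each string, capitalized."""
    # Single capture group with whole-word alternatives (required by str.extract).
    pat = "(" + "|".join(r"\b{}\b".format(re.escape(k)) for k in keywords) + ")"
    # extract keeps only the first match per row; no match gives NaN.
    found = texts.str.extract(pat, flags=re.I, expand=False)
    return found.str.capitalize().fillna(default)

df = pd.DataFrame({"Text": [
    "51000/1-PLASTIC 150 Prange",
    "52345/1-FLOW IJ 10place 500 plastic",
    "54975/1-DIVIDER PQR 100 BC",
    "54975/1-PLASTIC ABC 4.6 BB plastic",
]})

df["Result"] = first_keyword(df["Text"], ["PLASTIC", "RUBBER", "PAPER", "FLOW"])
print(df)
```

Compared with findall followed by str[0], extract stops at the first match in each row, so no intermediate lists are built.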