Python: exact word matching with the results shown in columns
I have a DataFrame (df), along with a list of words for which I need to find exact matches:
word_list = ['look','be','him']
This is my desired output:
Comments ID Word_01 Word_02 Word_03
0 10 Looking for help
1 11 Look at him but be nice look be him
2 12 Be calm be
3 13 Being good
4 14 Him and Her him
5 15 Himself
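I have tried a few approaches, such as str.findall:
str.findall(r"\b" + '|'.join(word_list) + r"\b",flags = re.I)
and some others, but I can't seem to get exact matches for my words.
Any help with this would be greatly appreciated.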
Thanks.

You can use the pandas apply function. For example:
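The code for this answer is not shown in the thread. A minimal sketch of an apply-based approach, assuming the sample frame printed in this answer and a hypothetical helper named find_word, might look like the following. It uses a plain substring test, which would explain why partial words such as "Looking", "Being" and "Himself" are reported in the printed output, so it is not an exact word match.

```python
import pandas as pd

# Sample data rebuilt from the printed frame; the text lives in the 'ID' column.
df = pd.DataFrame({
    'Comments': [10, 11, 12, 13, 14, 15],
    'ID': ['Looking for help', 'Look at him but be nice', 'Be calm',
           'Being good', 'Him and Her', 'Himself'],
})
word_list = ['look', 'be', 'him']

def find_word(text, word):
    # Hypothetical helper: case-insensitive substring test, NOT a whole-word match.
    return word if word in text.lower() else None

# One column per entry of word_list, filled row by row via Series.apply.
for i, word in enumerate(word_list):
    df['Word_{}'.format(i)] = df['ID'].apply(find_word, args=(word,))

print(df)
```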
Output:
   Comments                       ID
0        10         Looking for help
1        11  Look at him but be nice
2        12                  Be calm
3        13               Being good
4        14              Him and Her
5        15                  Himself

   Comments                       ID Word_0 Word_1 Word_2
0        10         Looking for help   look   None   None
1        11  Look at him but be nice   look     be    him
2        12                  Be calm   None     be   None
3        13               Being good   None     be   None
4        14              Him and Her   None   None    him
5        15                  Himself   None   None    him
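Each word needs its own word boundaries. One possible solution is to use str.extractall, then unstack and add_prefix, and finally join the result to the original DataFrame: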
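import re

word_list = ['look','be','him']
# wrap every word in its own \b...\b pair so that only whole words match
pat = '|'.join(r"\b{}\b".format(x) for x in word_list)
# one row per match, then pivot the matches into Word_0, Word_1, ... columns
df1 = df['ID'].str.extractall('(' + pat + ')', flags = re.I)[0].unstack().add_prefix('Word_')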
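To see why each word needs its own boundaries, here is a small sketch using only the standard re module. The names bad_pat, good_pat and group_pat are illustrative, and the grouped form with re.escape is an extra suggestion rather than part of the answer. In the question's pattern the alternation splits the expression into \blook, be and him\b, so the boundaries only guard the outer edges.

```python
import re

word_list = ['look', 'be', 'him']

# Pattern from the question: \b binds only to the first and last alternatives.
bad_pat = r"\b" + '|'.join(word_list) + r"\b"        # \blook|be|him\b
# Per-word boundaries, as in the answer.
good_pat = '|'.join(r"\b{}\b".format(w) for w in word_list)
# For plain words, one non-capturing group inside a single pair of boundaries
# behaves the same; re.escape guards against regex metacharacters in the words.
group_pat = r"\b(?:" + '|'.join(map(re.escape, word_list)) + r")\b"

for text in ['Looking for help', 'Being good', 'Look at him but be nice']:
    print(text,
          re.findall(bad_pat, text, flags=re.I),
          re.findall(good_pat, text, flags=re.I),
          re.findall(group_pat, text, flags=re.I))
```

With bad_pat, "Looking" and "Being" still produce matches; with the other two patterns they do not.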
For lowercase values in the output, lowercase the column with str.lower before extracting.
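Your own findall solution also works once it uses the same pattern; convert the values to lists, build a DataFrame from them, and join it to the original: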
pat = '|'.join(r"\b{}\b".format(x) for x in word_list)
df1 = (pd.DataFrame(df['ID']
                    .str.findall(pat, flags = re.I).values.tolist())
         .add_prefix('Word_')
         .fillna(''))
Alternatively, use a list comprehension with re.findall, which should be the fastest option.
For lowercase output, call .lower() on each string before matching.
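Putting the pieces together, a self-contained sketch of the list-comprehension variant with lowercase output could look like the following. The sample frame is rebuilt from the data printed in this thread, and both re.escape and the explicit index argument are added precautions that are not part of the original answer.

```python
import re
import pandas as pd

# Sample data rebuilt from the printed frame; the text lives in the 'ID' column.
df = pd.DataFrame({
    'Comments': [10, 11, 12, 13, 14, 15],
    'ID': ['Looking for help', 'Look at him but be nice', 'Be calm',
           'Being good', 'Him and Her', 'Himself'],
})
word_list = ['look', 'be', 'him']

# Every word gets its own \b...\b pair so only whole words match.
pat = '|'.join(r"\b{}\b".format(re.escape(w)) for w in word_list)

# Lowercase each string, then collect its whole-word matches in order of appearance.
matches = [re.findall(pat, text.lower()) for text in df['ID']]

# Ragged lists become columns 0, 1, 2; missing entries are filled with ''.
df1 = pd.DataFrame(matches, index=df.index).add_prefix('Word_').fillna('')

result = df.join(df1)
print(result)
```

Matches land in Word_0, Word_1, ... in the order they occur in the text, not in the order of word_list.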
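What do you mean by an "exact" match? Which part of the str.findall result is not suitable?
This is not valid Python.
I'm not at a computer right now, so it is hard to find the right quote characters. If it isn't right, feel free to edit my post.
Awesome. Thanks for your help.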
Joining the extracted columns to the original DataFrame gives:
df = df.join(df1).fillna('')
print (df)
   Comments                       ID Word_0 Word_1 Word_2
0        10         Looking for help
1        11  Look at him but be nice   Look    him     be
2        12                  Be calm     Be
3        13               Being good
4        14              Him and Her    Him
5        15                  Himself

Lowercase variant of the extractall step:
df1 = (df['ID'].str.lower()
       .str.extractall('(' + pat + ')')[0]
       .unstack()
       .add_prefix('Word_'))

List comprehension version:
df1 = (pd.DataFrame([re.findall(pat, x, flags = re.I) for x in df['ID']])
         .add_prefix('Word_')
         .fillna(''))

List comprehension version with lowercase output:
pat = '|'.join(r"\b{}\b".format(x) for x in word_list)
df1 = (pd.DataFrame([re.findall(pat, x.lower(), flags = re.I) for x in df['ID']])
         .add_prefix('Word_')
         .fillna(''))