Python: look up multiple dictionary keys in a DataFrame & return multiple values for matches

python python-3.x pandas dictionary


First time posting, so apologies in advance if my formatting is off.

Here is my problem:

I created a pandas DataFrame containing several rows of text:

import numpy as np
import pandas as pd

d = {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
keywords = pd.DataFrame(d, columns=['keywords'])

In [7]: keywords
Out[7]:
             keywords
0         cheap shoes
1        luxury shoes
2  cheap hiking shoes

Now I have a dictionary with the following keys/values:

labels = {'cheap' : 'budget', 'luxury' : 'expensive', 'hiking' : 'sport'}

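What I want to do is check whether any of the dictionary keys occur in the DataFrame and, if so, return the corresponding value.

I can get partway there with the following: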

for k,v in labels.items():
   keywords['Labels'] = np.where(keywords['keywords'].str.contains(k),v,'No Match')

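However, the output misses the first two keys and only captures the last one, 'hiking'.

In addition, I would like to know whether there is a way to capture multiple values from the dictionary, separated by |, so that the ideal output looks like this: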

    keywords            Labels
0   cheap shoes         budget
1   luxury shoes        expensive
2   cheap hiking shoes  budget | sport

Any help or guidance is much appreciated.

Cheers

It is certainly possible. Here is one way:

d = {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes', 'nothing']}

keywords = pd.DataFrame(d,columns=['keywords'])

labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

df = pd.DataFrame(d)

def matcher(k):
    x = (i for i in labels if i in k)
    return ' | '.join(map(labels.get, x))

df['values'] = df['keywords'].map(matcher)

#              keywords          values
# 0         cheap shoes          budget
# 1        luxury shoes       expensive
# 2  cheap hiking shoes  budget | sport
# 3             nothing

You can use "|".join(labels.keys()) to build the pattern to be used by re.findall().
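The answer above gives no code, so here is a minimal sketch of how that pattern might be used. The helper name label_row and the 'No Match' fallback are assumptions for illustration (the fallback mirrors the placeholder in the question), and re.escape is added on the assumption that keys could contain regex metacharacters.

```python
import re
import pandas as pd

labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}
keywords = pd.DataFrame(
    {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
)

# Build a single alternation pattern from the dictionary keys.
# re.escape protects against keys containing regex metacharacters.
pattern = re.compile("|".join(re.escape(k) for k in labels.keys()))

def label_row(text):
    # Hypothetical helper: findall returns the matched keys in the
    # order they appear in the text.
    found = pattern.findall(text)
    if not found:
        return 'No Match'
    return ' | '.join(labels[k] for k in found)

keywords['Labels'] = keywords['keywords'].map(label_row)
print(keywords)
```

Note that this still matches substrings, so a key such as 'cheap' would also match inside 'cheapest'.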

Sticking with your approach, you could do:

arr = np.array([np.where(keywords['keywords'].str.contains(k), v, 'No Match') for k, v in labels.items()]).T
keywords["Labels"] = ["|".join(set(item[ind if ind.sum() == ind.shape[0] else ~ind])) for item, ind in zip(arr, (arr == "No Match"))]

Out[97]: 
             keywords        Labels
0         cheap shoes        budget
1        luxury shoes     expensive
2  cheap hiking shoes  sport|budget

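You can split the strings into separate columns and then stack them into a MultiIndex, which lets you map each word to its value in the labels dictionary. Then group by the initial index and concatenate the strings belonging to each index.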

keywords['Labels'] = keywords.keywords.str.split(expand=True).stack()\
                     .map(labels).groupby(level=0)\
                     .apply(lambda x: x.str.cat(sep=' | '))



            keywords          Labels
0         cheap shoes          budget
1        luxury shoes       expensive
2  cheap hiking shoes  budget | sport
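To make the chain easier to follow, the same steps can be sketched one at a time. The intermediate variable names, the extra 'nothing' row, and the 'No Match' placeholder are assumptions added for illustration; the explicit dropna() is there because whether stack() drops the padding values can depend on the pandas version.

```python
import pandas as pd

labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}
keywords = pd.DataFrame(
    {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes', 'nothing']}
)

# 1. One column per whitespace-separated word; shorter rows are padded
#    with missing values.
words = keywords['keywords'].str.split(expand=True)

# 2. Stack into one Series indexed by (original row, word position).
tokens = words.stack()

# 3. Look each word up in the dictionary. Words that are not keys become
#    NaN and are dropped explicitly.
mapped = tokens.map(labels).dropna()

# 4. Regroup by the original row index and join the labels of each row.
joined = mapped.groupby(level=0).agg(' | '.join)

# 5. Rows without any match are absent from `joined`; reindex to give
#    them an explicit placeholder.
keywords['Labels'] = joined.reindex(keywords.index, fill_value='No Match')
print(keywords)
```

Because this version compares whole words against the dictionary keys, it does not pick up partial matches such as 'cheap' inside 'cheapest'.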

I like the idea of using replace first and then finding the values:

keywords.assign(
    values=
    keywords.keywords.replace(labels, regex=True)
            .str.findall(f'({"|".join(labels.values())})')
            .str.join(' | ')
)

             keywords          values
0         cheap shoes          budget
1        luxury shoes       expensive
2  cheap hiking shoes  budget | sport

The approach above works great, but I have a follow-up. How can it be edited to capture only exact matches? For example, suppose labels is updated to include 'cheapest': 'budget' and the first keyword is updated to 'cheapest cheap shoes'. Running the script above then produces budget | budget as the value for 'cheapest cheap shoes'. The dictionary may grow to hold more word variants bound to a single value.

You should use a set, e.g. return {labels[i] for i in labels if i in k}.

The approach above works great for removing duplicate values from the final result. However, I am still running into an odd error with partial matches. Suppose I add 'liverpool' to the keywords

d = {'keywords': ['cheapest cheap shoes', 'luxury shoes', 'cheap hiking shoes', 'liverpool']}

and add 'pool' to the labels

labels = {'cheap': 'budget', 'cheapest': 'budget', 'luxury': 'expensive', 'hiking': 'sport', 'pool': 'swimming'}
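For the exact-match follow-up raised in the comments, one possible sketch is to anchor the regex pattern with \b word boundaries and de-duplicate the resulting labels. This is not taken from any of the answers: the helper name exact_labels and the sample data are assumptions chosen to mirror the scenario described above ('cheapest' next to 'cheap', and 'pool' inside 'liverpool').

```python
import re
import pandas as pd

# Assumed sample data mirroring the scenario from the comments.
labels = {'cheap': 'budget', 'cheapest': 'budget', 'luxury': 'expensive',
          'hiking': 'sport', 'pool': 'swimming'}
keywords = pd.DataFrame(
    {'keywords': ['cheapest cheap shoes', 'luxury shoes',
                  'cheap hiking shoes', 'liverpool']}
)

# \b on both sides forces whole-word matches, so 'pool' does not match
# inside 'liverpool' and 'cheap' does not match inside 'cheapest'.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, labels)) + r")\b")

def exact_labels(text):
    # Hypothetical helper: collect the whole-word keys found in the text.
    found = pattern.findall(text)
    # dict.fromkeys drops duplicate labels while keeping first-seen order.
    unique = dict.fromkeys(labels[k] for k in found)
    return ' | '.join(unique)

keywords['Labels'] = keywords['keywords'].map(exact_labels)
print(keywords)
```

dict.fromkeys is used instead of a plain set so that the order of the labels stays stable. Rows with no whole-word match, such as 'liverpool' here, end up with an empty string.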