Python: How to extract a string from a column when a dictionary key matches

python pandas lambda

I have a DataFrame like this:

Domain         URL
Amazon         amazon.com/xyz/butter
Amazon         amazon.com/xyz/orange
Facebook       facebook.com/male
Google         google.com/airport
Google         goolge.com/car

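This is just dummy data. I have clickstream data in which I want to use the "Domain" and "URL" columns. In practice I have lists of many keywords stored in a dictionary, and I need to search for them in the URL and extract the matching keyword into a new column.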

I have a dictionary like this:

dict_config = {'Facebook': ['boy', 'girl', 'man'], 'Google': ['airport', 'car', 'konfigurator'], 'Amazon': ['apple', 'orange', 'butter']}

I would like to get output like this:

Domain         URL                           Keyword
Amazon         amazon.com/xyz/butter         butter
Amazon         amazon.com/xyz/orange         orange
Facebook       facebook.com/male             male
Google         google.com/airport            airport
Google         goolge.com/car                car

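I would like to do this in a single line of code. So far I have tried:

df['Keyword'] = df.apply(lambda x: any(substring in x.URL for substring in dict_config[x.Domain]), axis=1)

This only gives me boolean values, but I want the matching keyword returned. Any help?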

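The idea is to add the filter at the end of a list comprehension, and to wrap it in next and iter so that a default value is returned when nothing matches: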

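# Keep only the keywords for this row's Domain that occur in its URL,
# then take the first one, or 'no match' if the list is empty.
f = lambda x: next(iter([sub for sub in dict_config[x.Domain] if sub in x.URL]), 'no match')
df['Keyword'] = df.apply(f, axis=1)
print(df)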
     Domain                    URL   Keyword
0    Amazon  amazon.com/xyz/butter    butter
1    Amazon  amazon.com/xyz/orange    orange
2  Facebook      facebook.com/male  no match
3    Google     google.com/airport   airport
4    Google         goolge.com/car       car
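
For reference, here is a self-contained sketch of the same approach that rebuilds the sample DataFrame from the question so it can be run directly. The helper name first_keyword is only illustrative and does not come from the original answer; the logic is the list comprehension plus next/iter shown above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Domain': ['Amazon', 'Amazon', 'Facebook', 'Google', 'Google'],
    'URL': ['amazon.com/xyz/butter', 'amazon.com/xyz/orange',
            'facebook.com/male', 'google.com/airport', 'goolge.com/car'],
})

dict_config = {'Facebook': ['boy', 'girl', 'man'],
               'Google': ['airport', 'car', 'konfigurator'],
               'Amazon': ['apple', 'orange', 'butter']}


def first_keyword(row):
    # Keywords configured for this row's Domain that appear in its URL.
    matches = [sub for sub in dict_config[row.Domain] if sub in row.URL]
    # next() returns the first match, or the default if there is none.
    return next(iter(matches), 'no match')


df['Keyword'] = df.apply(first_keyword, axis=1)
print(df)
```

Note that this version indexes dict_config directly, so it assumes every value in the Domain column is a key of the dictionary.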

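If some values in the Domain column may not be present in the dictionary at all, change the solution to use .get, which looks up the key with a default value: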

print (df)
     Domain                    URL
0    Amazon  amazon.com/xyz/butter
1    Amazon  amazon.com/xyz/orange
2  Facebook      facebook.com/male
3    Google     google.com/airport
4   Google1         goolge.com/car <- changed last value to Google1

dict_config = {'Facebook': ['boy', 'girl', 'man'], 
               'Google': ['airport', 'car', 'konfigurator'],
               'Amazon': ['apple', 'orange', 'butter']}

f = lambda x: next(iter([sub for sub in dict_config.get(x.Domain, '') 
                         if sub in x.URL]), 'no match')
df['Keyword'] = df.apply(f, axis=1)
     Domain                    URL   Keyword
0    Amazon  amazon.com/xyz/butter    butter
1    Amazon  amazon.com/xyz/orange    orange
2  Facebook      facebook.com/male  no match
3    Google     google.com/airport   airport
4   Google1         goolge.com/car  no match
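
A possible variation on the .get version, offered as a sketch rather than as part of the original answer: passing a generator expression straight to next avoids building the intermediate list and stops at the first match, and an empty list as the .get default makes the intent a little more explicit than an empty string. The function name first_keyword is again just an illustrative choice.

```python
import pandas as pd

# Same sample data, with the last Domain changed to 'Google1'
# so that it is missing from dict_config.
df = pd.DataFrame({
    'Domain': ['Amazon', 'Amazon', 'Facebook', 'Google', 'Google1'],
    'URL': ['amazon.com/xyz/butter', 'amazon.com/xyz/orange',
            'facebook.com/male', 'google.com/airport', 'goolge.com/car'],
})

dict_config = {'Facebook': ['boy', 'girl', 'man'],
               'Google': ['airport', 'car', 'konfigurator'],
               'Amazon': ['apple', 'orange', 'butter']}


def first_keyword(row):
    # Unknown domains fall back to an empty list of candidates.
    candidates = dict_config.get(row.Domain, [])
    # The generator is consumed lazily: next() stops at the first hit
    # and returns 'no match' if the generator yields nothing.
    return next((sub for sub in candidates if sub in row.URL), 'no match')


df['Keyword'] = df.apply(first_keyword, axis=1)
print(df)
```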

You're always a lifesaver. Amazing! @jezrael, could you point me to the best sources for learning more about next, iter and lambda operations?

@MuhammadSalmanShahid - This is more of a pure Python approach, so you can check that out.
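
For the next/iter pattern asked about in the comments, a small plain-Python illustration may help. It uses only built-ins; the variable names and the sample URL are made up for the example.

```python
keywords = ['airport', 'car', 'konfigurator']
url = 'google.com/car'

# The list comprehension keeps every keyword found in the URL.
matches = [k for k in keywords if k in url]

# iter() turns the list into an iterator; next() pulls its first item.
# The second argument to next() is returned when the iterator is empty,
# instead of raising StopIteration.
first = next(iter(matches), 'no match')
empty = next(iter([]), 'no match')

# any() only reports whether at least one keyword matched,
# which is why the attempt in the question produced booleans.
found = any(k in url for k in keywords)

print(matches, first, empty, found)
```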