Python: how to extract a string from a column when the dictionary key matches
I have a dataframe like this:
Domain    URL
Amazon    amazon.com/xyz/butter
Amazon    amazon.com/xyz/orange
Facebook  facebook.com/male
Google    google.com/airport
Google    goolge.com/car
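This is just dummy data. I have clickstream data in which I want to use the `Domain` and `URL` columns. In reality I have many lists of keywords, which I keep in a dictionary; I need to search each URL for them and extract the matching keyword into a new column.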
I have a dictionary like this:
dict_keyword = {'Facebook': ['boy', 'girl', 'man'], 'Google': ['airport', 'car', 'konfigurator'], 'Amazon': ['apple', 'orange', 'butter']}
I want to get output like this:
Domain    URL                    Keyword
Amazon    amazon.com/xyz/butter  butter
Amazon    amazon.com/xyz/orange  orange
Facebook  facebook.com/male      male
Google    google.com/airport     airport
Google    goolge.com/car         car
Ideally I want a single line of code. So far I am trying:
df['Keyword'] = df.apply(lambda x: any(substring in x.URL for substring in dict_keyword[x.Domain]), axis=1)
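I only get boolean values, but I want to return the keyword itself. Any help?

The idea is to add the filter at the end of the list comprehension and, for rows with no match, to wrap it in `next` and `iter` so that a default value is returned: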
f = lambda x: next(iter([sub for sub in dict_config[x.Domain] if sub in x.URL]), 'no match')
df['Keyword'] = df.apply(f, axis=1)
print (df)
Domain URL Keyword
0 Amazon amazon.com/xyz/butter butter
1 Amazon amazon.com/xyz/orange orange
2 Facebook facebook.com/male no match
3 Google google.com/airport airport
4 Google goolge.com/car car
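For anyone who wants to run this directly, here is a self-contained sketch of the same idea. It rebuilds the sample frame from the question (including the `goolge` typo) and, as a small variation not taken from the answer, swaps the list comprehension for a generator expression: a generator is already an iterator, so `next` can consume it without `iter` and stops at the first keyword found instead of building the whole list first.

```python
import pandas as pd

# Sample data from the question (the last URL keeps the "goolge" typo)
df = pd.DataFrame({
    'Domain': ['Amazon', 'Amazon', 'Facebook', 'Google', 'Google'],
    'URL': ['amazon.com/xyz/butter', 'amazon.com/xyz/orange',
            'facebook.com/male', 'google.com/airport', 'goolge.com/car'],
})

dict_config = {'Facebook': ['boy', 'girl', 'man'],
               'Google': ['airport', 'car', 'konfigurator'],
               'Amazon': ['apple', 'orange', 'butter']}

# The generator yields keywords of the row's domain that occur in its URL;
# next() returns the first one, or 'no match' if the generator is empty.
f = lambda x: next((sub for sub in dict_config[x.Domain] if sub in x.URL), 'no match')
df['Keyword'] = df.apply(f, axis=1)
print(df)
```

The `Keyword` column matches the output above: `facebook.com/male` contains none of `boy`, `girl` or `man`, so that row gets `no match`.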
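If the value in the first column, `Domain`, may also be missing from the dictionary, change the solution to use `dict.get` for the lookup with a default value: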
print (df)
Domain URL
0 Amazon amazon.com/xyz/butter
1 Amazon amazon.com/xyz/orange
2 Facebook facebook.com/male
3 Google google.com/airport
4 Google1 goolge.com/car <- changed last value to Google1
dict_config = {'Facebook': ['boy', 'girl', 'man'],
'Google': ['airport', 'car', 'konfigurator'],
'Amazon': ['apple', 'orange', 'butter']}
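# dict.get returns '' for unknown domains, so the comprehension is empty and next() falls back to 'no match'
f = lambda x: next(iter([sub for sub in dict_config.get(x.Domain, '')
                         if sub in x.URL]), 'no match')
df['Keyword'] = df.apply(f, axis=1)
print (df)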
Domain URL Keyword
0 Amazon amazon.com/xyz/butter butter
1 Amazon amazon.com/xyz/orange orange
2 Facebook facebook.com/male no match
3 Google google.com/airport airport
4 Google1 goolge.com/car no match
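If `apply` with `axis=1` turns out to be slow on large clickstream data, one commonly used alternative (not part of the answer above) is a plain list comprehension over `zip` of the two columns, which avoids constructing a row Series for every row. The sketch below keeps the same `dict.get` fallback; `find_keyword` is a hypothetical helper name introduced here only for readability.

```python
import pandas as pd

# Modified sample data from above: the last Domain is 'Google1'
df = pd.DataFrame({
    'Domain': ['Amazon', 'Amazon', 'Facebook', 'Google', 'Google1'],
    'URL': ['amazon.com/xyz/butter', 'amazon.com/xyz/orange',
            'facebook.com/male', 'google.com/airport', 'goolge.com/car'],
})

dict_config = {'Facebook': ['boy', 'girl', 'man'],
               'Google': ['airport', 'car', 'konfigurator'],
               'Amazon': ['apple', 'orange', 'butter']}

def find_keyword(domain, url, default='no match'):
    """Hypothetical helper: first keyword of `domain` found in `url`, else `default`."""
    # An empty tuple as the dict.get default covers domains missing from dict_config
    return next((sub for sub in dict_config.get(domain, ()) if sub in url), default)

# Iterate over both columns in lockstep instead of using df.apply(axis=1)
df['Keyword'] = [find_keyword(d, u) for d, u in zip(df['Domain'], df['URL'])]
print(df)
```

Both variants give the same column; whether the `zip` version is noticeably faster depends on the data size, so it is worth timing on the real data.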
You are always a lifesaver. Amazing! @jezrael, can you point me to the best resource for learning more about the `next`, `iter` and `lambda` operations? – @MuhammadSalmanShahid - this is the more pure-Python way, so you can check it.
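Regarding the question about `next`, `iter` and `lambda`, here is a short illustration of the pieces used in the answer, relying only on built-ins. `next(iterator, default)` returns the iterator's next item, or `default` once it is exhausted. `iter` is needed in the answer because a list is iterable but is not itself an iterator, so passing a list straight to `next` raises `TypeError`. The variable and function names are illustrative only.

```python
# iter() turns the list of matches into an iterator that next() can consume
matches = [sub for sub in ['airport', 'car'] if sub in 'goolge.com/car']
first = next(iter(matches), 'no match')    # 'car'

# With no matches the list is empty, so next() returns the default
empty = [sub for sub in ['boy', 'girl', 'man'] if sub in 'facebook.com/male']
fallback = next(iter(empty), 'no match')   # 'no match'

# A list is iterable but not an iterator, so next() on it fails
try:
    next(['a', 'b'])
    error_message = ''
except TypeError as exc:
    error_message = str(exc)

# A lambda is just an anonymous function; these two definitions behave the same
first_match = lambda subs, url: next((s for s in subs if s in url), 'no match')

def first_match_def(subs, url):
    return next((s for s in subs if s in url), 'no match')

print(first, fallback, error_message)
```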