R fuzzy_left_join with match_fun %in%
Some data:
example_df <- data.frame(
  url = c('blog/blah', 'blog/?utm_medium=foo', 'blah', 'subscription/apples', 'UK/something'),
  numbs = 1:5
)
lookup_df <- data.frame(
  string = c('blog', 'subscription', 'UK'),
  group = c('blog', 'subs', 'UK')
)
library(dplyr)      # provides %>%
library(fuzzyjoin)
data_combined <- example_df %>%
  fuzzy_left_join(lookup_df, by = c("url" = "string"),
                  match_fun = `%in%`)
data_combined
                   url numbs string group
1            blog/blah     1   <NA>  <NA>
2 blog/?utm_medium=foo     2   <NA>  <NA>
3                 blah     3   <NA>  <NA>
4  subscription/apples     4   <NA>  <NA>
5         UK/something     5   <NA>  <NA>
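To see why every row comes back as NA: %in% tests whether a whole value is a member of a set, so 'blog/blah' is never "in" c('blog', 'subscription', 'UK'). A minimal base R sketch of the difference follows. Using startsWith as the partial test is an assumption made here for illustration; it is not part of the original post.

```r
# Same data as in the question (stringsAsFactors = FALSE so this also works on R < 4.0)
example_df <- data.frame(
  url = c('blog/blah', 'blog/?utm_medium=foo', 'blah', 'subscription/apples', 'UK/something'),
  numbs = 1:5,
  stringsAsFactors = FALSE
)
lookup_df <- data.frame(
  string = c('blog', 'subscription', 'UK'),
  group = c('blog', 'subs', 'UK'),
  stringsAsFactors = FALSE
)

# Exact membership: no url is identical to any lookup string
exact_hits <- example_df$url %in% lookup_df$string

# Partial test (illustrative): does each url start with each lookup string?
# Rows are urls, columns are lookup strings.
prefix_hits <- outer(example_df$url, lookup_df$string, startsWith)

exact_hits
prefix_hits
```

exact_hits is FALSE for every url, which is why the join finds nothing, while prefix_hits has one TRUE in each row except the one for 'blah'.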
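If we want to partially match the word before the / in the 'url' column of example_df against the 'string' column of lookup_df, we can extract that substring into a new column and then do a regex_left_join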
library(dplyr)
library(fuzzyjoin)
library(stringr)
example_df %>%
  mutate(string = str_remove(url, "\\/.*")) %>%
  regex_left_join(lookup_df, by = 'string') %>%
  select(url, numbs, group)
-output
#                   url numbs group
#1            blog/blah     1  blog
#2 blog/?utm_medium=foo     2  blog
#3                 blah     3  <NA>
#4  subscription/apples     4  subs
#5         UK/something     5    UK
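Since the key is extracted before joining, an exact lookup on that key may be enough when the lookup strings are plain words. Below is a rough base R sketch of that idea, using sub and match in place of str_remove and a join. This is a substitution made here for illustration, not the code from the answer. Note that it only finds keys identical to a lookup string, whereas regex_left_join treats each lookup string as a regular expression.

```r
example_df <- data.frame(
  url = c('blog/blah', 'blog/?utm_medium=foo', 'blah', 'subscription/apples', 'UK/something'),
  numbs = 1:5,
  stringsAsFactors = FALSE
)
lookup_df <- data.frame(
  string = c('blog', 'subscription', 'UK'),
  group = c('blog', 'subs', 'UK'),
  stringsAsFactors = FALSE
)

# Keep only the part of each url before the first "/"
key <- sub("/.*", "", example_df$url)

# Exact lookup: position of each key in lookup_df$string (NA when absent)
example_df$group <- lookup_df$group[match(key, lookup_df$string)]

example_df
```

Because match returns at most one position per key, this cannot produce duplicated rows, but it also means only the first matching lookup row is used if lookup_df contains repeated strings.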
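I think you may need %like% instead of %in%
Is it always looking for the match before the /? Then example_df %>% mutate(string = str_remove(url, "\\/.*")) %>% left_join(lookup_df)
On your first comment, like that I get: Error in fuzzy_join(x, y, by, match_fun, mode = "left", ...) : object '%like%' not found
Sorry, it is from data.table. I think the second comment should work if the pattern to be matched comes before the /
I noticed that. Can you try the code in the solution posted?
Perfect, this works and doesn't seem to duplicate! Why doesn't it duplicate? Well, not complaining
@DougFir That would be down to the way %like% matches. Also, it is better to do the match more directly, instead of against the whole url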