R: Matching values to intervals by overlap
I currently have two data frames:
amount = c(19,21,39,45,62,71,100,121,130,160,180,210,240)
id = rep(1,length(amount))
test <- data.frame(id,amount)
interval = c(1:10)
bottom = c(0,25,50,75,100,125,150,175,200,225)
top = c(24,49,74,99,124,149,174,199,224,NA)
test_2 <- data.frame(interval,bottom,top)
However, my attempt to join them returns the following error message:
Invalid numeric value for 'by.x'; it should be a vector with values 1 <= by.x <= length(x)
Does this work:
library(dplyr)
library(tidyr)
test_2 %>%
  mutate(top = replace_na(top, 250)) %>%            # close the open-ended last interval (250 is an arbitrary cap)
  rowwise() %>%
  mutate(iv = list(seq(bottom, top, by = 1))) %>%   # enumerate every whole number in each interval
  unnest(iv) %>%
  right_join(test, by = c('iv' = 'amount'), keep = TRUE) %>%
  select(id, interval, amount)
# A tibble: 13 x 3
id interval amount
<dbl> <int> <dbl>
1 1 1 19
2 1 1 21
3 1 2 39
4 1 2 45
5 1 3 62
6 1 3 71
7 1 5 100
8 1 5 121
9 1 6 130
10 1 7 160
11 1 8 180
12 1 9 210
13 1 10 240
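Because the approach above enumerates whole numbers with `seq()`, it only matches integer amounts. A possible alternative, sketched here under the assumption that dplyr 1.1.0 or later is installed (that is the version that introduced `join_by()`), is to join directly on the range condition. Treating the missing `top` as "no upper limit" is also an assumption about what the `NA` means; the name `intervals` is introduced only for this sketch.

```r
library(dplyr)

# Same example data as in the question
test <- data.frame(
  id = 1,
  amount = c(19, 21, 39, 45, 62, 71, 100, 121, 130, 160, 180, 210, 240)
)
test_2 <- data.frame(
  interval = 1:10,
  bottom = c(0, 25, 50, 75, 100, 125, 150, 175, 200, 225),
  top = c(24, 49, 74, 99, 124, 149, 174, 199, 224, NA)
)

# ASSUMPTION: a missing `top` means the last interval is open-ended
intervals <- test_2 %>% mutate(top = coalesce(top, Inf))

# Keep every row of `test` and attach the interval whose
# [bottom, top] range (inclusive on both ends) contains `amount`
result <- test %>%
  left_join(intervals, by = join_by(between(amount, bottom, top))) %>%
  select(id, amount, interval)

print(result)
```

This avoids expanding each interval row by row, so it should also cope with non-integer amounts, but it has only been reasoned through on the example data here.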
You can try fuzzyjoin:
fuzzyjoin::fuzzy_left_join(test, test_2,
by = c('amount' = 'top', 'amount' = 'bottom'),
match_fun = list(`<=`, `>=`))
# id amount interval bottom top
#1 1 19 1 0 24
#2 1 21 1 0 24
#3 1 39 2 25 49
#4 1 45 2 25 49
#5 1 62 3 50 74
#6 1 71 3 50 74
#7 1 100 5 100 124
#8 1 121 5 100 124
#9 1 130 6 125 149
#10 1 160 7 150 174
#11 1 180 8 175 199
#12 1 210 9 200 224
#13 1 240 NA NA NA
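Hey Karthik, what I need is a new column in the table `test`, so that in the end the table gives me the id, the amount, and the number of the interval.
@MaxS, sorry, can you explain what you mean by the number of the interval? Do you mean something like the row number?
Yes, sure. Each interval has an id; the column in `test_2` is called `interval`. That is the column I need in the table `test` in the end, but the join depends on whether the amount lies between the bottom and top values.
@MaxS, sorry, I have made the change. Please check whether it works.
So this does not work on my actual data, and I am trying to figure out why. Can you think of a way to do this using `foverlaps`?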
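Regarding `foverlaps`: the following is only a sketch of how it might be applied to the example data, not a tested answer for the real data. It assumes that the missing `top` means "no upper limit" and caps it at the largest value present, since `foverlaps()` (to my knowledge) rejects `NA` in range columns. The helper columns `start` and `end` and the variable `cap` are names introduced here for illustration.

```r
library(data.table)

# Same example data as in the question, built as data.tables
test <- data.table(
  id = 1,
  amount = c(19, 21, 39, 45, 62, 71, 100, 121, 130, 160, 180, 210, 240)
)
test_2 <- data.table(
  interval = 1:10,
  bottom = c(0, 25, 50, 75, 100, 125, 150, 175, 200, 225),
  top = c(24, 49, 74, 99, 124, 149, 174, 199, 224, NA)
)

# ASSUMPTION: a missing `top` means the last interval is open-ended.
# Cap it at the largest value in the data so no range column is NA.
cap <- max(c(test$amount, test_2$bottom))
test_2[is.na(top), top := cap]

# foverlaps() joins ranges to ranges, so give each amount a zero-width range
test[, `:=`(start = amount, end = amount)]

# The lookup table must be keyed on its range columns
setkey(test_2, bottom, top)

# by.x and by.y are given as column names; each amount is matched to the
# interval that fully contains its zero-width range
res <- foverlaps(test, test_2,
                 by.x = c("start", "end"), by.y = c("bottom", "top"),
                 type = "within", nomatch = NA)

res <- res[order(amount), .(id, amount, interval)]
print(res)
```

Passing `by.x` and `by.y` as character vectors of existing column names, and keying `test_2` on `bottom` and `top` beforehand, are the two steps that most often go wrong with `foverlaps()`, so they are worth checking against the real data.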