R: one-sided fuzzy join matching

Tags: r, date, tidyverse, fuzzyjoin

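I am trying to gather weight values from one table (mychickwts) that were collected in the week before each blood sample recorded in another table (chickblood). I want a list of blood dates together with the associated weights from the week prior to each blood sample. I have tried several different approaches, and I keep getting weight dates after the blood sample date included in the results.

In this example, the match returns weight dates both before the blood date (1/9, 1/11, 1/13) and after it (1/15). How can I match these two tables? I also tried difference_join, but it returned results both 7 days before and 7 days after the blood date, which again is not what I want.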

Chick   Date.x (blood)  Date.y (weight)  Chick.y  Weight.y
10     2019-01-14       2019-01-09       10       74
10     2019-01-14       2019-01-11       10       81
10     2019-01-14       2019-01-13       10       89
10     2019-01-14       2019-01-15       10       96




library(tidyverse)
library(lubridate)
library(fuzzyjoin)
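Import the data (sample data for the reprex)

I also tried difference_join, but in this case I can't figure out how to make it match on Chick as well, and it returns weights from both before and after the blood date.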

chickblood %>%
  difference_join(mychickwts, by = "Date", max_dist = 7)
I also tried using %within% from lubridate, without success. This returns an error, and I'm not sure exactly why.

chickblood %>%
fuzzy_left_join(
  mychickwts,
  by = c("Chick" = "Chick", 
         "Date" = "Date"),
  match_fun = list("==", "%within%")
  ) %>%
  arrange(Date.x)

Error in which(m) : argument to 'which' is not logical

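Since the dataset is not too large, you can do an ordinary left join on "Chick" and then work out whether each weight date falls within the week before the blood draw date. From there you can keep the rows you want.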

library(tidyverse)
library(lubridate)
library(fuzzyjoin)

mychickwts$Chick <- as.numeric(mychickwts$Chick)

chickblood %>% 
  left_join(mychickwts, by = "Chick", suffix = c(".blood", ".wt")) %>% 
  mutate(wt_days_prior = Date.blood - Date.wt) %>% 
  mutate(wt_in_week_prior = wt_days_prior <= 7 & wt_days_prior >= 0) %>% 
  filter(wt_in_week_prior)
Alternatively, if you want to do this in a single join, something like this may work.

chickblood %>% 
  fuzzy_left_join(mychickwts, by = c("Chick", "Date"),
                  match_fun = list(`==`, function(x, y) x - y >= 0 & x - y <= 7)
  )
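
To see both approaches side by side, here is a minimal sketch on toy data. The two tibbles are an assumption: they are rebuilt from the single Chick 10 example shown in the question's output, not from the original reprex data, and the names by_filter and by_fuzzy are only illustrative.

```r
library(dplyr)
library(fuzzyjoin)

# Toy data (assumption): rebuilt from the Chick 10 rows shown in the question
chickblood <- tibble(
  Chick = 10,
  Date  = as.Date("2019-01-14")
)
mychickwts <- tibble(
  Chick  = 10,
  Date   = as.Date(c("2019-01-09", "2019-01-11", "2019-01-13", "2019-01-15")),
  Weight = c(74, 81, 89, 96)
)

# Approach 1: ordinary left join on Chick, then keep weights
# taken 0 to 7 days before the blood date
by_filter <- chickblood %>%
  left_join(mychickwts, by = "Chick", suffix = c(".blood", ".wt")) %>%
  mutate(wt_days_prior = as.numeric(Date.blood - Date.wt)) %>%
  filter(wt_days_prior >= 0, wt_days_prior <= 7)

# Approach 2: a single fuzzy join with a one-sided date condition
# (x is the blood date, y is the weight date)
by_fuzzy <- chickblood %>%
  fuzzy_left_join(
    mychickwts,
    by = c("Chick", "Date"),
    match_fun = list(`==`, function(x, y) x - y >= 0 & x - y <= 7)
  )

by_filter$Date.wt  # weight dates kept by approach 1
by_fuzzy$Date.y    # weight dates kept by approach 2
```

On this toy input both approaches keep the 1/9, 1/11 and 1/13 weights and drop 1/15. One difference to keep in mind: the filter in the first approach drops blood samples that have no weight in the prior week, while fuzzy_left_join keeps them with NA in the weight columns.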
Well, my real dataset has about 8 million records, but both of your approaches work! I'll subset my data and run some speed tests before going down one path or the other. Thanks!
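
For a dataset of that size, a third option that might be worth including in the speed tests is an inequality join with dplyr's join_by(). This is not one of the approaches from the answer above; it is a sketch that assumes dplyr 1.1.0 or later, and the column names blood_date, week_start and wt_date are hypothetical renames chosen so that the two date columns do not clash. The toy data is again rebuilt from the Chick 10 example.

```r
library(dplyr)  # join_by() requires dplyr >= 1.1.0

# Toy blood table (assumption), with the start of the 7-day window precomputed
blood <- tibble(
  Chick      = 10,
  blood_date = as.Date("2019-01-14")
) %>%
  mutate(week_start = blood_date - 7)

# Toy weight table (assumption)
weights <- tibble(
  Chick   = 10,
  wt_date = as.Date(c("2019-01-09", "2019-01-11", "2019-01-13", "2019-01-15")),
  Weight  = c(74, 81, 89, 96)
)

# Equality on Chick, plus two inequalities that bound the weight date
# to the week ending on the blood date (left side = blood, right side = weights)
prior_week <- blood %>%
  left_join(
    weights,
    by = join_by(Chick, week_start <= wt_date, blood_date >= wt_date)
  )

prior_week
```

Like fuzzy_left_join, this keeps blood samples that have no matching weight, with NA in the weight columns.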