Mapping two data frames in R under a specific condition

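I have already asked a similar question, but my problem is now slightly different: I cannot adapt that solution and cannot work it out myself. I want the data in dataset 1 that occurred before the data in dataset 2. Here is my data: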

 # Dataset 1 (dts1)

     UserID   date   Hour     Events    
  1    5  25/07/2016  02:31      8         
  2    5  30/07/2016  02:42      6      
  3    4  23/07/2016  07:52      9         
  4   14  24/07/2016  03:02      5         
  5   17  25/07/2016  09:12      10        
  6    4  22/07/2016  03:22      4

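and Dataset 2 (dts2), which holds each user's transactions in the columns UserID, date, Hour and transactions.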

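So I want to compare the two datasets and keep only the dataset 1 events that occurred before the dataset 2 transactions. In other words, I want to make sure I do not count events that happened after a user's last transaction. The ideal output looks like this: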

    #output 

   UserID   Events      transaction 

    5         8         4,3
    4         9,4       2
   14         5         3
   17         10        NA

In the example above, I made sure to drop event 6 for user 5, because it occurred after his last transaction.

We first convert the times to a POSIX class:

dts1$time <- strptime(paste(dts1$date, dts1$Hour), format="%d/%m/%Y %H:%M")
dts2$time <- strptime(paste(dts2$date, dts2$Hour), format="%d/%m/%Y %H:%M")
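
As a quick illustration of why this conversion matters, the sketch below parses two day-first timestamps and compares them both as times and as raw strings. The second timestamp is made up for the example, and tz = "UTC" is an assumption added here so the result does not depend on the local time zone; as.POSIXct is used in place of strptime only because it returns a compact numeric-backed class.

```r
# Day-first format matching the date and Hour columns in the question.
fmt <- "%d/%m/%Y %H:%M"

# tz = "UTC" is an assumption for this sketch, not part of the original code.
event <- as.POSIXct("25/07/2016 02:31", format = fmt, tz = "UTC")
later <- as.POSIXct("03/08/2016 09:00", format = fmt, tz = "UTC")  # made-up time

# Parsed times compare chronologically.
parsed_cmp <- event < later

# Raw strings compare character by character, so "25..." sorts after "03...".
string_cmp <- "25/07/2016 02:31" < "03/08/2016 09:00"

print(parsed_cmp)  # TRUE
print(string_cmp)  # FALSE
```

Once the times are parsed, the "before the last transaction" rule becomes a plain < comparison.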

Next, we sort dts2 from the most recent transaction to the oldest, keep the first (that is, the latest) row for each user in out, and collect each user's transactions:

dts2 <- dts2[order(dts2$time, decreasing=TRUE), ]
out <- do.call(rbind, by(dts2[,c("UserID","time")], dts2$UserID, head, 1))
out$transactions <- tapply(dts2$transactions, dts2$UserID, c)

Finally, we build the Events column using the rule you described:

out$Events <- sapply(seq_len(nrow(out)), function(i) {
    User2 <- out$UserID[i]
    time2 <- out$time[i]    # this user's last transaction time
    # rows of dts1 for this user that happened before that transaction
    rows <- which(dts1$UserID==User2 & dts1$time<time2)
    if (length(rows)>0) {
        dts1$Events[rows]
    } else {
        NA
    }
})

Note that since user 17 is not in dts2, it will not appear in out.

This is a modification of @dimitris_ps's answer to your previous question. If he chooses to answer, I will be happy to delete mine.

The main difference between this question and the previous one is that we now want, for each UserID, only the dts1 events that occur before the last dts2 transaction. So we first group_by UserID and then filter for the rows whose dts1 event time is earlier than the last dts2 transaction time. We can then summarise the unique Events and transactions, still grouped by UserID.
The code is:

library(dplyr)
## I will not use the lubridate package, instead I will convert the time
## using as.POSIXct
dts1$time <- as.POSIXct(paste(dts1$date, dts1$Hour), format="%d/%m/%Y %H:%M")
dts2$time <- as.POSIXct(paste(dts2$date, dts2$Hour), format="%d/%m/%Y %H:%M")

# first join the two data.frames by UserID.
result <- left_join(dts1, dts2, by="UserID") %>%
  # all subsequent processing is grouped by the UserID because we
  # want to compare the last transaction time to the Event times
  # for each UserID.
  group_by(UserID) %>%
  # apply the filtering condition dts1 Event must be before last dts2 transaction.
  # Note that we keep rows for which there is no row in
  # dts2 for a UserID in dts1. This is the case for UserID=17.
  filter(is.na(time.y) | last(time.y) > time.x) %>%
  # summarise Events and transactions
  summarise(Events = toString(unique(Events)), transactions = toString(unique(transactions)))

The result is:

print(result)
## A tibble: 4 x 3
##  UserID Events transactions
##   <int>  <chr>        <chr>
##1      4   9, 4            2
##2      5      8         4, 3
##3     14      5            3
##4     17     10           NA

Hope this helps.
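
For readers without dplyr, the same rule can be sketched in base R. This is only an illustration under stated assumptions: dts1 is copied from the question, but the dts2 rows below are invented (the original rows are not shown here) and were chosen merely to be consistent with the desired output. The sketch takes the latest transaction with max rather than last, so it does not rely on the row order of dts2, and tz = "UTC" is again an added assumption.

```r
# dts1 as given in the question.
dts1 <- data.frame(
  UserID = c(5, 5, 4, 14, 17, 4),
  date   = c("25/07/2016", "30/07/2016", "23/07/2016",
             "24/07/2016", "25/07/2016", "22/07/2016"),
  Hour   = c("02:31", "02:42", "07:52", "03:02", "09:12", "03:22"),
  Events = c(8, 6, 9, 5, 10, 4),
  stringsAsFactors = FALSE
)

# ASSUMPTION: these dts2 rows are invented for illustration only.
dts2 <- data.frame(
  UserID       = c(5, 5, 4, 14),
  date         = c("26/07/2016", "27/07/2016", "24/07/2016", "25/07/2016"),
  Hour         = c("10:00", "11:00", "09:00", "08:00"),
  transactions = c(4, 3, 2, 3),
  stringsAsFactors = FALSE
)

# Work with times as seconds since the epoch so they survive split/sapply.
fmt <- "%d/%m/%Y %H:%M"
t1 <- as.numeric(as.POSIXct(paste(dts1$date, dts1$Hour), format = fmt, tz = "UTC"))
t2 <- as.numeric(as.POSIXct(paste(dts2$date, dts2$Hour), format = fmt, tz = "UTC"))

# Latest transaction time per user, as a named vector keyed by UserID.
last_tx <- sapply(split(t2, dts2$UserID), max)

# Cutoff for every event row; users absent from dts2 (here 17) get NA.
cutoff <- unname(last_tx[as.character(dts1$UserID)])

# Keep events before the cutoff, plus events of users with no transactions.
keep <- is.na(cutoff) | t1 < cutoff
kept <- dts1[keep, ]

# Collapse the unique events and transactions per user into strings.
events <- sapply(split(kept$Events, kept$UserID),
                 function(x) toString(unique(x)))
transactions <- sapply(split(dts2$transactions, dts2$UserID),
                       function(x) toString(unique(x)))

result <- data.frame(
  UserID       = as.numeric(names(events)),
  Events       = unname(events),
  transactions = unname(transactions[names(events)]),
  stringsAsFactors = FALSE
)
print(result)
```

With these made-up transactions, event 6 for user 5 is dropped and user 17 is kept with NA transactions, which matches the output requested in the question.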
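
Could you explain why this question was downvoted? I have only recently become active here and am not very familiar with how things work.

I was not the one who downvoted, but it is generally good practice to show what you have tried and where you got stuck.

I see. Thank you very much, it helped a lot.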