How to perform conditional aggregation by ID in R over a date range (date minus N days), excluding the row's own date

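How can I perform a conditional aggregation by ID that excludes the row's own (maximum) date and is limited to a specific date range, for example the date minus some number of days?

Questions A.1, A.2 and B:

Input data = o2i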

ID  date    event_p event_b
1   8/7/2016    1   0
1   8/1/2016    1   0
1   8/1/2016    1   1
2   7/28/2016   1   0
2   8/7/2016    1   1
2   7/29/2016   1   1
3   7/10/2016   1   0
3   7/7/2016    1   1
3   7/14/2016   1   1
4   8/24/2016   1   1
4   8/26/2016   1   1
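
For anyone who wants to reproduce the problem, the table above can be rebuilt as a data frame along these lines. This is a minimal sketch in base R; it assumes the dates are month/day/year strings, as they appear in the table.

```r
# Rebuild the input table as a data frame (base R only).
o2i <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID date event_p event_b
1 8/7/2016 1 0
1 8/1/2016 1 0
1 8/1/2016 1 1
2 7/28/2016 1 0
2 8/7/2016 1 1
2 7/29/2016 1 1
3 7/10/2016 1 0
3 7/7/2016 1 1
3 7/14/2016 1 1
4 8/24/2016 1 1
4 8/26/2016 1 1")

# Parse the month/day/year strings into Date objects so that
# arithmetic such as `date - 7` works.
o2i$date <- as.Date(o2i$date, format = "%m/%d/%Y")
str(o2i)
```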

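Question A.1) I want to restrict the sum to events that occurred within the 7 days before the date given in the date column (per user ID). Note: the date itself is excluded (the window reaches back to 7 days before the date).
Note: if no record exists for (date - 7 days), the logic stays the same.

A.1 - Expected output: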

ID  date    event_p
1   8/7/2016    2
1   8/1/2016    0
2   7/28/2016   0
2   8/7/2016    0
2   7/29/2016   1
3   7/10/2016   1
3   7/7/2016    0
3   7/14/2016   2
4   8/24/2016   0
4   8/26/2016   1

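Note: the input has two rows for 8/1/2016 (the same date), but the output shows them as a single row. This is preferred, but showing two rows is also acceptable.

Question A.2) Is there a way to write the code so that event_p and event_b are aggregated with the same logic in one step, rather than summing event_p (past 7 days) and event_b separately?

A.2 - Expected output: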

ID  date    event_p event_b
1   8/7/2016    2   1
1   8/1/2016    0   0
2   7/28/2016   0   0
2   8/7/2016    0   0
2   7/29/2016   1   0
3   7/10/2016   1   1
3   7/7/2016    0   0
3   7/14/2016   2   1
4   8/24/2016   0   1
4   8/26/2016   1   1

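Note: the input has two rows for 8/1/2016 (the same date), but the output shows them as a single row. This is preferred, but showing two rows is also acceptable.

Question B) I want to count the total number of events that occurred before the date given in the date column (per user ID). Note: the date itself is excluded (but the window reaches back over all days before the date).
Note: if no record exists for (date - 7 days), the logic stays the same.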

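What I tried: I have researched and searched this site and have been trying to write the code for almost a week. The closest I could get is the code below.

My attempt A.1: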

# convert factor to POSIXlt
o2i$date <- as.POSIXlt(o2i$date, format="%m/%d/%Y")
class(o2i$date)
o2i$date
o2i

# convert factor to date
o2i$date <- as.Date(o2i$date)
class(o2i$date)

# Aggregation Option 1
cum7_event_p <- aggregate(event_p~ID+date, subset(o2i, date < max(o2i$date) & date >= (o2i$date)-7),sum)
cum7_event_p


# Aggregation Option 2
cum7_event_p <- aggregate(event_p~ID+date, subset(o2i, date < max((o2i$date)-1) & date >= (o2i$date)-7),sum)
cum7_event_p

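Note: it also counts the events on the specific date itself. For example, for 8/1/2016 it shows a sum of 2, but by the logic it should show a count of 0, because the count covers the 7 days before that date (excluding the date itself). For 8/7/2016 it should show a count of 2.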
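
A likely explanation for this behaviour, sketched on the three dates of ID 1: the conditions inside `subset()` are evaluated element by element, so `date >= date - 7` compares each row with itself, and `max(o2i$date)` is a single global value rather than a per-row cutoff. The variable names `d`, `lower_ok` and `upper_ok` are only for illustration.

```r
# Three dates taken from ID 1 in the input data.
d <- as.Date(c("8/7/2016", "8/1/2016", "8/1/2016"), format = "%m/%d/%Y")

# Element-wise comparison: each date is compared with itself minus 7 days,
# so this is TRUE for every row and filters nothing.
lower_ok <- d >= d - 7

# max(d) is one value for the whole vector, so this only drops the rows
# that fall on the latest date; it is not a per-row window.
upper_ok <- d < max(d)

print(lower_ok)
print(upper_ok)
```

Under that reading, `aggregate()` simply sums the events recorded on each ID and date, which would explain the 2 shown for 8/1/2016.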

My attempt A.2:

## All Event Aggregation ##
cum7 <- aggregate(o2i[,3:4], o2i[, c(1,2)], data=subset(o2i, date < max(o2i$date) & date >= (o2i$date)-7), sum)
# Error: Error in FUN(X[[i]], ...) : invalid 'type' (list) of argument

# Runs, but does not include the logic of treating each date (per ID) as the max date while counting
cum7 <- aggregate(o2i[,3:4], o2i[, c(1,2)], sum)
cum7

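Note: I don't know how best to write code that combines these conditions (excluding the specific date, and summing over the 7 days before that date and/or over all days before that date).

I hope I have explained my problem and the expected output clearly. If someone writes a function to solve it, I would really appreciate a few extra lines explaining how it works.

I could not find an elegant way to solve this. I just created a window function that takes the date and event columns together and outputs the desired result. After a group by, dplyr can apply the function to each ID.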

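library(lubridate)
library(dplyr)

# For each row, sum the events that fall in the 7 days before that row's
# date (dates[i] - 7 through dates[i] - 1); the date itself is excluded.
myfun <- function(dates, events) {
  ct <- rep(0, length(dates))
  for (i in seq_along(dates)) {
    ct[i] <- sum(events[between(dates, dates[i] - 7, dates[i] - 1)])
  }
  ct
}

dt <- read.table('testdata', header = TRUE)
output <- dt %>%
  # parse the month/day/year strings into Date objects
  mutate(date = as.Date(parse_date_time(date, c('mdy')))) %>%
  # apply the window function within each ID
  group_by(ID) %>%
  mutate(summary_event_p = myfun(date, event_p),
         summary_event_b = myfun(date, event_b)) %>%
  ungroup() %>%
  # collapse rows that share the same ID and date
  distinct(ID, date, summary_event_p, summary_event_b)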

# # A tibble: 10 × 4
#      ID       date summary_event_p summary_event_b
#    <int>     <date>           <dbl>           <dbl>
# 1      1 2016-08-07               2               1
# 2      1 2016-08-01               0               0
# 3      2 2016-07-28               0               0
# 4      2 2016-08-07               0               0
# 5      2 2016-07-29               1               0
# 6      3 2016-07-10               1               1
# 7      3 2016-07-07               0               0
# 8      3 2016-07-14               2               1
# 9      4 2016-08-24               0               0
# 10     4 2016-08-26               1               1

The testdata file is just a text file containing a copy of the data. I used a lubridate function to clean up the date format of the raw data.
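
The code above covers A.1 and A.2. For question B (all earlier days rather than only the last 7), one possible variation is sketched below in base R. The helper `sum_before`, its `window` argument and the `all_prior_*` column names are hypothetical names introduced for this sketch, not part of the original answer.

```r
# Input data from the question, rebuilt inline so the sketch is self-contained.
o2i <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID date event_p event_b
1 8/7/2016 1 0
1 8/1/2016 1 0
1 8/1/2016 1 1
2 7/28/2016 1 0
2 8/7/2016 1 1
2 7/29/2016 1 1
3 7/10/2016 1 0
3 7/7/2016 1 1
3 7/14/2016 1 1
4 8/24/2016 1 1
4 8/26/2016 1 1")
o2i$date <- as.Date(o2i$date, format = "%m/%d/%Y")

# Hypothetical helper: for each row, sum the events dated strictly before
# that row's date. `window` caps how many days back to look; the default
# Inf means "all earlier days" (question B), while window = 7 gives A.1.
sum_before <- function(dates, events, window = Inf) {
  vapply(seq_along(dates), function(i) {
    gap <- as.numeric(dates[i] - dates)  # days from every row up to row i
    sum(events[gap >= 1 & gap <= window])
  }, numeric(1))
}

# Apply the helper separately to the rows of each ID.
o2i$all_prior_p <- NA_real_
o2i$all_prior_b <- NA_real_
for (rows in split(seq_len(nrow(o2i)), o2i$ID)) {
  o2i$all_prior_p[rows] <- sum_before(o2i$date[rows], o2i$event_p[rows])
  o2i$all_prior_b[rows] <- sum_before(o2i$date[rows], o2i$event_b[rows])
}

# Collapse rows that share the same ID and date, as in the expected output.
result_b <- unique(o2i[, c("ID", "date", "all_prior_p", "all_prior_b")])
print(result_b)
```

Requiring `gap >= 1` is what excludes the row's own date, and because the comparison is on day differences, it does not matter whether a record exists exactly 7 days earlier.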

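Thank you, Julius. A thoughtful and accurate answer; this solution works very well. I will now join the result (output) back to the initial data frame by ID and date, most likely with a left outer join. I hope this question helps anyone who wants to count how often an event occurred before a specific date, excluding that date itself.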
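
The join described here could look roughly like the sketch below, which uses base R's `merge()` with `all.x = TRUE` as the left outer join. To keep it short it uses only the ID 1 rows from the question and from the answer's output; `joined` is a name chosen for illustration.

```r
# Rows for ID 1 from the question's input (a subset, to keep the sketch short).
o2i <- data.frame(
  ID      = c(1, 1, 1),
  date    = as.Date(c("2016-08-07", "2016-08-01", "2016-08-01")),
  event_p = c(1, 1, 1),
  event_b = c(0, 0, 1)
)

# The matching ID 1 rows of the answer's `output` table.
output <- data.frame(
  ID              = c(1, 1),
  date            = as.Date(c("2016-08-07", "2016-08-01")),
  summary_event_p = c(2, 0),
  summary_event_b = c(1, 0)
)

# all.x = TRUE keeps every row of o2i, matching on ID and date
# (a left outer join); both 8/1 rows pick up the same summary values.
joined <- merge(o2i, output, by = c("ID", "date"), all.x = TRUE)
print(joined)
```

With dplyr already loaded, `left_join(o2i, output, by = c("ID", "date"))` should give an equivalent result.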