R 计算一段时间内事件的累积和_R_Time_Time Series

R 计算一段时间内事件的累积和

r time

R 计算一段时间内事件的累积和,r,time,time-series,R,Time,Time Series,我有一个数据框df，其中包含具有唯一身份号码的个人接种日期的数据。如果一个人在两岁之前接受了三次疫苗接种，则他被视为接种了疫苗。我的目标是计算完全接种疫苗的个人的累计总数，最终目标是绘制在任何给定时间x三岁以下未充分接种疫苗的人口比例。在我看来，我已经想出了完美的代码，但由于某种原因，我的直觉失败了，在时间段结束时，我得到了一个奇怪的增长。见下文在大量的数据争论之后，我们使用dataframedf开始示例数据，其中每一行都是一个单一的疫苗接种事件，每一列dataframedate包含感兴趣的时

我有一个数据框

df

，其中包含具有唯一身份号码的个人接种日期的数据。如果一个人在两岁之前接受了三次疫苗接种，则他被视为接种了疫苗。我的目标是计算完全接种疫苗的个人的累计总数，最终目标是绘制在任何给定时间

三岁以下未充分接种疫苗的人口比例。在我看来，我已经想出了完美的代码，但由于某种原因，我的直觉失败了，在时间段结束时，我得到了一个奇怪的增长。见下文

在大量的数据争论之后，我们使用dataframe

df

开始示例数据，其中每一行都是一个单一的疫苗接种事件，每一列dataframe

date

包含感兴趣的时间段中的每一个日期

glimpse(df)
Observations: 50,469
Variables: 6
$ id          <chr> "1000038", "1000038", "1000038", "1000128", "1000380",... 
$ n_max       <int> 3, 1, 1, 3, 3, 3, 3,...  ###total num times before 2 years old
$ age_y       <int> 0, 0, 0, 0, 1, 0, 0,... ###current age for this observation
$ age_m       <int> 3, 5, 11, 3, 4,... ###current age in months for this obs
$ date_vacc   <date> 2013-05-08, 2013-07-03, 2014-01-13,... ###current date obs
$ year        <dbl> 2013, 2013, 2014, 2013,... ###current year of obs

glimpse(date)
Observations: 4,017
Variables: 1
$ date_vacc <date> 2005-01-01, 2005-01-02, 2005-01-03, 2005-01-04, 2005-01-05, 2005-01-06, 2005-01-07, 2005-01-08, 2005-01-09, 20...

最后，该代码应将过去两年中接受最终剂量的人数相加，并作为滚动总和，近似于任何给定日期的人口中完全接种疫苗的两岁儿童人数

。因为它在一个固定的时间间隔内求和，所以我认为它应该进入一个稳定状态，即“退出”的人数是多少

lag_vacc <- 2 * 365.25
df$lagsum <- rep(NA, nrow(df))

for (i in (dim(df)[1] - (dim(df)[1] - lag_vacc)):dim(df)[1]) {
    df$lagsum[i] <-
        sum(df$nsum[(i - lag_vacc):i])
}

正如预测的那样，它达到了稳定状态。然后，Put开始再次增加，最终达到人口的1.3%，即接种疫苗的人数超过现有人数。这不再具有任何实际重要性，甚至不再是表示这些数据的愚蠢方式。但我不知道我的推理哪里是错误的。为什么这样不行？有更好的方法吗

编辑：经过每天几个小时的几天，我想我终于明白了这一点。综上所述，上述代码根据三剂疫苗的“最后”剂量的日期计算了一段时间内接种的个体的滚动累积总和。将“最后一次”剂量相加（根据情况代表第二次或第三次剂量）是可取的，因为即使没有第三次和最后一次剂量，两次剂量也能为生命的前4-5年提供良好的保护。由于在x轴末端（2015年12月31日）有一个分界点，因此在该点之后本应接受第三次也是“最后一次”剂量的个体，由于第二次剂量被确定为“最后一次”，因此过早地输入了累积总和

EDIT2：下面的代码生成总体分母，以生成与上面的图像非常相似的图像-但将y轴转换为比例而不是计数

df_pop <-
pop %>%
mutate(year = as.integer(year)) %>%
filter(grepl("all", pop$gender), age_y >= 1, age_y <= 2, year >= 2005) %>%
select(age_y, year, at_risk) %>%
group_by(year) %>%
summarise(n_atrisk = sum(at_risk))

df <-
df %>%
mutate(year = year(date_vacc)) %>%
left_join(df_pop) %>%
mutate(prop = lagsum / n_atrisk)

ggplot(df,
   aes(x = date_vacc, y = prop)) +
geom_line() +
scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
scale_y_continuous(breaks = pretty(df$prop, n = 10)) +
theme_bw()

好的，所以你不期望一个稳定的值，而是期望一个围绕某个渐近线的“振荡”，对吗？
在你的代码中有一件事我觉得有点奇怪。这一行：

for (i in (dim(df)[1] - (dim(df)[1] - lag_vacc)):dim(df)[1])

，如果我们做数学运算，删除括号的结果似乎是：

for (i in (lag_vacc:dim(df)[1])

这对我来说似乎不正确。难道不应该简单地说：

for (i in ((dim(df)[1] - lag_vacc):dim(df)[1])

也许我错了，但那可能是罪魁祸首

也可以考虑使用<代码> ROLLAPION<代码>，在移动窗口上进行累积和。好的。让我们再试一次。在评论中讨论的基础上，我将试图找出一个答案

据我所知，对于您的分析，您只需要：

包含给定人口ID和出生日期的数据框

包含给定id获得疫苗注射日期的数据框

由于您无法提供数据，我将创建一个合成数据：

  library(ggplot2)
  library(dplyr)
  library(data.table)
  # build regular date array
  date <- data_frame(date_vacc = as.Date(as.Date("2005-01-01"):as.Date("2015-12-31"),
                                         origin = as.Date("1970-01-01")))
  # build a fake population of 5000 people born between 2005 and 2015
  n_people = 5000
  birth_date <- sample(date$date_vacc, n_people, replace = TRUE, set.seed(1))
  ids = as.factor(as.character(1:n_people))
  mypop = data.table(id = ids, birth = birth_date, key = "id") %>%  arrange(birth)
  qplot(mypop$birth, binwidth = 60, geom = "bar" )+theme_bw()

在这里，NAs对应于从未接种过疫苗的人：2009年之前出生的人，或因其他原因未接种疫苗的人。现在，让我们试着回答您最初的问题：在任何日期，2岁以下的受试者中有多少人接受了最后一次（第三次）接种疫苗

percs = list() for (d in 1:length(date$date_vacc)){ dd <- date$date_vacc[d] #Now establish our population of interest: people below 2 years old at date dd pop_sub <- dftot %>% filter(birth < dd) %>% #Remove not yet born filter(birth > (dd - 365.25*2)) # Remove older than 2 years # number of subjects to consider n_sub = length(unique(pop_sub$id)) # Now Find subsample with 3 shots perc <- pop_sub %>% filter(date_vacc <= dd |is.na(date_vacc)) %>% # remove all vaccinations made after current date analyzed group_by(id) %>% # gropu by id and find the last vaccination shot (1,2,3) summarise(lastvacc = max(n_vacc)) %>% filter(lastvacc == 3) # Get only people with 3 shots # number of "fully vaccinated" n_vacc = length(perc$id) percs[[d]] = data.frame(date = dd, perc = n_vacc/n_sub) } percs_df = rbindlist(percs) ggplot(percs_df, aes(x = date, y = perc)) + geom_line(aes(group = 1))+ scale_x_date(date_breaks = "18 months") + theme_bw()

percs=list（）适用于（d/1：长度（日期$date\U vacc））{ dd%#移除尚未出生过滤器（出生>（dd-365.25*2））#删除超过2年的考虑对象的数量 n_sub=长度（唯一（pop_sub$id）） #现在找到有3个镜头的子样本 perc% 筛选（日期_vacc%#删除当前分析日期之后进行的所有疫苗接种按id分组%>%#按id分组并找到最后一次疫苗注射（1,2,3）总结（lastvacc=max（n_vacc））%>% 过滤器（lastvacc==3）#仅获取具有3个镜头的人 #“完全接种”的人数 n_vacc=长度（perc$id） percs[[d]]=data.frame（日期=dd，perc=n\u vacc/n\u sub） } percs_df=rbindlist（percs） ggplot（percs_df，aes（x=日期，y=perc））+geom_线（aes（组=1））+ 比例x日期（日期=18个月）+主题bw（）

起初，我认为分析是错误的。然而，想得好一点，这是显而易见的：因为我假设孩子们在一岁左右打了第三枪，看看两岁以下打了三枪的人的百分比不可避免地，我会看到大约50%，因为一半的孩子还不到一岁因此没有打第三枪
然而，根据你的评论，我认为事实上你有兴趣回答一个完全不同的问题，那就是：在任何时候，2岁以下的受试者中有多大比例没有风险？这似乎也是一个更“有趣”的问题
为了尝试回答这个问题，我认为你需要做一些假设。特别是，定义不同的“注射”提供免疫的时间。这里我真的输入了随机数，但根据你的评论，我假设第一次注射可以提供4个月的免疫，第二次和第三次注射可以提供3年的免疫。（因此，如果一个孩子打到第二枪，如果他没有打到第三枪就不算了）。一种可能性是：

percsimm = list() duration_1st <- 130 # first shot immunizes for 4 months duration_2nd <- 365.25*3 # second shot immunizes for 3 years duration_3rd <- 365.25*3 for (d in 1:length(date$date_vacc)){ dd <- date$date_vacc[d] # establish our population of interest: people below 2 years old at date dd pop_sub <- dftot %>% filter(birth < dd) %>% #Remove unborn kids filter(birth > (dd - 365.25*2)) # Remove kids older than 2 years n_sub = length(unique(pop_sub$id)) perc <- pop_sub %>% filter(date_vacc <= dd |is.na(date_vacc)) %>% # remove all vaccinations made after current date analyzed group_by(id) %>% mutate(lastvacc = last(n_vacc)) %>% # find the last vaccination for the subject filter(row_number() %in% c(n())) %>% # extract it from the df mutate(timetolast = as.numeric(dd - date_vacc)) %>% # how much time elapsed since last shot ? mutate(immune = ifelse((lastvacc == 1 & timetolast <= duration_1st) | # Is subject still immune ? (lastvacc == 2 & timetolast <= duration_2nd) | (lastvacc == 3 & timetolast <= duration_3rd), 1, 0)) %>% filter(immune == 1) # Get only people with 3 shots n_immune= length(perc$id) percsimm[[d]] = data.frame(date = dd, perc = n_immune/n_sub) } percsimm_df = rbindlist(percsimm) ggplot(percsimm_df, aes(x = date, y = perc)) + geom_line(aes(group = 1)) + scale_x_date(date_breaks = "18 months") + theme_bw()

percsimm=list（）持续时间_1st%#找到受试者的最后一次疫苗接种过滤器（行号（）%在%c（n（））%%>%#从df中提取变异（timetolast=as.numeric（dd- library(ggplot2) library(dplyr) library(data.table) # build regular date array date <- data_frame(date_vacc = as.Date(as.Date("2005-01-01"):as.Date("2015-12-31"), origin = as.Date("1970-01-01"))) # build a fake population of 5000 people born between 2005 and 2015 n_people = 5000 birth_date <- sample(date$date_vacc, n_people, replace = TRUE, set.seed(1)) ids = as.factor(as.character(1:n_people)) mypop = data.table(id = ids, birth = birth_date, key = "id") %>% arrange(birth) qplot(mypop$birth, binwidth = 60, geom = "bar" )+theme_bw() # build fake vaccinations dataset listout = list() for (p in seq(along = mypop$id)) { indiv = mypop[p,] # take one subject vaccs = c(indiv$birth + sample(seq(90,120),1), # first vaxx at 3 months indiv$birth + sample(seq(180,210),1), # secodn at 6 months indiv$birth + sample(seq(365,395),1)) # third at one year vaccs = vaccs[vaccs >= "2009-01-01"] # assume first vaccinations started in 2009 if (length(vaccs) > 0 ){ data = data.frame(id = as.character(indiv$id), birth = indiv$birth, date_vacc = vaccs, n_vacc = 1:length(vaccs)) listout[[p]] = data } } df = rbindlist(listout) df$id = as.factor(df$id) # Here I randomly remove some vaccinations: assume that only 95% of childs are usually vaccinated ! vacc = sample(mypop$id, 0.95*length(mypop$id)) df = subset(df, id %in% vacc) # Join the "population" data frame with the "vaccinations" one dftot = full_join(mypop, df) %>% arrange(birth,date_vacc,id) summary(dftot) id birth date_vacc n_vacc Length:11364 Min. :2005-01-01 Min. :2009-01-01 Min. :1.000 Class :character 1st Qu.:2009-06-01 1st Qu.:2010-10-22 1st Qu.:1.000 Mode :character Median :2011-07-31 Median :2012-11-06 Median :2.000 Mean :2011-06-30 Mean :2012-10-30 Mean :1.969 3rd Qu.:2013-12-13 3rd Qu.:2014-10-30 3rd Qu.:3.000 Max. :2015-12-31 Max. :2017-01-27 Max. :3.000 NA's :1565 NA's :1565 percs = list() for (d in 1:length(date$date_vacc)){ dd <- date$date_vacc[d] #Now establish our population of interest: people below 2 years old at date dd pop_sub <- dftot %>% filter(birth < dd) %>% #Remove not yet born filter(birth > (dd - 365.25*2)) # Remove older than 2 years # number of subjects to consider n_sub = length(unique(pop_sub$id)) # Now Find subsample with 3 shots perc <- pop_sub %>% filter(date_vacc <= dd |is.na(date_vacc)) %>% # remove all vaccinations made after current date analyzed group_by(id) %>% # gropu by id and find the last vaccination shot (1,2,3) summarise(lastvacc = max(n_vacc)) %>% filter(lastvacc == 3) # Get only people with 3 shots # number of "fully vaccinated" n_vacc = length(perc$id) percs[[d]] = data.frame(date = dd, perc = n_vacc/n_sub) } percs_df = rbindlist(percs) ggplot(percs_df, aes(x = date, y = perc)) + geom_line(aes(group = 1))+ scale_x_date(date_breaks = "18 months") + theme_bw() percsimm = list() duration_1st <- 130 # first shot immunizes for 4 months duration_2nd <- 365.25*3 # second shot immunizes for 3 years duration_3rd <- 365.25*3 for (d in 1:length(date$date_vacc)){ dd <- date$date_vacc[d] # establish our population of interest: people below 2 years old at date dd pop_sub <- dftot %>% filter(birth < dd) %>% #Remove unborn kids filter(birth > (dd - 365.25*2)) # Remove kids older than 2 years n_sub = length(unique(pop_sub$id)) perc <- pop_sub %>% filter(date_vacc <= dd |is.na(date_vacc)) %>% # remove all vaccinations made after current date analyzed group_by(id) %>% mutate(lastvacc = last(n_vacc)) %>% # find the last vaccination for the subject filter(row_number() %in% c(n())) %>% # extract it from the df mutate(timetolast = as.numeric(dd - date_vacc)) %>% # how much time elapsed since last shot ? mutate(immune = ifelse((lastvacc == 1 & timetolast <= duration_1st) | # Is subject still immune ? (lastvacc == 2 & timetolast <= duration_2nd) | (lastvacc == 3 & timetolast <= duration_3rd), 1, 0)) %>% filter(immune == 1) # Get only people with 3 shots n_immune= length(perc$id) percsimm[[d]] = data.frame(date = dd, perc = n_immune/n_sub) } percsimm_df = rbindlist(percsimm) ggplot(percsimm_df, aes(x = date, y = perc)) + geom_line(aes(group = 1)) + scale_x_date(date_breaks = "18 months") + theme_bw()