How to create columns that count based on another column under specific conditions in R?


Below, the data have been restructured, with the input and the expected output listed.

Data

structure(list(record_id = c(110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101
), start = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59), stop = c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 
52, 53, 54, 55, 56, 57, 58, 59, 60), `treatment (type)` = c(1, 
1, 1, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 3, 3, 0, 3, 3, 3, 
0, 2, 2, 2, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), n_interruption_periods = c(0, 
0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), n_interruption_periods_3days = c(0, 
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), n_interruption_days_3days = c(0, 
0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 
6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7)), row.names = c(NA, 
-60L), class = c("tbl_df", "tbl", "data.frame"))
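Explanation

Input: start and stop are days. The daily treatment is listed in treatment (type): 0 = no treatment, which is an interruption; 1:3 are treatments A/B/C.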

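Output: based on the treatment (type) column, I want to calculate for each day:

  • n_interruption_periods: the cumulative number of interruption periods, regardless of how long each interruption lasts
  • n_interruption_periods_3days: the cumulative number of interruption periods, where an interruption only counts if it lasts >= 3 days (it is counted on its 3rd day). Interruptions shorter than 3 days are not of interest
  • n_interruption_days_3days: the cumulative number of interruption days, where the days of an interruption are only counted from its 3rd day onward

Question: I want to create a script that automatically calculates the above output variables from the treatment variable.

Hope you can help

BW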
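As a cross-check of the three definitions above, here is a minimal base-R sketch (not taken from any answer below) that computes them for a single record's treatment vector with rle(). The function name interruption_counts is hypothetical, and it assumes the rows are already sorted by day and that treatment contains no NA values.

```r
# Sketch: the three output columns for ONE record's treatment vector,
# assuming rows are sorted by day and there are no NAs.
interruption_counts <- function(treatment) {
  runs <- rle(treatment == 0)              # runs of gap (TRUE) / treatment (FALSE) days
  run_end <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1
  gap <- which(runs$values)                # indices of the gap runs
  first_day <- run_start[gap]              # day 1 of every gap
  long <- runs$lengths[gap] >= 3           # gaps lasting at least 3 days
  third_day <- first_day[long] + 2         # day 3 of each long gap
  late_days <- unlist(Map(seq, third_day, run_end[gap][long]))  # day 3 to last day
  # Put a 1 on the marked days and take the running total
  mark <- function(pos) {
    v <- integer(length(treatment))
    v[pos] <- 1L
    cumsum(v)
  }
  data.frame(
    n_interruption_periods       = mark(first_day),
    n_interruption_periods_3days = mark(third_day),
    n_interruption_days_3days    = mark(late_days)
  )
}

# Toy example: a 2-day gap (days 2-3) and a 4-day gap (days 5-8)
interruption_counts(c(1, 0, 0, 1, 0, 0, 0, 0, 2))
```

With several records, the function would have to be applied separately to each record_id's rows, since every running total must restart per record.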

Response to OP

Here is part of the data illustrating the problem:

    structure(list(record_id = c(110001, 110002, 110002, 110002, 
    110001), day_count = c(732, 0, 1, 2, 733), day_count_stop = c(733, 
    1, 2, 3, 734), oac_class = c(0, 1, 1, 1, 1), n_interruption_periods = c(1, 
    1, 0, 0, 1), n_interruption_periods_3days = c(1, 1, 0, 0, 1)), row.names = c(NA, 
    -5L), groups = structure(list(record_id = c(110001, 110002), 
        .rows = structure(list(c(1L, 5L), 2:4), ptype = integer(0), class = c("vctrs_list_of", 
        "vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df", 
    "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
    "tbl_df", "tbl", "data.frame"))
    
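There are two problems with the suggested code:

  • I believe the result vectors are not assigned to the correct positions. Here you can see that the first values of record 110002 for n_interruption_periods and n_interruption_periods_3days are carried over from the results of 110001.
  • When I try to run the code for the third vector, I get the following error: Error in while (any(d != 0)) { : missing value where TRUE/FALSE needed

BW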

Edit: deleted everything and started over.

For your sake, I really hope someone comes up with a less messy answer, but these functions should work:

    FindFirstVector = function(TreatmentVector){
      #Which entries are equal to 0
      id = which(TreatmentVector == 0)
      #IDs of the first zero of each run (first day w/o treatment)
      id1 = id[c(0,diff(id)) != 1]
      #Create vector of zeroes
      temp = rep(0,length(TreatmentVector))
      #Add 1 for the first zero
      temp[id1] = 1
      #Take cumulative sum
      cumsum(temp)
    }
    
    
    FindSecondVector = function(TreatmentVector){
      #Which entries are equal to 0
      id = which(TreatmentVector == 0)
      #IDs of the first zero of each run (first day w/o treatment)
      id1 = id[c(0,diff(id)) != 1]
      #IDs of the last zero of each run (last day w/o treatment)
      id2 = id[c(diff(id),2) > 1]
      #Number of days w/o treatment in each period:
      d = id2 - id1 + 1
      #id3 is the starting id of each period w/o treatment that is longer
      #than 2 days, plus 2, so counting starts from day 3 of the period.
      id3 = id1[d > 2] + 2
      temp = rep(0,length(TreatmentVector))
      temp[id3] = 1
      cumsum(temp)
    }
    
    
    # Building third vector ---------------------------------------------------
    FindThirdVector = function(TreatmentVector){
      #Which entries are equal to 0
      id = which(TreatmentVector == 0)
      #IDs of the first zero of each run (first day w/o treatment)
      id1 = id[c(0,diff(id)) != 1]
      #IDs of the last zero of each run (last day w/o treatment)
      id2 = id[c(diff(id),2) > 1]
      #Number of days w/o treatment in each period:
      len = id2 - id1 + 1
      #id3 is the starting id of each period w/o treatment that is longer
      #than 2 days, plus 2, so counting starts from day 3 of the period.
      id3 = id1[len > 2] + 2
      #The id of the last day of each period w/o treatment longer than 2 days.
      id4 = id2[len > 2]
      
      #d is the number of days after id3 that still need a 1
      d = id4-id3
      temp = rep(0,length(TreatmentVector))
      while(any(d!=0)){
        temp[id3 + d] = 1
        d = d - 1
        d[d<0] = 0
      }
      #All d are 0 now: mark day 3 of each long period itself
      temp[id3 + d] = 1
      cumsum(temp)
    }
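The index trick shared by all three functions can be seen on a short toy vector (made up for illustration): among the positions of the zeros, a gap starts wherever the previous zero is not directly adjacent, and it ends wherever the next zero is not directly adjacent.

```r
# Toy treatment vector (illustrative only): gaps on days 2-3, 5-7 and 9
treatment <- c(1, 0, 0, 1, 0, 0, 0, 2, 0)
id  <- which(treatment == 0)        # positions of all zeros: 2 3 5 6 7 9
id1 <- id[c(0, diff(id)) != 1]      # first day of each gap:  2 5 9
id2 <- id[c(diff(id), 2) > 1]       # last day of each gap:   3 7 9
gap_len <- id2 - id1 + 1            # gap lengths:            2 3 1
# FindSecondVector marks day 3 of every gap with gap_len > 2
id1[gap_len > 2] + 2                # 7
```

The padding values matter: the leading 0 in c(0, diff(id)) is never equal to 1, so the first zero always opens a gap, and the trailing 2 in c(diff(id), 2) is always greater than 1, so the last zero always closes one.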
    
Below is a shorter (and, in my opinion, cleaner) solution using dplyr. I'm not sure what errors you get with the other solution, but this may work better for you:

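    library(dplyr)
    library(purrr)
    
    # Group by record_id
    data = data %>% group_by(record_id)
    # Define the helper column
    count_streak = function(v) accumulate(v, ~if_else(.y, .x + 1, 0), .init = 0)[-1]
    data = data %>% mutate(interruption_streak = count_streak(`treatment (type)` == 0))
    data = data %>%
      mutate(n_interruption_periods = cumsum(interruption_streak == 1),
             n_interruption_periods_3days = cumsum(interruption_streak == 3),
             n_interruption_days_3days = cumsum(interruption_streak >= 3))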
    
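We define a helper column, interruption_streak, that counts the days of the current interruption period: it is 1 on the first day of each interruption period, 2 on the second day, and so on, and 0 on treatment days.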

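From this, we can compute the other columns:

  • n_interruption_periods is simply the cumulative count of days on which an interruption period starts (streak == 1)
  • n_interruption_periods_3days is the cumulative count of third days of interruption periods (streak == 3)
  • n_interruption_days_3days is the cumulative count of days that are the third or a later day of an interruption period (streak >= 3)

I hope this explanation makes sense; otherwise, feel free to ask.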
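If purrr is not available, the same helper column can be built in base R. The sketch below assumes Reduce(accumulate = TRUE) as a substitute for purrr::accumulate and a treatment vector without NA values (an NA would make the if() fail); the name count_streak_base and the toy vector are illustrative only.

```r
# Running length of the current TRUE streak; resets to 0 on FALSE.
count_streak_base <- function(v) {
  Reduce(function(acc, is_gap) if (is_gap) acc + 1 else 0, v, 0,
         accumulate = TRUE)[-1]          # drop the initial value 0
}

treatment <- c(1, 1, 0, 0, 0, 0, 2, 0, 0, 1)
streak <- count_streak_base(treatment == 0)   # 0 0 1 2 3 4 0 1 2 0
out <- data.frame(
  treatment,
  interruption_streak          = streak,
  n_interruption_periods       = cumsum(streak == 1),  # gap starts
  n_interruption_periods_3days = cumsum(streak == 3),  # 3rd day of a gap
  n_interruption_days_3days    = cumsum(streak >= 3)   # 3rd and later days
)
out
```

For several records, the streak has to be computed within each record_id, as the group_by() in the dplyr version does.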

I think we can modify the function I wrote last time to solve all of your problems. Consider the function below.
    conditional_count <- function(x, n, pfill = function(p0) integer(length(p0)), ifill = seq_along, iend = 30L) {
      len <- length(x); out <- integer(len)
      p0 <- which(x == 0L)
      if (n > 1L)
        p0 <- Reduce(function(idx, i) {
          lidx <- idx - i + 1L
          idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
          idx[x[lidx] == 0L]
        }, seq_len(n)[-1L], p0)
      if (length(p0) < 1L)
        return(out)
      ub <- pmin(c(tail(p0, -1L), len), p0 + iend - 1L)
      rl <- ub - p0 + 1L
      pfill <- pfill(p0)
      res <- unlist(lapply(seq_along(rl), function(i) ifill(integer(rl[[i]])) + pfill[[i]]))
      pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L
      `[<-`(out, pos, res)
    }
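The n > 1 step is the least obvious part of the function. Pulled out on its own, it keeps only the zeros whose previous n - 1 entries are also zero, i.e. the n-th and later days of each gap. The wrapper name gap_days_from is hypothetical; its body is copied from conditional_count.

```r
# Hypothetical wrapper around the filtering step of conditional_count
gap_days_from <- function(x, n) {
  p0 <- which(x == 0L)
  if (n > 1L)
    p0 <- Reduce(function(idx, i) {
      lidx <- idx - i + 1L                        # the day i - 1 days earlier
      idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
      idx[x[lidx] == 0L]                          # keep idx only if that day is also 0
    }, seq_len(n)[-1L], p0)
  p0
}

x <- c(1, 0, 0, 0, 0, 2, 0, 0, 1)
gap_days_from(x, 1L)  # 2 3 4 5 7 8 (every zero)
gap_days_from(x, 3L)  # 4 5 (3rd and later days; the 2-day gap on days 7-8 drops out)
```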
    
Take n = 1 as an example. The previous problem reduces to

    conditional_count(x, 1L, function(p0) integer(length(p0)), seq_along, 30L)
    
    ifill + pfill                              :       1 2 3 4 ...  1 2 3 4 ...
    ifill is a sequence along the gap positions:       1 2 3 4 ...  1 2 3 4 ...
    pfill is always 0 at all positions of p0   :       0            0       
    p0 identifies                              :       v            v       
    x looks like                               :   1 2 0 ........   0       
    
This problem reduces to

    conditional_count(x, 1L, function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L), function(x) integer(length(x)), Inf)
    
    ifill + pfill                                  :       1 1 1 ...     2 2 ...
    ifill is always 0 along the gap positions      :       0 0 0 ...     0 0 ...  (iend = Inf means filling in a sequence until the end of the gap)
    pfill increases 1 at each starting streak of 0s:       1             2
    p0 identifies                                  :       v v v         v v
    x looks like                                   :   1 2 0 0 0 ....... 0 0 ...
    
    conditional_count(x, 1L, seq_along, function(x) integer(length(x)), Inf)
    
    ifill + pfill                            :       1 2 3 ...     4 5 ...
    ifill is always 0 along the gap positions:       0 0 0 ...     0 0 ...  (iend = Inf means filling in a sequence until the end of the gap)
    pfill increases 1 at each 0              :       1 2 3         4 5
    p0 identifies                            :       v v v         v v
    x looks like                             :   1 2 0 0 0 ....... 0 0 ...
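Two details make the diagrams above work, and both can be checked directly in R (the vectors below are made up for illustration). First, the count_streak pfill numbers each run of consecutive zero positions. Second, with iend = Inf each fill range reaches up to the next zero, so neighbouring ranges overlap by one position; R's `[<-` with a repeated index keeps the value assigned last, so the later range wins. The second point is our reading of the code, not something the answer states.

```r
# pfill used for n_interruption_periods: label runs of consecutive zero positions
count_streak <- function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L)
p0 <- c(4L, 5L, 6L, 7L, 11L, 12L)   # zeros on days 4-7 and 11-12
count_streak(p0)                   # 1 1 1 1 2 2

# Overlapping fill positions: the last assignment to a repeated index wins
out <- integer(6)
out[c(2, 3, 3, 4)] <- c(10L, 20L, 30L, 40L)
out                                # 0 10 30 40 0 0
```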
    
The full script for this problem is:

    conditional_count <- function(x, n, pfill = function(p0) integer(length(p0)), ifill = seq_along, iend = 30L) {
      len <- length(x); out <- integer(len)
      p0 <- which(x == 0L)
      if (n > 1L)
        p0 <- Reduce(function(idx, i) {
          lidx <- idx - i + 1L
          idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
          idx[x[lidx] == 0L]
        }, seq_len(n)[-1L], p0)
      if (length(p0) < 1L)
        return(out)
      ub <- pmin(c(tail(p0, -1L), len), p0 + iend - 1L)
      rl <- ub - p0 + 1L
      pfill <- pfill(p0)
      res <- unlist(lapply(seq_along(rl), function(i) ifill(integer(rl[[i]])) + pfill[[i]]))
      pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L
      `[<-`(out, pos, res)
    }
    
    count_streak <- function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L)
    integer_along <- function(x) integer(length(x))
    
    library(dplyr)
    
    df %>%
      group_by(record_id) %>%  # compute the counts separately within each record_id
      mutate(
        n_interruption_periods = conditional_count(`treatment (type)`, 1L, count_streak, integer_along, Inf),
        n_interruption_periods_3days = conditional_count(`treatment (type)`, 3L, count_streak, integer_along, Inf),
        n_interruption_days_3days = conditional_count(`treatment (type)`, 3L, seq_along, integer_along, Inf)
      )
    

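Thanks. To test whether this works, I tried dat$test cumsum(diff(so_interruption_df$`treatment (type)`)

Oh right, sorry. I misremembered diff() as always starting with a 0. It returns the differences between consecutive items of a vector, so its output is one element shorter than its input. If the first item in the treatment is 0, this also causes problems, so on second thought I don't recommend this approach.

@KBChu I've updated my answer, but it is much longer than I expected. It could almost certainly be shortened, even with base R.

@KBChu I assume different persons are identified by their Id? In that case, something like

    n_interruption_periods = unname(unlist(tapply(dat$`treatment (type)`, dat$record_id, FindFirstVector)))

should work. In this case tapply will output a list of named vectors, one for each Id; hence the unlist and unname. If parts of the code are not explained well enough, feel free to ask what they do.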
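The misalignment reported in the question may come from how tapply() orders its output: it returns one element per record_id in sorted id order, so unlist(tapply(...)) only lines up with the rows when the data are already sorted by record_id. A base-R alternative is ave(), which writes each group's result back into that group's own rows. In the sketch below, first_gap_count is a hypothetical stand-in for FindFirstVector (so the block runs on its own), and the data reuse the record_id order and oac_class values from the question's partial data.

```r
# Hypothetical stand-in for FindFirstVector: running count of gap starts
first_gap_count <- function(tr) cumsum(tr == 0 & c(TRUE, head(tr, -1) != 0))

# Same row order as the partial data in the question (not sorted by record_id)
dat <- data.frame(
  record_id = c(110001, 110002, 110002, 110002, 110001),
  treatment = c(0, 1, 1, 1, 1)
)

# tapply() stacks the groups in sorted id order, regardless of where the rows are
misaligned <- unname(unlist(tapply(dat$treatment, dat$record_id, first_gap_count)))
# ave() puts each group's result back into the rows the group came from
aligned <- ave(dat$treatment, dat$record_id, FUN = first_gap_count)

cbind(dat, misaligned, aligned)
```

Here misaligned gives the second row (record 110002) the value 1 that belongs to record 110001, the same kind of shift described in the question. Both approaches still assume that, within each record, the rows are in day order.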
Expected output:

       record_id start stop treatment (type) n_interruption_periods n_interruption_periods_3days n_interruption_days_3days
    1     110101     0    1                1                      0                            0                         0
    2     110101     1    2                1                      0                            0                         0
    3     110101     2    3                1                      0                            0                         0
    4     110101     3    4                0                      1                            0                         0
    5     110101     4    5                0                      1                            0                         0
    6     110101     5    6                0                      1                            1                         1
    7     110101     6    7                0                      1                            1                         2
    8     110101     7    8                2                      1                            1                         2
    9     110101     8    9                2                      1                            1                         2
    10    110101     9   10                2                      1                            1                         2
    11    110101    10   11                0                      2                            1                         2
    12    110101    11   12                0                      2                            1                         2
    13    110101    12   13                0                      2                            2                         3
    14    110101    13   14                0                      2                            2                         4
    15    110101    14   15                0                      2                            2                         5
    16    110101    15   16                0                      2                            2                         6
    17    110101    16   17                3                      2                            2                         6
    18    110101    17   18                3                      2                            2                         6
    19    110101    18   19                0                      3                            2                         6
    20    110101    19   20                3                      3                            2                         6
    21    110101    20   21                3                      3                            2                         6
    22    110101    21   22                3                      3                            2                         6
    23    110101    22   23                0                      4                            2                         6
    24    110101    23   24                2                      4                            2                         6
    25    110101    24   25                2                      4                            2                         6
    26    110101    25   26                2                      4                            2                         6
    27    110101    26   27                0                      5                            2                         6
    28    110101    27   28                0                      5                            2                         6
    29    110101    28   29                0                      5                            3                         7
    30    110101    29   30                1                      5                            3                         7
    31    110101    30   31                1                      5                            3                         7
    32    110101    31   32                1                      5                            3                         7
    33    110101    32   33                1                      5                            3                         7
    34    110101    33   34                1                      5                            3                         7
    35    110101    34   35                1                      5                            3                         7
    36    110101    35   36                1                      5                            3                         7
    37    110101    36   37                1                      5                            3                         7
    38    110101    37   38                1                      5                            3                         7
    39    110101    38   39                1                      5                            3                         7
    40    110101    39   40                1                      5                            3                         7
    41    110101    40   41                1                      5                            3                         7
    42    110101    41   42                1                      5                            3                         7
    43    110101    42   43                1                      5                            3                         7
    44    110101    43   44                1                      5                            3                         7
    45    110101    44   45                1                      5                            3                         7
    46    110101    45   46                1                      5                            3                         7
    47    110101    46   47                1                      5                            3                         7
    48    110101    47   48                1                      5                            3                         7
    49    110101    48   49                1                      5                            3                         7
    50    110101    49   50                1                      5                            3                         7
    51    110101    50   51                1                      5                            3                         7
    52    110101    51   52                1                      5                            3                         7
    53    110101    52   53                1                      5                            3                         7
    54    110101    53   54                1                      5                            3                         7
    55    110101    54   55                1                      5                            3                         7
    56    110101    55   56                1                      5                            3                         7
    57    110101    56   57                1                      5                            3                         7
    58    110101    57   58                1                      5                            3                         7
    59    110101    58   59                1                      5                            3                         7
    60    110101    59   60                1                      5                            3                         7