How can I create a column that counts another column under specific conditions? (R, dplyr)

Below, the data has been reshaped, and the input and expected output are listed.

Data

structure(list(record_id = c(110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101
), start = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59), stop = c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 
52, 53, 54, 55, 56, 57, 58, 59, 60), `treatment (type)` = c(1, 
1, 1, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 3, 3, 0, 3, 3, 3, 
0, 2, 2, 2, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), n_interruption_periods = c(0, 
0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), n_interruption_periods_3days = c(0, 
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), n_interruption_days_3days = c(0, 
0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 
6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7)), row.names = c(NA, 
-60L), class = c("tbl_df", "tbl", "data.frame"))

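Explanation

Input: `start` and `stop` are day counts. The daily treatment is given in `treatment (type)`, where 0 = no treatment (an interruption) and 1:3 are treatments A/B/C.

Output: based on the `treatment (type)` column, I want to compute for each day:

`n_interruption_periods`: the running number of interruption periods, regardless of how long each interruption lasts.

`n_interruption_periods_3days`: the running number of interruption periods, counted only when the interruption lasts >= 3 days. Interruptions shorter than 3 days are not of interest.

`n_interruption_days_3days`: the cumulative number of interruption days, where an interruption is only counted from its 3rd day onward.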
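These three definitions can be written in terms of run lengths. The following is a minimal base-R sketch rather than anything from the original post: `count_interruptions()` and its `min_days` argument are hypothetical names, and it assumes a single record with no `NA` values in the treatment vector.

```r
count_interruptions <- function(treatment, min_days = 3) {
  # Runs of consecutive interruption (TRUE) and treatment (FALSE) days
  runs <- rle(treatment == 0)
  # Day number within the current interruption; 0 on treated days
  day_in_gap <- sequence(runs$lengths) * rep(runs$values, runs$lengths)
  data.frame(
    n_interruption_periods       = cumsum(day_in_gap == 1),
    n_interruption_periods_3days = cumsum(day_in_gap == min_days),
    n_interruption_days_3days    = cumsum(day_in_gap >= min_days)
  )
}

# First ten days of the treatment column from the data above
count_interruptions(c(1, 1, 1, 0, 0, 0, 0, 2, 2, 2))
```

In this reading, an interruption counts as a period on its first day, counts as a 3-day period on its third day, and contributes one day for every day from the third onward.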

Question: I want to create a script that automatically computes the output variables above from the `treatment` variable.

Hope you can help.

BW

OP's response

Here is part of the data that illustrates the problem:

structure(list(record_id = c(110001, 110002, 110002, 110002, 
110001), day_count = c(732, 0, 1, 2, 733), day_count_stop = c(733, 
1, 2, 3, 734), oac_class = c(0, 1, 1, 1, 1), n_interruption_periods = c(1, 
1, 0, 0, 1), n_interruption_periods_3days = c(1, 1, 0, 0, 1)), row.names = c(NA, 
-5L), groups = structure(list(record_id = c(110001, 110002), 
    .rows = structure(list(c(1L, 5L), 2:4), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

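With the suggested code, there are two problems:

I believe the result vectors are not being assigned to the correct positions. Here you can see that the first values of `n_interruption_periods` and `n_interruption_periods_3days` for record 110002 are carried over from the results for 110001.

When I try to run the third vector, I get the following error: Error in while (any(d != 0)) { : missing value where TRUE/FALSE needed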

Edit: deleted everything and started over.

For your sake, I really hope someone comes up with a less messy answer, but these functions should work.

FindFirstVector = function(TreatmentVector){
  #Which entries are equal to 0
  id = which(TreatmentVector == 0)
  #IDs of first zeros occurring (First day w.o. treatment)
  id1 = id[c(0,diff(id)) != 1]
  #Create vector of zeroes
  temp = rep(0,length(TreatmentVector))
  #Add 1 for the first zero
  temp[id1] = 1
  #Take cumulative sum
  cumsum(temp)
}


FindSecondVector = function(TreatmentVector){
  #Which entries are equal to 0
  id = which(TreatmentVector == 0)
  #IDs of first zeros occurring (First day w.o. treatment)
  id1 = id[c(0,diff(id)) != 1]
  #IDs of last zeros (Last day w.o. treatment)
  id2 = id[c(diff(id),2) > 1]
  #Amount of days w.o. treatment is then:
  d = id2 - id1 + 1
  #id3 is then the starting id of period of no treatment, if the period is longer
  #than 2 days. Then 2 is added, so start counting from day 3 of the period.
  id3 = id1[id2 - id1 + 1 > 2] + 2
  temp = rep(0,length(TreatmentVector))
  temp[id3] = 1
  cumsum(temp)
}


# Building third vector ---------------------------------------------------
FindThirdVector = function(TreatmentVector){
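  #Which entries are equal to 0
  id = which(TreatmentVector == 0)
  #No untreated days at all: nothing to count. Without this guard the
  #indices below become NA and the while() condition fails.
  if (length(id) == 0) return(rep(0, length(TreatmentVector)))
  #IDs of first zeros occurring (First day w.o. treatment)
  id1 = id[c(0,diff(id)) != 1]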
  #IDs of last zeros (Last day w.o. treatment)
  id2 = id[c(diff(id),2) > 1]
  #Amount of days w.o. treatment is then:
  d = id2 - id1 + 1
  #id3 is then the starting id of period of no treatment, if the period is longer
  #than 2 days. Then 2 is added, so start counting from day 3 of the period.
  id3 = id1[id2 - id1 + 1 > 2] + 2
  #The id of the ending day of period w.o. treatment longer than 2 days.
  id4 = id2[id2 - id1 + 1 > 2]
  
  #d is the amount of days to add 1's
  d = id4-id3
  temp = rep(0,length(TreatmentVector))
  while(any(d!=0)){
    temp[id3 + d] = 1
    d = d - 1
    d[d<0] = 0
  }
  temp[id3 + d] = 1
  cumsum(temp)
}

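Below is a shorter (and in my opinion cleaner) solution using dplyr. I'm not sure what errors you are getting with the other solution, but this one may work better for you.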

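library(dplyr)
library(purrr)  # for accumulate()

# Group by record id
data = data %>% group_by(record_id)
# Define the helper column: a running counter that resets on treated days
count_streak = function(v) accumulate(v, ~if_else(.y, .x + 1, 0), .init = 0)[-1]
data = data %>% mutate(interruption_streak = count_streak(`treatment (type)` == 0))
data = data %>%
  mutate(n_interruption_periods = cumsum(interruption_streak == 1),
         n_interruption_periods_3days = cumsum(interruption_streak == 3),
         n_interruption_days_3days = cumsum(interruption_streak >= 3))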

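We define a helper column `interruption_streak`, which counts up with each day of the current interruption period. So on the first day of each interruption period it is 1, on the second day 2, and so on.

From this, we can compute the other columns:

`n_interruption_periods` is simply the cumulative count of first days of an interruption period.
`n_interruption_periods_3days` is the cumulative count of third days of an interruption period.
`n_interruption_days_3days` is the cumulative count of days that are the third or a later day of an interruption period.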
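To see what the helper column contains, here is a small base-R sketch of the same kind of running counter, using `Reduce(..., accumulate = TRUE)` in place of the purrr-style accumulation in `count_streak`. The name `count_streak_base` and the toy vector are made up for illustration, and the sketch assumes the input has no `NA` values.

```r
# Running counter: +1 on each interruption day, reset to 0 on a treated day
count_streak_base <- function(is_gap) {
  Reduce(function(streak, gap) if (gap) streak + 1 else 0,
         is_gap, init = 0, accumulate = TRUE)[-1]   # drop the initial 0
}

treatment <- c(1, 0, 0, 0, 0, 2, 0, 3)   # toy data: 0 = no treatment
count_streak_base(treatment == 0)
# 0 1 2 3 4 0 1 0
```

The four-day interruption is numbered 1 to 4 and the one-day interruption restarts at 1, so comparing the counter with 1, with 3, or with `>= 3` picks out exactly the days each output column needs.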

I hope this explanation makes sense; if not, feel free to ask.

I think we can modify the function I wrote last time to solve all of your problems. Consider the following function.

conditional_count <- function(x, n, pfill = function(p0) integer(length(p0)), ifill = seq_along, iend = 30L) {
  len <- length(x); out <- integer(len)
  p0 <- which(x == 0L)
  if (n > 1L)
    p0 <- Reduce(function(idx, i) {
      lidx <- idx - i + 1L
      idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
      idx[x[lidx] == 0L]
    }, seq_len(n)[-1L], p0)
  if (length(p0) < 1L)
    return(out)
  ub <- pmin(c(tail(p0, -1L), len), p0 + iend - 1L)
  rl <- ub - p0 + 1L
  pfill <- pfill(p0)
  res <- unlist(lapply(seq_along(rl), function(i) ifill(integer(rl[[i]])) + pfill[[i]]))
  pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L
  `[<-`(out, pos, res)
}
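One way to read the `Reduce` step in this function is that it implements the `n` condition: it keeps only those zero positions that are preceded by at least `n - 1` further zeros. The loop below is a sketch that replays that step on a made-up vector; the names `back` and `keep` are introduced here for illustration and do not appear in the function.

```r
x <- c(1, 0, 0, 0, 0, 2, 0, 3)   # toy data: 0 = no treatment
n <- 3L

p0 <- which(x == 0)              # all zero positions: 2 3 4 5 7
for (i in seq_len(n)[-1L]) {
  back <- p0 - i + 1L            # the position i - 1 steps earlier
  keep <- back > 0L              # ignore positions before the start of x
  p0 <- p0[keep][x[back[keep]] == 0]
}
p0
# 4 5
```

With `n = 3`, only positions 4 and 5 survive, that is, the third and fourth day of the four-day gap; the one-day gap at position 7 is dropped.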

Taking n = 1 as an example, the previous question reduces to

conditional_count(x, 1L, function(p0) integer(length(p0)), seq_along, 30L)

ifill + pfill                              :       1 2 3 4 ...  1 2 3 4 ...
ifill is a sequence along the gap positions:       1 2 3 4 ...  1 2 3 4 ...
pfill is always 0 at all positions of p0   :       0            0       
p0 identifies                              :       v            v       
x looks like                               :   1 2 0 ........   0

This question reduces to

conditional_count(x, 1L, function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L), function(x) integer(length(x)), Inf)

ifill + pfill                                  :       1 1 1 ...     2 2 ...
ifill is always 0 along the gap positions      :       0 0 0 ...     0 0 ...  (iend = Inf means filling in a sequence until the end of the gap)
pfill increases 1 at each starting streak of 0s:       1             2
p0 identifies                                  :       v v v         v v
x looks like                                   :   1 2 0 0 0 ....... 0 0 ...

conditional_count(x, 1L, seq_along, function(x) integer(length(x)), Inf)

ifill + pfill                            :       1 2 3 ...     4 5 ...
ifill is always 0 along the gap positions:       0 0 0 ...     0 0 ...  (iend = Inf means filling in a sequence until the end of the gap)
pfill increases 1 at each 0              :       1 2 3         4 5
p0 identifies                            :       v v v         v v
x looks like                             :   1 2 0 0 0 ....... 0 0 ...

The complete script for this question is

conditional_count <- function(x, n, pfill = function(p0) integer(length(p0)), ifill = seq_along, iend = 30L) {
  len <- length(x); out <- integer(len)
  p0 <- which(x == 0L)
  if (n > 1L)
    p0 <- Reduce(function(idx, i) {
      lidx <- idx - i + 1L
      idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
      idx[x[lidx] == 0L]
    }, seq_len(n)[-1L], p0)
  if (length(p0) < 1L)
    return(out)
  ub <- pmin(c(tail(p0, -1L), len), p0 + iend - 1L)
  rl <- ub - p0 + 1L
  pfill <- pfill(p0)
  res <- unlist(lapply(seq_along(rl), function(i) ifill(integer(rl[[i]])) + pfill[[i]]))
  pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L
  `[<-`(out, pos, res)
}

count_streak <- function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L)
integer_along <- function(x) integer(length(x))

library(dplyr)  # provides %>% and mutate()

df %>%
  mutate(
    n_interruption_periods = conditional_count(`treatment (type)`, 1L, count_streak, integer_along, Inf),
    n_interruption_periods_3days = conditional_count(`treatment (type)`, 3L, count_streak, integer_along, Inf),
    n_interruption_days_3days = conditional_count(`treatment (type)`, 3L, seq_along, integer_along, Inf)
  )

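Thanks. To test whether this works, I tried

dat$test cumsum(diff(so_interruption_df$treatment (type)

Oh, right, sorry. I remembered diff() as always starting with a 0. It is a function that outputs the differences between consecutive items in a vector. It would also cause problems if the first item in treatment is 0, so on second thought I don't recommend that approach.

@KBChu I have updated my answer. It turned out much longer than I expected, though it could almost certainly be shortened, even in base R.

@KBChu I assume the different persons are defined by their Id? In that case, something like

n_interruption_periods = unname(unlist(tapply(dat$`treatment (type)`, dat$record_id, FindFirstVector)))

should work. tapply in this case outputs a list of named vectors, one per Id, hence the unlist and unname. If parts of the code are insufficiently explained, feel free to ask what they do.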
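One caveat with `tapply` followed by `unlist`: the pieces come back ordered by group, so the result only lines up with the rows when the data is already sorted by id. A possible alternative, sketched here and not taken from the thread, is base R's `ave()`, which applies a function per group and returns the values in the original row order. The toy data frame reuses the ids and `oac_class` values from the excerpt in the question; `count_gap_starts` is a hypothetical stand-in for `FindFirstVector`.

```r
# Stand-in for FindFirstVector: running count of days that start a run of zeros
count_gap_starts <- function(v) cumsum(v == 0 & c(TRUE, head(v, -1) != 0))

dat <- data.frame(
  record_id = c(110001, 110002, 110002, 110002, 110001),
  oac_class = c(0, 1, 1, 1, 1)
)

# Grouped order: both rows of 110001 first, then the three rows of 110002
by_group <- unname(unlist(tapply(dat$oac_class, dat$record_id, count_gap_starts)))
by_group
# 1 1 0 0 0

# Original row order: each value stays with the row it was computed from
dat$n_interruption_periods <- ave(dat$oac_class, dat$record_id, FUN = count_gap_starts)
dat$n_interruption_periods
# 1 0 0 0 1
```

Assigning `by_group` directly as a column would give the first row of 110002 a value that belongs to 110001, which looks like the misalignment described in the question.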
   record_id start stop treatment (type) n_interruption_periods n_interruption_periods_3days n_interruption_days_3days
1     110101     0    1                1                      0                            0                         0
2     110101     1    2                1                      0                            0                         0
3     110101     2    3                1                      0                            0                         0
4     110101     3    4                0                      1                            0                         0
5     110101     4    5                0                      1                            0                         0
6     110101     5    6                0                      1                            1                         1
7     110101     6    7                0                      1                            1                         2
8     110101     7    8                2                      1                            1                         2
9     110101     8    9                2                      1                            1                         2
10    110101     9   10                2                      1                            1                         2
11    110101    10   11                0                      2                            1                         2
12    110101    11   12                0                      2                            1                         2
13    110101    12   13                0                      2                            2                         3
14    110101    13   14                0                      2                            2                         4
15    110101    14   15                0                      2                            2                         5
16    110101    15   16                0                      2                            2                         6
17    110101    16   17                3                      2                            2                         6
18    110101    17   18                3                      2                            2                         6
19    110101    18   19                0                      3                            2                         6
20    110101    19   20                3                      3                            2                         6
21    110101    20   21                3                      3                            2                         6
22    110101    21   22                3                      3                            2                         6
23    110101    22   23                0                      4                            2                         6
24    110101    23   24                2                      4                            2                         6
25    110101    24   25                2                      4                            2                         6
26    110101    25   26                2                      4                            2                         6
27    110101    26   27                0                      5                            2                         6
28    110101    27   28                0                      5                            2                         6
29    110101    28   29                0                      5                            3                         7
30    110101    29   30                1                      5                            3                         7
31    110101    30   31                1                      5                            3                         7
32    110101    31   32                1                      5                            3                         7
33    110101    32   33                1                      5                            3                         7
34    110101    33   34                1                      5                            3                         7
35    110101    34   35                1                      5                            3                         7
36    110101    35   36                1                      5                            3                         7
37    110101    36   37                1                      5                            3                         7
38    110101    37   38                1                      5                            3                         7
39    110101    38   39                1                      5                            3                         7
40    110101    39   40                1                      5                            3                         7
41    110101    40   41                1                      5                            3                         7
42    110101    41   42                1                      5                            3                         7
43    110101    42   43                1                      5                            3                         7
44    110101    43   44                1                      5                            3                         7
45    110101    44   45                1                      5                            3                         7
46    110101    45   46                1                      5                            3                         7
47    110101    46   47                1                      5                            3                         7
48    110101    47   48                1                      5                            3                         7
49    110101    48   49                1                      5                            3                         7
50    110101    49   50                1                      5                            3                         7
51    110101    50   51                1                      5                            3                         7
52    110101    51   52                1                      5                            3                         7
53    110101    52   53                1                      5                            3                         7
54    110101    53   54                1                      5                            3                         7
55    110101    54   55                1                      5                            3                         7
56    110101    55   56                1                      5                            3                         7
57    110101    56   57                1                      5                            3                         7
58    110101    57   58                1                      5                            3                         7
59    110101    58   59                1                      5                            3                         7
60    110101    59   60                1                      5                            3                         7