Finding the first treatment date in R

I have some panel data with an id, a year, and a variable indicating whether the individual was treated at that point in time:

id  year   treated  
1   2000      0            
1   2001      0            
1   2002      1            
1   2003      1            
1   2004      1

I need to create a dummy variable that marks the year in which treatment first occurred. The desired output looks like this:

id  year   treated   treatment_year
1   2000      0            0
1   2001      0            0
1   2002      1            1
1   2003      1            0
1   2004      1            0

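This seems fairly simple to me, but I have been stuck for a while and cannot get any of the sorting functions to do the job. Any help would be greatly appreciated.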

You can use match to get the index of the first 1 within each id, and replace everything except that position with 0.

This can be done with dplyr:

library(dplyr)
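df1 %>%  # df1 is the example data frame (its definition is at the end of the page)
  group_by(id) %>%
  mutate(treatment_year = replace(treated, -match(1L, treated), 0L))
  # An alternative that gives the same result:
  # mutate(treatment_year = +(row_number() == match(1L, treated)))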

#     id  year treated treatment_year
#  <int> <int>   <int>          <int>
#1     1  2000       0              0
#2     1  2001       0              0
#3     1  2002       1              1
#4     1  2003       1              0
#5     1  2004       1              0

How it works: match returns the index of the first match. Consider this example:

x <- c(0, 0, 1, 1, 1)
match(1, x)
#[1] 3
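
The -match(...) part relies on R's negative indexing: a negative index selects every position except the one named, so replace() overwrites everything but the first 1. A minimal base R sketch of that step (the names x, first, and result are only illustrative):

```r
x <- c(0L, 0L, 1L, 1L, 1L)

# Position of the first 1
first <- match(1L, x)

# A negative index drops that position and keeps all the others
others <- x[-first]

# replace() sets every position except `first` to 0
result <- replace(x, -first, 0L)
print(result)
```

Here result keeps the 1 at position 3 and is 0 everywhere else, which is the treatment_year column for a single id.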

If treated only ever takes the values 1/0 and each id always has at least one 1, then we can also use which.max instead of match:

which.max(x)
#[1] 3
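
The "at least one 1" condition matters. A small sketch of what the two functions return for a hypothetical individual who is never treated (the vector never is made up for illustration):

```r
never <- c(0L, 0L, 0L)

# match() finds no 1 and returns NA
m <- match(1L, never)

# which.max() returns the position of the first maximum, here the first 0
w <- which.max(never)

# Flags built from each index
flag_match <- +(seq_along(never) == m)     # all NA
flag_whichmax <- +(seq_along(never) == w)  # wrongly marks row 1

print(flag_match)
print(flag_whichmax)
```

With which.max the first row of a never-treated group would be flagged as the treatment year, so it is only a safe substitute when every id is treated at some point.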

We can use row_number and which.max to create a logical index and coerce it to binary (0/1) with the unary +:

library(dplyr)
df1 %>% 
   group_by(id) %>% 
   mutate(treatment_year = +(row_number() == which.max(treated)))
# A tibble: 5 x 4
# Groups:   id [1]
#     id  year treated treatment_year
#  <int> <int>   <int>          <int>
#1     1  2000       0              0
#2     1  2001       0              0
#3     1  2002       1              1
#4     1  2003       1              0
#5     1  2004       1              0
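
If some individuals might never be treated, one possible variant (not from the answers above) is to flag the row where the running count of treated rows first reaches 1. The sketch below uses base R's ave() on a made-up two-person panel. The names panel and first_treated are hypothetical, and it assumes rows are already sorted by year within each id:

```r
# Hypothetical panel: id 1 is treated from 2001, id 2 is never treated
panel <- data.frame(
  id      = c(1L, 1L, 1L, 2L, 2L, 2L),
  year    = c(2000L, 2001L, 2002L, 2000L, 2001L, 2002L),
  treated = c(0L, 1L, 1L, 0L, 0L, 0L)
)

# 1 only on the row where the cumulative number of treated rows first hits 1
first_treated <- function(t) as.integer(t == 1L & cumsum(t == 1L) == 1L)

# Apply the function separately within each id
panel$treatment_year <- ave(panel$treated, panel$id, FUN = first_treated)
print(panel)
```

Because the flag is built from comparisons rather than from an index, a group with no 1 simply gets 0 in every row.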

Data

df1 <- structure(list(id = c(1L, 1L, 1L, 1L, 1L), year = 2000:2004, 
    treated = c(0L, 0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-5L))

Comments

This is exactly what I was looking for, thank you very much. Could you explain the -match part?

I have added some explanation to the answer; I hope it helps.