R: generate a new variable by group based on values in the preceding and following rows
I am working with panel data on multiple subjects (id), in which an event (first_occurrence) happens on a different day for each subject. My goal is to create a new variable (result) that equals 1 on the 2 days before the first occurrence, on the day of the first occurrence, and on the 2 days after it.
Here is an example with sample data and the desired output:
data <- structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 3, 3, 3, 3, 3, 3, 3), day = c(0, 1, 2, 3, 4, 5, 6, 7, 0, 1,
2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6), first_occurrence = c(0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1), desired_output = c(1,
1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1)), .Names = c("id",
"day", "first_occurrence", "desired_output"), row.names = c(NA,
-21L), class = "data.frame")
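Here is one way. You can use ave to work within each group, then use which.max to find the first occurrence, and then compute the distance of every other row from that position.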
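# For each id, measure every row's distance (in rows) from the first occurrence,
# then flag rows within 2 positions; +0 converts TRUE/FALSE to 1/0
close <- (with(data, ave(first_occurrence, id,
    FUN = function(x) abs(seq_along(x) - which.max(x)))) <= 2) + 0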
This gives
id day first_occurrence desired_output close
1 1 0 0 1 1
2 1 1 0 1 1
3 1 2 1 1 1
4 1 3 0 1 1
5 1 4 0 1 1
6 1 5 0 0 0
7 1 6 0 0 0
8 1 7 0 0 0
9 2 0 0 0 0
10 2 1 0 0 0
11 2 2 0 1 1
12 2 3 0 1 1
13 2 4 1 1 1
14 2 5 0 1 1
15 3 0 0 0 0
16 3 1 0 0 0
17 3 2 0 0 0
18 3 3 0 0 0
19 3 4 0 1 1
20 3 5 0 1 1
21 3 6 1 1 1
as desired. Note that this method assumes the data are sorted by day. Here is another approach, using the dplyr package:
library(dplyr) # load the package (install it first if needed)
data %>%
  arrange(id, day) %>% # sort by id and day; drop this line if the data are already sorted
  group_by(id) %>%
  mutate(n = 1:n(), # row number within each id
         result = ifelse(abs(n - n[first_occurrence == 1]) <= 2, 1, 0)) %>%
  select(-n)
# id day first_occurrence desired_output result
#1 1 0 0 1 1
#2 1 1 0 1 1
#3 1 2 1 1 1
#4 1 3 0 1 1
#5 1 4 0 1 1
#6 1 5 0 0 0
#7 1 6 0 0 0
#8 1 7 0 0 0
#9 2 0 0 0 0
#10 2 1 0 0 0
#11 2 2 0 1 1
#12 2 3 0 1 1
#13 2 4 1 1 1
#14 2 5 0 1 1
#15 3 0 0 0 0
#16 3 1 0 0 0
#17 3 2 0 0 0
#18 3 3 0 0 0
#19 3 4 0 1 1
#20 3 5 0 1 1
#21 3 6 1 1 1
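Both approaches above measure distance in rows, so they rely on each id having one row per consecutive day. As a rough sketch of an alternative, the base R helper below compares the day values themselves, which should also cope with missing days, unsorted rows, and ids that have no event at all (those are flagged 0 throughout). The name flag_window and its width argument are illustrative assumptions, not part of the answers above.

```r
# Hypothetical helper (not from the original answers): returns 1 for rows whose
# day lies within `width` days of the id's first occurrence, 0 otherwise.
flag_window <- function(df, width = 2) {
  # One row per event, ordered so that match() picks the earliest day per id
  events <- df[df$first_occurrence == 1, c("id", "day")]
  events <- events[order(events$id, events$day), ]
  # Day of the first event for each row's id; NA when the id has no event
  event_day <- events$day[match(df$id, events$id)]
  as.integer(!is.na(event_day) & abs(df$day - event_day) <= width)
}

# Sample data with the same id, day and first_occurrence values as in the question
data <- data.frame(
  id  = rep(1:3, times = c(8, 6, 7)),
  day = c(0:7, 0:5, 0:6)
)
data$first_occurrence <- as.integer(
  (data$id == 1 & data$day == 2) |
  (data$id == 2 & data$day == 4) |
  (data$id == 3 & data$day == 6)
)

data$result <- flag_window(data)
```

Because the comparison uses day rather than row position, a subject with days 0, 1, 5, 6, 9 and an event on day 5 would get 1 only on days 5 and 6.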
Very interesting approach. Thanks for the thorough description and the neat use of the new dplyr package.
# Variant: number the rows across the whole data frame before grouping
data %>%
  arrange(id, day) %>% # sort by id and day; drop this line if the data are already sorted
  mutate(n = 1:n()) %>%
  group_by(id) %>%
  mutate(result = ifelse(abs(n - n[first_occurrence == 1]) <= 2, 1, 0)) %>%
  select(-n)