R - Conditionally filling values in person-period format
I am struggling to find a simple way to fill values based on two simple conditions. Within each dayweek, I am trying to fill the variable working with 1 between the first and the last "1". The example makes this clearer.
id hours dayweek working
1 1 1 Friday 0
2 1 2 Friday 0
3 1 3 Friday 0
4 1 4 Friday 0
5 1 5 Friday 0
6 1 6 Friday 0
7 1 7 Friday 0
8 1 8 Friday 1
9 1 9 Friday 0
10 1 10 Friday 0
11 1 11 Friday 0
12 1 12 Friday 0
13 1 13 Friday 0
14 1 14 Friday 0
15 1 15 Friday 0
16 1 16 Friday 0
17 1 17 Friday 1
18 1 18 Friday 0
19 1 19 Friday 0
20 1 20 Friday 0
This is what I am trying to get:
id hours dayweek working
1 1 1 Friday 0
2 1 2 Friday 0
3 1 3 Friday 0
4 1 4 Friday 0
5 1 5 Friday 0
6 1 6 Friday 0
7 1 7 Friday 0
8 1 8 Friday 1
9 1 9 Friday 1
10 1 10 Friday 1
11 1 11 Friday 1
12 1 12 Friday 1
13 1 13 Friday 1
14 1 14 Friday 1
15 1 15 Friday 1
16 1 16 Friday 1
17 1 17 Friday 1
18 1 18 Friday 0
19 1 19 Friday 0
20 1 20 Friday 0
The group_by must be on id and dayweek. Any clues?
Data
df1 <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1",
"2", "3"), class = "factor"), hours = 1:20, dayweek = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("Friday", "Monday", "Saturday", "Sunday",
"Thursday", "Tuesday", "Wedesnday"), class = "factor"), working = c(0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)), row.names = c(NA,
20L), class = "data.frame", .Names = c("id", "hours", "dayweek",
"working"))
Alternative data for the same problem
dt = structure(list(X = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 29L, 30L,
31L, 32L, 33L, 34L, 35L, 36L, 57L, 58L, 59L, 60L, 61L, 62L, 63L,
64L), id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), hours = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), dayweek = structure(c(1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L), .Label = c("Friday", "Monday", "Saturday",
"Sunday", "Thursday", "Tuesday", "Wedesnday"), class = "factor"),
working = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L,
0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-24L), .Names = c("X", "id", "hours", "dayweek", "working"))
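We can use data.table for this. We convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'id' and 'dayweek', and only if the group contains at least one 1 (if(any(working==1))), we get the numeric index of the elements in 'working' that are equal to 1 (which(working==1)). We take the sequence (:) between the first (head(tmp,1)) and last (tail(tmp,1)) positions and wrap it with .I to get the row index ('i1'). Using that index, we assign 1 to the corresponding elements of 'working'.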
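library(data.table)
# For each (id, dayweek) group that contains a 1, collect the row indices
# between the first and the last 1
i1 <- setDT(df1)[, if (any(working == 1)) {
    tmp <- which(working == 1)
    .I[head(tmp, 1):tail(tmp, 1)]
  }, by = .(id, dayweek)]$V1
# Set 'working' to 1 on those rows, by reference
df1[i1, working := 1L]
df1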
# id hours dayweek working
# 1: 1 1 Friday 0
# 2: 1 2 Friday 0
# 3: 1 3 Friday 0
# 4: 1 4 Friday 0
# 5: 1 5 Friday 0
# 6: 1 6 Friday 0
# 7: 1 7 Friday 0
# 8: 1 8 Friday 1
# 9: 1 9 Friday 1
#10: 1 10 Friday 1
#11: 1 11 Friday 1
#12: 1 12 Friday 1
#13: 1 13 Friday 1
#14: 1 14 Friday 1
#15: 1 15 Friday 1
#16: 1 16 Friday 1
#17: 1 17 Friday 1
#18: 1 18 Friday 0
#19: 1 19 Friday 0
#20: 1 20 Friday 0
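The same index-based idea can also be sketched in base R, which may help if data.table is not available. This is only an illustration: fill_between is a hypothetical helper name, and toy is made-up data (not the question's df1) with one group that contains 1s and one group that is all 0.

```r
# Hypothetical helper: set every element between the first and last 1 to 1.
# Vectors without any 1 are returned unchanged.
fill_between <- function(w) {
  idx <- which(w == 1)
  if (length(idx) > 0) {
    w[min(idx):max(idx)] <- 1
  }
  w
}

# Made-up example data with two (id, dayweek) groups
toy <- data.frame(
  id      = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  dayweek = c(rep("Friday", 5), rep("Monday", 4)),
  working = c(0, 1, 0, 1, 0, 0, 0, 0, 0)
)

# ave() applies the helper within each combination of id and dayweek
toy$working <- ave(toy$working, toy$id, toy$dayweek, FUN = fill_between)
toy$working
# 0 1 1 1 0 0 0 0 0
```

The length check plays the same role as the if(any(working==1)) guard in the data.table code: groups without any 1 are left as they are.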
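Or a compact option suggested by @Khashaa, where we multiply the cummax of 'working' by the reverse (rev) of the cummax of the reversed 'working', so that only the elements between the first and the last 1 (inclusive) stay 1, while all other elements become 0.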
library(dplyr)
df1 %>%
  group_by(id, dayweek) %>%
  mutate(working = cummax(working) * rev(cummax(rev(working))))
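To see why the product works, it can help to look at the two factors separately on a single vector. This is a small base R sketch on a made-up vector w, assuming 'working' only ever holds 0 and 1.

```r
w <- c(0, 0, 1, 0, 0, 1, 0)

# 1 from the first 1 onwards
forward <- cummax(w)
# 0 0 1 1 1 1 1

# 1 up to and including the last 1
backward <- rev(cummax(rev(w)))
# 1 1 1 1 1 1 0

# 1 only where both conditions hold, i.e. between the first and last 1
filled <- forward * backward
filled
# 0 0 1 1 1 1 0
```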
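Very nice - what is the dplyr equivalent?
Try (cummax(x)>=max(x))*(rev(cummax(rev(x))>=max(x)))
@jeremycg The last 1 becomes 0.
@Khashaa If you are not going to post it as a solution, may I add it to my solution?
@giacomoV If working consists only of 0s, which(working==1) fails. Have you tried the last option?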