R - Conditionally fill values based on a person-period format


I am struggling to find a simple way to fill values based on two simple conditions.

I am trying to fill the variable working with 1 between the first and the last "1" of each dayweek. This example illustrates the problem better:

    id hours dayweek working
1   1     1  Friday       0
2   1     2  Friday       0
3   1     3  Friday       0
4   1     4  Friday       0
5   1     5  Friday       0
6   1     6  Friday       0
7   1     7  Friday       0
8   1     8  Friday       1
9   1     9  Friday       0
10  1    10  Friday       0
11  1    11  Friday       0
12  1    12  Friday       0
13  1    13  Friday       0
14  1    14  Friday       0
15  1    15  Friday       0
16  1    16  Friday       0
17  1    17  Friday       1
18  1    18  Friday       0
19  1    19  Friday       0
20  1    20  Friday       0
What I am trying to get is this:

    id hours dayweek working
1   1     1  Friday       0
2   1     2  Friday       0
3   1     3  Friday       0
4   1     4  Friday       0
5   1     5  Friday       0
6   1     6  Friday       0
7   1     7  Friday       0
8   1     8  Friday       1
9   1     9  Friday       1
10  1    10  Friday       1
11  1    11  Friday       1
12  1    12  Friday       1
13  1    13  Friday       1
14  1    14  Friday       1
15  1    15  Friday       1
16  1    16  Friday       1
17  1    17  Friday       1
18  1    18  Friday       0
19  1    19  Friday       0
20  1    20  Friday       0
The grouping (group_by) must be by id and dayweek.

Any clue?
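Within a single id/dayweek group, the rule reduces to: every position from the first 1 through the last 1 becomes 1. A minimal base-R sketch on a plain vector, for illustration only; fill_between_ones is a hypothetical helper name, and returning a group that contains no 1 unchanged is an assumption about the desired behaviour.

```r
# Hypothetical helper: set every element from the first 1 to the last 1 to 1.
fill_between_ones <- function(x) {
  ones <- which(x == 1)
  if (length(ones) == 0) return(x)  # no 1 in this group: leave it as-is
  x[min(ones):max(ones)] <- 1
  x
}

# The 'working' column of the example above (1s at rows 8 and 17)
working <- c(rep(0, 7), 1, rep(0, 8), 1, rep(0, 3))
fill_between_ones(working)  # rows 8 to 17 become 1
```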

Data

df1 <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3"), class = "factor"), hours = 1:20, dayweek = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("Friday", "Monday", "Saturday", "Sunday", 
"Thursday", "Tuesday", "Wedesnday"), class = "factor"), working = c(0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)), row.names = c(NA, 
20L), class = "data.frame", .Names = c("id", "hours", "dayweek", 
"working"))
Alternative data for the same problem

dt = structure(list(X = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 29L, 30L, 
31L, 32L, 33L, 34L, 35L, 36L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 
64L), id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), hours = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L), dayweek = structure(c(1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L), .Label = c("Friday", "Monday", "Saturday", 
"Sunday", "Thursday", "Tuesday", "Wedesnday"), class = "factor"), 
working = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 
0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame",   row.names = c(NA, 
-24L), .Names = c("X", "id", "hours", "dayweek", "working"))
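Applied per group, the same rule can be sketched in base R with ave(), which runs a function separately on each id/dayweek combination. This is a sketch under stated assumptions: it rebuilds only the id, dayweek and working columns of the alternative data (X and hours are left out), and it uses a cummax()/rev() product as the per-group function. The forward running maximum is 1 from the first 1 onwards, and the reversed one is 1 up to the last 1.

```r
# Minimal copy of the grouping columns of the alternative data 'dt' above
dt <- data.frame(
  id      = rep(1:3, each = 8),
  dayweek = rep(rep(c("Friday", "Monday"), each = 4), times = 3),
  working = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L,
              0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
              0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Per id/dayweek group: 1 from the first 1 through the last 1, else 0
dt$working <- ave(dt$working, dt$id, dt$dayweek,
                  FUN = function(x) cummax(x) * rev(cummax(rev(x))))
dt$working
# 0 1 1 1  1 1 1 1  0 1 1 0  0 1 1 1  0 0 0 0  0 0 0 0
```

The two id 3 groups contain no 1, so both running maxima are 0 there and the groups stay all 0.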

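We can use data.table to do this. We convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'id' and 'dayweek', and only if the group has at least one 1 in 'working' (if(any(working==1))), we get the numeric indices of the elements of 'working' that equal 1 (tmp <- which(working==1)). We take the sequence (:) between the first (head(tmp,1)) and the last (tail(tmp,1)) of those positions and wrap it with .I to get the row indices ('i1'). Finally, we use those indices to assign 1 to the corresponding 'working' elements.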

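library(data.table)
# For each id/dayweek group that contains a 1, collect the row numbers (.I)
# from the first 1 to the last 1; groups without a 1 return NULL and add no rows
i1 <- setDT(df1)[, if(any(working==1)){tmp <- which(working==1)
                  .I[head(tmp,1):tail(tmp,1)]} , by = .(id, dayweek)]$V1
# Set 'working' to 1 on those rows (by reference)
df1[i1, working:=1L]
df1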
#    id hours dayweek working
# 1:  1     1  Friday       0
# 2:  1     2  Friday       0
# 3:  1     3  Friday       0
# 4:  1     4  Friday       0
# 5:  1     5  Friday       0
# 6:  1     6  Friday       0
# 7:  1     7  Friday       0
# 8:  1     8  Friday       1
# 9:  1     9  Friday       1
#10:  1    10  Friday       1
#11:  1    11  Friday       1
#12:  1    12  Friday       1
#13:  1    13  Friday       1
#14:  1    14  Friday       1
#15:  1    15  Friday       1
#16:  1    16  Friday       1
#17:  1    17  Friday       1
#18:  1    18  Friday       0
#19:  1    19  Friday       0
#20:  1    20  Friday       0
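Or a compact option suggested by @Khashaa: we multiply the cummax of 'working' by the reversed (rev) cummax of the reversed 'working', so that only the elements from the first 1 through the last 1 stay 1 in the vector, while all other elements become 0.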

library(dplyr)
df1 %>% 
    group_by(id, dayweek) %>%
    mutate(working = cummax(working)*rev(cummax(rev(working))))
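To see why the product works, it helps to print the two running maxima for one group. The vector x below is an illustrative example, not taken from the data:

```r
x <- c(0, 0, 1, 0, 0, 1, 0)  # illustrative group: 1s at positions 3 and 6

fwd <- cummax(x)            # 0 0 1 1 1 1 1  -> 1 from the first 1 onwards
bwd <- rev(cummax(rev(x)))  # 1 1 1 1 1 1 0  -> 1 up to the last 1
fwd * bwd                   # 0 0 1 1 1 1 0  -> 1 only between them
```

A group with no 1 gives two all-zero vectors, so its product stays 0.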

Very nice - what is the dplyr equivalent?
Try (cummax(x)>=max(x))*(rev(cummax(rev(x)))>=max(x))
@jeremycg The last 1 becomes 0.
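A small base-R comparison of the comparison-based variant from the comments (with its parentheses balanced) and the multiplicative form used in the answer; fill_cmp and fill_mul are hypothetical names introduced only for this sketch. On a vector containing 1s both give the same span. When a group has no 1 at all, however, max(x) is 0, every >= comparison is TRUE, and the comparison-based variant fills the whole group with 1.

```r
# Comparison-based variant (parentheses balanced)
fill_cmp <- function(x) (cummax(x) >= max(x)) * (rev(cummax(rev(x))) >= max(x))
# Multiplicative variant from the answer
fill_mul <- function(x) cummax(x) * rev(cummax(rev(x)))

x <- c(0, 1, 0, 1, 0)
fill_cmp(x)  # 0 1 1 1 0
fill_mul(x)  # 0 1 1 1 0

z <- c(0, 0, 0)  # a group with no 1 at all
fill_cmp(z)  # 1 1 1  (max(z) is 0, so every comparison is TRUE)
fill_mul(z)  # 0 0 0
```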
@Khashaa If you haven't posted it as a solution, may I add it to my list of solutions? @giacomoV If working consists only of 0s, which(working==1) fails. Have you tried the last option?
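A sketch of why an all-0 group is a problem for the which() step, and how an any() guard like the one in the data.table answer sidesteps it (x is an illustrative vector, not from the data):

```r
x <- c(0, 0, 0)       # a group in which 'working' is all 0
tmp <- which(x == 1)  # integer(0): there is no 1 to find
length(tmp)           # 0

# head(tmp, 1):tail(tmp, 1) becomes integer(0):integer(0), which errors
res <- tryCatch(head(tmp, 1):tail(tmp, 1), error = function(e) "error")
res                   # "error"

# Checking any() first skips such groups instead of failing
if (any(x == 1)) head(tmp, 1):tail(tmp, 1) else integer(0)
```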