R - Mutate column values based on conditions involving other columns' groups

Tags: r, group-by, dplyr, mutate

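I have four columns in a data frame df, and I want to derive a fifth column from them. The four current columns are year, month, id, and conflict. The conflict column contains only 1s and 0s, and within a given id, once a 1 appears in a year, the remaining months of that year are also 1. I want to turn the conflict column into a new column, conflict_mutated, as follows: if any month of a given year contains a 1, and any month of the previous year also contains a 1, then every month of the current year should be 1 in conflict_mutated, while all existing 1s are kept.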

So if we have data like the following:

year month id conflict
1989 6     33 0
1989 7     33 0
1989 8     33 1
1989 9     33 1
1989 10    33 1
1989 11    33 1
1989 12    33 1
1990 1     33 0
1990 3     33 0
1990 3     33 0
1990 4     33 0
1990 5     33 1
1990 6     33 1
1990 7     33 1
1990 8     33 1
1990 9     33 1
1990 10    33 1
1990 11    33 1
1990 12    33 1

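I then want the 0s in conflict for months 1, 2, 3, and 4 of 1990 to become 1, because the rows share the same id and both 1989 (the previous year) and 1990 contain a 1. The example data above would then look like this: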

year month id conflict conflict_mutated
1989 6     33 0        0
1989 7     33 0        0
1989 8     33 1        1
1989 9     33 1        1
1989 10    33 1        1
1989 11    33 1        1
1989 12    33 1        1
1990 1     33 0        1
1990 3     33 0        1
1990 3     33 0        1
1990 4     33 0        1
1990 5     33 1        1
1990 6     33 1        1
1990 7     33 1        1
1990 8     33 1        1
1990 9     33 1        1
1990 10    33 1        1
1990 11    33 1        1
1990 12    33 1        1

I have a solution, but it takes almost 3 days to run. Here it is:

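library(dplyr)

# Start from the original values so that existing 1s are kept
conflict_mutated <- df$conflict

# Row by row: each filter() call rescans the whole table, which is why this is slow
for (i in seq_len(nrow(df))) {
  if (df$year[i] != 1989 &&
      any(filter(df, id == df$id[i], year == (df$year[i] - 1))$conflict == 1) &&
      any(filter(df, id == df$id[i], year == df$year[i])$conflict == 1)) {
    conflict_mutated[i] <- 1
  }
}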

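Is there a way to use group_by and mutate to make this faster or better? I am having trouble working out how to do it, because the grouping by year has to be taken into account and combined with conditional logic across different ids.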

library(readr)  # read_csv()
foo  <- read_csv('df1.csv')
#print(foo, n =40)
## A tibble: 40 x 4
#    year month    id conflict
#   <int> <int> <int>    <int>
# 1  1989     6    33        0
# 2  1989     7    33        0
# 3  1989     8    33        1
# 4  1989     9    33        1
# 5  1989    10    33        1
# 6  1989    11    33        1
# 7  1989    12    33        1
# 8  1990     1    33        0
# 9  1990     3    33        0
#10  1990     3    33        0
#11  1990     4    33        0
#12  1990     5    33        1
#13  1990     6    33        1
#14  1990     7    33        1
#15  1990     8    33        1
#16  1990     9    33        1
#17  1990    10    33        1
#18  1990    11    33        1
#19  1990    12    33        1
#20  1991     1    33        0
#21  1989     6    34        0
#22  1989     7    34        0
#23  1989     8    34        1
#24  1989     9    34        1
#25  1989    10    34        1
#26  1989    11    34        1
#27  1989    12    34        1
#28  1990     1    34        0
#29  1990     3    34        0
#30  1990     3    34        0
#31  1990     4    34        0
#32  1990     5    34        1
#33  1990     6    34        1
#34  1990     7    34        1
#35  1990     8    34        1
#36  1990     9    34        1
#37  1990    10    34        1
#38  1990    11    34        1
#39  1990    12    34        1
#40  1991     1    34        0
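library(dplyr)      # group_by(), summarize(), mutate(), left_join()
library(magrittr)   # %<>% assignment pipe
library(data.table) # shift()
# Number of conflict months per id and year
bar  <-  foo %>% group_by(id, year) %>% dplyr::summarize(yrtot = sum(conflict))
# Previous row's total within each id (assumes consecutive years within an id)
bar  %<>% ungroup() %>% group_by(id)  %>%  dplyr::mutate(lastyrtot = shift(yrtot, n = 1))
# 1 where this year and last year both contain a 1 (total > 0); otherwise keep the old value
foo  %<>%  left_join(bar, by = c("id", "year"))  %>%
        dplyr::mutate(conflict_mutate = ifelse(yrtot > 0 & lastyrtot > 0, 1, conflict))
# The first year of each id has no previous year: keep the original value
foo %<>% dplyr::mutate(conflict_mutate  =  ifelse(is.na(lastyrtot), conflict, conflict_mutate)) %>% select(-yrtot, -lastyrtot)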

#R> print(foo, n=40)
## A tibble: 40 x 5
#    year month    id conflict conflict_mutate
#   <int> <int> <int>    <int>           <dbl>
# 1  1989     6    33        0               0
# 2  1989     7    33        0               0
# 3  1989     8    33        1               1
# 4  1989     9    33        1               1
# 5  1989    10    33        1               1
# 6  1989    11    33        1               1
# 7  1989    12    33        1               1
# 8  1990     1    33        0               1
# 9  1990     3    33        0               1
#10  1990     3    33        0               1
#11  1990     4    33        0               1
#12  1990     5    33        1               1
#13  1990     6    33        1               1
#14  1990     7    33        1               1
#15  1990     8    33        1               1
#16  1990     9    33        1               1
#17  1990    10    33        1               1
#18  1990    11    33        1               1
#19  1990    12    33        1               1
#20  1991     1    33        0               0
#21  1989     6    34        0               0
#22  1989     7    34        0               0
#23  1989     8    34        1               1
#24  1989     9    34        1               1
#25  1989    10    34        1               1
#26  1989    11    34        1               1
#27  1989    12    34        1               1
#28  1990     1    34        0               1
#29  1990     3    34        0               1
#30  1990     3    34        0               1
#31  1990     4    34        0               1
#32  1990     5    34        1               1
#33  1990     6    34        1               1
#34  1990     7    34        1               1
#35  1990     8    34        1               1
#36  1990     9    34        1               1
#37  1990    10    34        1               1
#38  1990    11    34        1               1
#39  1990    12    34        1               1
#40  1991     1    34        0               0

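Thanks for the reply! It may not be clear from the data, but a conflict can end in the following year, so not all future months are coded as 1. In other words, I only want to change the 0s in conflict_mutated if both the previous year and the current year contain a 1. So not all future months should be changed to 1, because some years may not contain a 1 in the first place. Also, I'm not entirely sure, but I think your solution doesn't account for different ids.

I rewrote my answer using dplyr and data.table tools. I also extended the data set slightly to include a "future" 1991 observation and a second id.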
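
The rule as clarified in the comments (change a 0 only when the same id has a 1 in both the current and the previous year) can also be sketched with group_by and mutate alone. This is a minimal sketch rather than the answerer's method: the helper name add_conflict_mutated, the temporary columns conflict_now and conflict_prev, and the toy data are all made up for illustration. It compares calendar years directly, so a gap in the years of an id is not mistaken for "the previous year".

```r
library(dplyr)

# Hypothetical helper (not from the original answer): sets every month of a
# year to 1 when that year and the year before both contain a 1 for the same id.
add_conflict_mutated <- function(df) {
  df %>%
    group_by(id) %>%
    mutate(
      # Within one id: does this row's year contain a 1? Does the year before it?
      conflict_now  = year %in% year[conflict == 1],
      conflict_prev = (year - 1) %in% year[conflict == 1],
      # Keep the original value unless both conditions hold
      conflict_mutated = if_else(conflict_now & conflict_prev, 1L, as.integer(conflict))
    ) %>%
    ungroup() %>%
    select(-conflict_now, -conflict_prev)
}

# Made-up toy data: id 33 has a 1 in both 1989 and 1990, id 34 only in 1990
toy <- data.frame(
  year     = c(1989, 1989, 1990, 1990, 1991, 1989, 1990, 1990),
  month    = c(11, 12, 1, 2, 1, 12, 1, 2),
  id       = c(33, 33, 33, 33, 33, 34, 34, 34),
  conflict = c(0, 1, 0, 1, 0, 0, 0, 1)
)

result <- add_conflict_mutated(toy)
print(result)
```

In this toy data, the January 1990 row of id 33 becomes 1, while the January 1990 row of id 34 stays 0 because id 34 has no 1 in 1989. The 1991 row of id 33 also stays 0 because 1991 itself contains no 1.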