How to select rows within groups that have specific values on a combination of two variables in R (tags: r, function, dplyr)


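This is a follow-up to an R question I asked earlier.

I got a lot of help with that question, but things have now become a bit more complicated, and I would appreciate advice on how to handle it.

My data looks like this: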

dd <- read.table(text="
    event.timeline.ys     ID     year    group  outcome
                 1                   2     800033 2008    A  3
                 2                   1     800033 2009    A  3
                 3                   0     800033 2010    A  NA   
                 4                  -1     800033 2011    A  2  
                 5                  -2     800033 2012    A  1  
                 15                  0     800076 2008    B  2
                 16                 -1     800076 2009    B  NA
                 17                  5     800100 2014    C  4     
                 18                  4     800100 2015    C  4  
                 19                  2     800100 2017    C  4  
                 20                  1     800100 2018    C  3   
                 30                  0     800125 2008    A  2   
                 31                 -1     800125 2009    A  1   
                 32                 -2     800125 2010    A  NA
                 33                  2     800031 2008    A  3
                 34                  1     800031 2009    A  3
                 35                  0     800031 2010    A  NA   
                 36                 -1     800031 2011    A  NA  
                 37                 -2     800031 2012    A  1", header=TRUE)

The result I would like to end up with is:

      event.timeline.ys         ID     year    group  outcome
2                     1     800033     2009    A            3
4                    -1     800033     2011    A            2  
15                    0     800076     2008    B            2
16                   -1     800076     2009    B           NA
20                    1     800100     2018    C            3   
30                    0     800125     2008    A            2   
31                   -1     800125     2009    A            1
34                    1     800031     2009    A            3
37                   -2     800031     2012    A            1

I would really appreciate advice on how to solve this. I have already tried:

dd %>% 
  group_by(ID) %>% 
  filter(row_number() == last(which(event.timeline.ys >= 0 & outcome >= 0)) | 
           row_number() == first(which(event.timeline.ys < 0 & outcome >= 0)))


However, I then lose row 16 (for ID == 800076), which is unfortunate.

Many thanks.
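
A minimal base R sketch of why the attempt above drops row 16. It uses two made-up vectors, `ys` and `outcome`, that mimic ID 800076, and indexes with `[1]` as a stand-in for `first()` (as far as I know, `dplyr::first()` also returns NA for an empty vector, but treat that as an assumption). The point is that `outcome >= 0` is NA when `outcome` is NA, and `filter()` only keeps rows where the condition is TRUE.

```r
# Two rows mimicking ID 800076 from the question (illustrative vectors).
ys      <- c(0, -1)
outcome <- c(2, NA)

# NA >= 0 is NA, and TRUE & NA is NA, so the second row is never TRUE.
cond_neg <- ys < 0 & outcome >= 0

# which() silently drops NA, so no candidate position is left.
idx_neg <- which(cond_neg)

# The first element of an empty vector is NA.
first_neg <- idx_neg[1]

# Same shape as the filter() condition: row 1 is TRUE, row 2 is NA.
# filter() treats NA like FALSE, so row 2 (row 16 in the data) is dropped.
keep <- seq_along(ys) == 1 | seq_along(ys) == first_neg
print(keep)
```

Testing `!is.na(outcome)` instead of `outcome >= 0` avoids the NA in the condition, but a fallback is still needed for groups that have no valid row at all.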

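Here is a solution using dplyr and wrapr's dot pipe %.>%. I add a helper column outcome_na and arrange by it, which also covers the case where a group does not have any non-NA values.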

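library(dplyr)
library(wrapr)

dd %>%
  group_by(ID) %>%
  # TRUE when outcome is present, so valid rows sort after NA rows
  mutate(outcome_na = !is.na(outcome)) %.>%
  bind_rows(
    # last row with event.timeline.ys >= 0, preferring a non-NA outcome
    filter(., event.timeline.ys >= 0) %>% arrange(outcome_na, year) %>% slice(n()),
    # first row with event.timeline.ys < 0, preferring a non-NA outcome
    filter(., event.timeline.ys < 0) %>% arrange(desc(outcome_na), year) %>% slice(1)
  ) %>%
  arrange(ID) %>%
  select(-outcome_na)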

Using dplyr:

dd %>%
group_by(ID, event.timeline.ys>=0) %>%
arrange(ID, event.timeline.ys>=0, abs(event.timeline.ys)) %>%
filter(!is.na(outcome) | n()==1) %>%
filter(row_number()==1) %>%
ungroup() %>%
select(-one_of('event.timeline.ys >= 0'))

Output:

  event.timeline.ys     ID  year group outcome
              <int>  <int> <int> <fct>   <int>
1                -1 800033  2011 A           2
2                 1 800033  2009 A           3
3                -1 800076  2009 B          NA
4                 0 800076  2008 B           2
5                 1 800100  2018 C           3
6                -1 800125  2009 A           1
7                 0 800125  2008 A           2
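
The same idea can be sketched in base R, which may help to see what each step of the pipeline does. This is only an illustration on a subset of the question's data (IDs 800076 and 800031); the names `dd_small`, `nonneg`, `size` and `picked` are made up for the example. Rows are grouped by ID and sign, NA outcomes are dropped unless the row is the only one in its group, and the row closest to zero is kept.

```r
# Subset of the question's data: IDs 800076 and 800031 only.
dd_small <- data.frame(
  event.timeline.ys = c(0, -1, 2, 1, 0, -1, -2),
  ID      = c(800076, 800076, 800031, 800031, 800031, 800031, 800031),
  year    = c(2008, 2009, 2008, 2009, 2010, 2011, 2012),
  outcome = c(2, NA, 3, 3, NA, NA, 1)
)

# Grouping flag: TRUE for event.timeline.ys >= 0, FALSE otherwise.
dd_small$nonneg <- dd_small$event.timeline.ys >= 0

# Number of rows in each (ID, nonneg) group, repeated for every row.
key  <- interaction(dd_small$ID, dd_small$nonneg, drop = TRUE)
size <- ave(seq_len(nrow(dd_small)), key, FUN = length)

# Keep rows with a valid outcome, or the single row of a one-row group.
cand <- dd_small[!is.na(dd_small$outcome) | size == 1, ]

# Within each group, put the row closest to zero first ...
cand <- cand[order(cand$ID, cand$nonneg, abs(cand$event.timeline.ys)), ]

# ... and keep only that first row per (ID, nonneg) group.
picked <- cand[!duplicated(cand[c("ID", "nonneg")]), ]
picked$nonneg <- NULL
print(picked)
```

Note that the `size == 1` rule only rescues a group whose single row is NA; a group with several rows that are all NA would disappear entirely, both here and in the pipeline above.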


Just to stay consistent with my previous answer, here is a data.table version. We can use an ifelse condition to select the rows:

library(data.table)
setDT(dd)
dd[, .SD[na.omit(c(ifelse(any(event.timeline.ys >= 0 & !is.na(outcome)),
                          last(which(event.timeline.ys >= 0 & !is.na(outcome))), 
                          last(which(event.timeline.ys >= 0))),
                   ifelse(any(event.timeline.ys < 0 & !is.na(outcome)),
                          first(which(event.timeline.ys < 0 & !is.na(outcome))), 
                          first(which(event.timeline.ys < 0)))))],
   by=ID]


       ID event.timeline.ys year group outcome
1: 800033                 1 2009     A       3
2: 800033                -1 2011     A       2
3: 800076                 0 2008     B       2
4: 800076                -1 2009     B      NA
5: 800100                 1 2018     C       3
6: 800125                 0 2008     A       2
7: 800125                -1 2009     A       1
8: 800031                 1 2009     A       3
9: 800031                -2 2012     A       1
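
For readers who prefer to avoid extra packages, the "prefer a valid outcome, otherwise fall back" rule can also be written as a small base R helper. This is a sketch under the assumption that rows are already ordered within each ID as in the question; `pick_position` and its arguments are hypothetical names introduced for this example only.

```r
# Example data from the question; the unnamed first column becomes row names.
dd <- read.table(text = "
event.timeline.ys ID year group outcome
1   2 800033 2008 A 3
2   1 800033 2009 A 3
3   0 800033 2010 A NA
4  -1 800033 2011 A 2
5  -2 800033 2012 A 1
15  0 800076 2008 B 2
16 -1 800076 2009 B NA
17  5 800100 2014 C 4
18  4 800100 2015 C 4
19  2 800100 2017 C 4
20  1 800100 2018 C 3
30  0 800125 2008 A 2
31 -1 800125 2009 A 1
32 -2 800125 2010 A NA
33  2 800031 2008 A 3
34  1 800031 2009 A 3
35  0 800031 2010 A NA
36 -1 800031 2011 A NA
37 -2 800031 2012 A 1", header = TRUE)

# Hypothetical helper: position of the last (or first) row inside the window,
# preferring rows with a valid outcome and falling back to any row in the window.
pick_position <- function(in_window, valid, take_last) {
  preferred  <- which(in_window & valid)
  candidates <- if (length(preferred) > 0) preferred else which(in_window)
  if (length(candidates) == 0) return(integer(0))  # window is empty for this ID
  if (take_last) candidates[length(candidates)] else candidates[1]
}

# Apply the helper per ID and collect the selected row positions of dd.
keep <- unlist(lapply(split(seq_len(nrow(dd)), dd$ID), function(rows) {
  ys    <- dd$event.timeline.ys[rows]
  valid <- !is.na(dd$outcome[rows])
  rows[c(pick_position(ys >= 0, valid, take_last = TRUE),
         pick_position(ys <  0, valid, take_last = FALSE))]
}), use.names = FALSE)

# sort() restores the original row order of dd.
result <- dd[sort(keep), ]
print(result)
```

Collecting row positions and subsetting `dd` once keeps the original row names, so the result can be compared directly with the rows listed in the question.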


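Thank you so much! I really appreciate your help. I like seeing that there are different ways to reach the same result. For people who are not yet very familiar with pipes, functions and loops (like me; I have only just started teaching myself R), this also looks like a clear and straightforward solution!

Hey, I added another person (ID == 800031) to the data example above. With your code I get row 34 (which is correct) and row 36. However, in row 36 the outcome variable for that individual is NA. I want row 37 (the first row with a negative value on event.timeline.ys that also has a valid value on the outcome variable). How do I have to adjust your code to get this?

@marib You also have to arrange by outcome_na for event.timeline.ys < 0, but this time in descending order, because you want the first row rather than the last one.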
