R: Remove rows that do not satisfy a given condition


The dput of my data frame is as follows:

lf3 = structure(list(session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 
6L, 7L), userId = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5), datetime = 
structure(c(1457029336, 
1457029337, 1457029340, 1457029596, 1457313569, 1457030783, 1457030784, 
1457030918, 1457030920, 1457370365), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), referer = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 
22), request = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)), .Names = c("session_id", 
"userId", "datetime", "referer", "request"), row.names = c(NA, 
10L), class = "data.frame")
Now I want to drop the sessions that do not meet a minimum specified criterion/value. I tried the following code:

lf3 %>% group_by(session_id) %>% tally(sort = TRUE) %>% filter(n > 2)

But I want the same data frame back, containing only the sessions that pass this condition, like this:

  session_id userId            datetime referer request
1          1      1 2016-03-03 18:22:16      22       1
2          1      1 2016-03-03 18:22:17       2       2
3          1      1 2016-03-03 18:22:20       7       3

How can I do this?

You probably need group_by %>% filter:

lf3 %>% group_by(session_id) %>% filter(n() > 2)

# A tibble: 3 x 5
# Groups:   session_id [1]
#  session_id userId            datetime referer request
#       <int>  <dbl>              <dttm>   <dbl>   <dbl>
#1          1      1 2016-03-03 18:22:16      22       1
#2          1      1 2016-03-03 18:22:17       2       2
#3          1      1 2016-03-03 18:22:20       7       3
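
The attempt in the question returns counts rather than rows because tally() collapses each group to a single row holding its count, whereas filter(n() > 2) keeps the original rows of every group that has more than two of them. A minimal sketch of the difference, assuming dplyr is installed and using a small made-up data frame `toy` (a hypothetical stand-in for lf3, not part of the original question):

```r
library(dplyr)

# Hypothetical toy data standing in for lf3
toy <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L),
  request    = c(1, 2, 3, 4, 5, 6, 7)
)

# tally() returns one row per session with its count n,
# so the later filter() only keeps counts, not the original rows
counts <- toy %>%
  group_by(session_id) %>%
  tally(sort = TRUE) %>%
  filter(n > 2)

# filter(n() > 2) keeps the original rows of sessions with more than 2 rows;
# ungroup() plus as.data.frame() turns the grouped tibble into a plain data frame
kept <- toy %>%
  group_by(session_id) %>%
  filter(n() > 2) %>%
  ungroup() %>%
  as.data.frame()

print(counts)
print(kept)
```

The final ungroup() and as.data.frame() steps are optional; they are only needed if a plain data frame is wanted instead of a grouped tibble.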

We can use data.table:

library(data.table)
setDT(lf3)[, if(.N >2) .SD, session_id]
#      session_id userId            datetime referer request
#1:          1      1 2016-03-03 18:22:16      22       1
#2:          1      1 2016-03-03 18:22:17       2       2
#3:          1      1 2016-03-03 18:22:20       7       3
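
In this expression .N is the number of rows in the current session_id group and .SD is that group's subset of the data, so a group's rows are returned only when it has more than two of them. Note that setDT() converts lf3 to a data.table by reference. A minimal sketch that leaves the input untouched by working on a copy, assuming data.table is installed and using a made-up data frame `toy` (hypothetical, standing in for lf3):

```r
library(data.table)

# Hypothetical toy data standing in for lf3
toy <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L),
  request    = c(1, 2, 3, 4, 5, 6, 7)
)

# as.data.table() makes a copy, so toy itself stays a plain data.frame
dt <- as.data.table(toy)

# .N = number of rows in the current group
# .SD = the rows of the current group (without the grouping column)
# Groups with .N <= 2 return NULL and are dropped from the result
kept <- dt[, if (.N > 2) .SD, by = session_id]

print(kept)
```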

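Please update your question with the expected output.
That way it will give only the session_id = 1 rows, whose frequency is greater than 2. The expected output would look like the following data frame:
structure(list(session_id = c(1L, 1L, 1L), userId = c(1, 1, 1), datetime = structure(c(1457029336, 1457029337, 1457029340), class = c("POSIXct", "POSIXt"), tzone = "UTC"), referer = c(22, 2, 7), request = c(1, 2, 3)), .Names = c("session_id", "userId", "datetime", "referer", "request"), row.names = c(NA, 3L), class = "data.frame")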
I'd prefer base R, with ave:
lf3[ave(lf3$userId,lf3$session_id,FUN=length)>2,]
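OK, fine. It is a tibble, so I converted it to a data frame and saved it under a data frame variable name. Thanks.
Performance should be checked when using this approach.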
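
For the base R suggestion: ave() applies FUN to each group and returns the result aligned with the original rows, so with FUN = length every row receives the row count of its own session, and comparing that vector with 2 gives a logical row mask. A minimal sketch that needs no packages, using a made-up data frame `toy` (hypothetical, standing in for lf3):

```r
# Hypothetical toy data standing in for lf3
toy <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L),
  request    = c(1, 2, 3, 4, 5, 6, 7)
)

# For each row, the number of rows in that row's session.
# The first argument only needs to be a numeric column of the right length;
# its values are not used by length().
n_per_row <- ave(toy$request, toy$session_id, FUN = length)

# Keep only the rows whose session has more than 2 rows
kept <- toy[n_per_row > 2, ]

print(n_per_row)
print(kept)
```

Because the result is an ordinary subset of the input, it stays a plain data frame and keeps the original column order.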