How to select the "x" most recent values in each group in R?
I am trying to select/filter the most recent values within each group of a data frame in R. For example, I would like to select the 3 most recent values, i.e. the dates closest to today, for each Player group in the following data frame:
Player Date Result
Sam 03/15/2015 1
Sam 03/22/2015 0
Sam 04/04/2015 2
Sam 04/12/2015 1
Sam 04/18/2015 1
Sam 04/26/2015 0
Sam 08/08/2015 3
Steve 02/17/2015 0
Steve 02/21/2015 0
Steve 03/04/2015 4
Steve 03/11/2015 2
Steve 03/15/2015 1
Steve 03/22/2015 0
Steve 04/12/2015 0
Steve 04/18/2015 2
Steve 04/26/2015 1
Steve 04/29/2015 2
Steve 08/16/2015 4
Jasper 03/15/2015 3
Jasper 03/22/2015 3.5
Jasper 04/04/2015 4
Jasper 04/12/2015 4
Jasper 04/18/2015 5
Jasper 04/26/2015 0
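I have already written the as.Date code, so R now understands the date format, but what code can I use to select only the 3 most recent values in each group?

We can use dplyr. We convert the "Date" column to Date class with as.Date. After grouping by "Player", we arrange the "Date" column in descending order and use slice to get the 3 most recent values. If you don't want to change the class of "Date", you can drop the mutate step and do the conversion inside arrange, i.e. arrange(desc(as.Date(Date, "%m/%d/%Y"))).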
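The arrange/slice steps described above could be sketched as follows. The small `scores` data frame and the `latest3` name are made up for illustration and are not the question's actual data.

```r
library(dplyr)

# Hypothetical toy data in the same shape as the question's data frame
scores <- data.frame(
  Player = c("Sam", "Sam", "Sam", "Sam", "Steve", "Steve", "Steve", "Steve"),
  Date   = c("03/15/2015", "04/18/2015", "04/26/2015", "08/08/2015",
             "02/17/2015", "04/26/2015", "04/29/2015", "08/16/2015"),
  Result = c(1, 1, 0, 3, 0, 1, 2, 4),
  stringsAsFactors = FALSE
)

latest3 <- scores %>%
  mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%  # character -> Date
  group_by(Player) %>%
  arrange(desc(Date)) %>%                       # newest rows first
  slice(1:3) %>%                                # first 3 rows of each group
  ungroup()

print(latest3)
```

Because slice works per group, the oldest row of each player (03/15/2015 for Sam, 02/17/2015 for Steve) is dropped here.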
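Alternatively, after grouping by "Player", we can use top_n, specifying the "n" and "wt" arguments, to keep the 3 rows with the largest "Date" in each group.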
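library(dplyr)
df1 %>%
    # convert the character column to Date class
    mutate(Date = as.Date(Date, '%m/%d/%Y')) %>%
    group_by(Player) %>%
    # keep the 3 rows with the latest Date in each group
    top_n(n = 3, wt = Date)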
# Player Date Result
#1 Sam 2015-04-18 1
#2 Sam 2015-04-26 0
#3 Sam 2015-08-08 3
#4 Steve 2015-04-26 1
#5 Steve 2015-04-29 2
#6 Steve 2015-08-16 4
#7 Jasper 2015-04-12 4
#8 Jasper 2015-04-18 5
#9 Jasper 2015-04-26 0
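In dplyr 1.0.0 and later, top_n is documented as superseded by slice_max, so a newer spelling of the same idea might look like the sketch below. The `scores` data frame is hypothetical toy data, with one group deliberately shorter than 3 rows to show that such a group is returned whole rather than raising an error.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where slice_max() is available

# Hypothetical toy data; Jasper has only 2 rows
scores <- data.frame(
  Player = c("Sam", "Sam", "Sam", "Sam", "Jasper", "Jasper"),
  Date   = as.Date(c("2015-03-15", "2015-04-18", "2015-04-26",
                     "2015-08-08", "2015-04-18", "2015-04-26")),
  Result = c(1, 1, 0, 3, 5, 0)
)

latest3 <- scores %>%
  group_by(Player) %>%
  slice_max(Date, n = 3) %>%  # up to 3 latest rows per group
  ungroup()

print(latest3)
```

Note that slice_max keeps ties by default (with_ties = TRUE), so a group can return more than 3 rows when several rows share the same date; passing with_ties = FALSE caps it at n.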
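Using data.table, we convert the "data.frame" to a "data.table" with setDT(df1). We order the rows by "Date" in descending order after converting it to a date class with as.IDate and, grouped by "Player", use head to get the first 3 rows of each group.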
library(data.table)
setDT(df1)[order(-as.IDate(Date, '%m/%d/%Y')),head(.SD, 3) , by = Player]
# Player Date Result
#1: Steve 08/16/2015 4
#2: Steve 04/29/2015 2
#3: Steve 04/26/2015 1
#4: Sam 08/08/2015 3
#5: Sam 04/26/2015 0
#6: Sam 04/18/2015 1
#7: Jasper 04/26/2015 0
#8: Jasper 04/18/2015 5
#9: Jasper 04/12/2015 4
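What have you tried yourself? Why didn't it work?
Possible duplicate.
This solution looks great, thank you very much! I will be able to test it properly in a few days and will report back to confirm.
@WillT-E Thanks. Glad to help.
I went with top_n and it works well. The nice thing is that if one of the groups is shorter than "n", it does not give an error message. It simply includes whatever observations are available for that group.
@WillT-E Thanks for the feedback. slice should also work that way. For example, df1 %>% mutate(Date = as.Date(Date, '%m/%d/%Y')) %>% group_by(Player) %>% arrange(desc(Date)) %>% slice(1:7)
data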
df1 <- structure(list(Player = c("Sam", "Sam", "Sam", "Sam", "Sam",
"Sam", "Sam", "Steve", "Steve", "Steve", "Steve", "Steve", "Steve",
"Steve", "Steve", "Steve", "Steve", "Steve", "Jasper", "Jasper",
"Jasper", "Jasper", "Jasper", "Jasper"), Date = c("03/15/2015",
"03/22/2015", "04/04/2015", "04/12/2015", "04/18/2015", "04/26/2015",
"08/08/2015", "02/17/2015", "02/21/2015", "03/04/2015", "03/11/2015",
"03/15/2015", "03/22/2015", "04/12/2015", "04/18/2015", "04/26/2015",
"04/29/2015", "08/16/2015", "03/15/2015", "03/22/2015", "04/04/2015",
"04/12/2015", "04/18/2015", "04/26/2015"), Result = c(1, 0, 2,
1, 1, 0, 3, 0, 0, 4, 2, 1, 0, 0, 2, 1, 2, 4, 3, 3.5, 4, 4, 5,
0)), .Names = c("Player", "Date", "Result"),
class = "data.frame", row.names = c(NA, -24L))