R: Subset to the most recent date

I have the data below and need to subset it so that only the row with the most recent date is kept for each Keyword:

Keyword   Date   Pos   Bid
a       4/11/14   1   5.00
a       4/13/14   1   5.00
a       4/14/14   1   5.00
b        6/2/14   3   9.00
b        7/2/14   4   9.00  
b        8/2/14   4   9.00
c       8/29/14   2   3.00
c       8/30/14   2   3.00
c       8/31/14   2   3.00

The result I want is:

Keyword   Date   Pos   Bid
a       4/14/14   1   5.00
b        8/2/14   4   9.00
c       8/31/14   2   3.00

I tried the following, among other things, but my attempts either give me errors or are not what I want. What am I missing? Thanks.

Latest = subset( x, 
                 Date = max(as.Date(Date, '%m/%d/%y')), 
                 select = c('Identity', 'Date', 'Round.Avg.Pos.', 'Search.Bid')
         )
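On why that call does not filter anything: as far as I can tell, `Date = max(...)` is treated as a named argument rather than a comparison, so `subset()` never receives a row condition and keeps every row. Switching to `==` still only compares against the overall maximum, not the maximum within each Keyword. A minimal sketch, assuming the sample data from the question is rebuilt as `df` (the names `all_rows` and `global_latest` are only for illustration):

```r
# Sample data from the question, rebuilt so the example is self-contained.
df <- data.frame(
  Keyword = rep(c("a", "b", "c"), each = 3),
  Date = c("4/11/14", "4/13/14", "4/14/14", "6/2/14", "7/2/14",
           "8/2/14", "8/29/14", "8/30/14", "8/31/14"),
  Pos = c(1L, 1L, 1L, 3L, 4L, 4L, 2L, 2L, 2L),
  Bid = c(5, 5, 5, 9, 9, 9, 3, 3, 3),
  stringsAsFactors = FALSE
)

# `Date = ...` is a named argument, not a condition: no rows are dropped.
all_rows <- subset(df, Date = max(as.Date(Date, "%m/%d/%y")))
nrow(all_rows)

# With `==` the condition is applied, but against the overall latest date,
# so only the single newest row in the whole table survives.
global_latest <- subset(
  df,
  as.Date(Date, "%m/%d/%y") == max(as.Date(Date, "%m/%d/%y"))
)
global_latest
```

So the comparison has to be made per Keyword group, which is what the answers do.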

You could try dplyr:

   library(dplyr)
   df %>% 
      group_by(Keyword) %>%
      mutate(Date=as.Date(Date, format= "%m/%d/%y"))%>% 
      filter(Date==max(Date))
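The same per-group comparison can probably be written in base R as well. The sketch below is one possible way, not necessarily the approach the answer had in mind; it uses `ave()` to compute each group's latest date, and the names `d`, `group_max` and `latest` are hypothetical. Like the `filter(Date == max(Date))` call, it keeps all tied rows if a Keyword has several rows on its latest date.

```r
# Sample data from the question, rebuilt so the example is self-contained.
df <- data.frame(
  Keyword = rep(c("a", "b", "c"), each = 3),
  Date = c("4/11/14", "4/13/14", "4/14/14", "6/2/14", "7/2/14",
           "8/2/14", "8/29/14", "8/30/14", "8/31/14"),
  Pos = c(1L, 1L, 1L, 3L, 4L, 4L, 2L, 2L, 2L),
  Bid = c(5, 5, 5, 9, 9, 9, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Parse the dates once; work with their numeric form so ave() returns numbers.
d <- as.numeric(as.Date(df$Date, format = "%m/%d/%y"))

# For every row, the latest date within that row's Keyword group.
group_max <- ave(d, df$Keyword, FUN = max)

# Keep the rows whose date equals their group's latest date.
latest <- df[d == group_max, ]
latest
```

The Date column is left as the original character strings, since only the helper vector `d` is converted.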

Data:
df <- structure(list(Keyword = c("a", "a", "a", "b", "b", "b", "c", 
 "c", "c"), Date = c("4/11/14", "4/13/14", "4/14/14", "6/2/14", 
 "7/2/14", "8/2/14", "8/29/14", "8/30/14", "8/31/14"), Pos = c(1L, 
1L, 1L, 3L, 4L, 4L, 2L, 2L, 2L), Bid = c(5, 5, 5, 9, 9, 9, 3, 
3, 3)), .Names = c("Keyword", "Date", "Pos", "Bid"), class = "data.frame", row.names = c(NA, 
-9L))


Or using data.table:
library(data.table)
setDT(df)[ ,.SD[which.max(as.Date(Date, format= "%m/%d/%y"))], by = Keyword]
#    Keyword    Date Pos Bid
# 1:       a 4/14/14   1   5
# 2:       b  8/2/14   4   9
# 3:       c 8/31/14   2   3

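Note: your desired output keeps the Date column in its original format, so both of my solutions call as.Date inside every group. The better practice is to convert the column to class Date once and then use the converted column during the aggregation.

Here is an additional base R solution using the split-apply-combine approach: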
do.call(rbind, lapply(split(df, df$Keyword), 
        function(x) x[which.max(as.Date(x$Date, format='%m/%d/%y')), ]))
#   Keyword    Date Pos Bid
# a       a 4/14/14   1   5
# b       b  8/2/14   4   9
# c       c 8/31/14   2   3
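The "convert once" practice mentioned in the note could look roughly like this. It is only a sketch: the helper column `DateParsed` is a hypothetical name, added so the dates are parsed a single time and then dropped again so the output keeps the original Date strings.

```r
# Sample data from the question, rebuilt so the example is self-contained.
df <- data.frame(
  Keyword = rep(c("a", "b", "c"), each = 3),
  Date = c("4/11/14", "4/13/14", "4/14/14", "6/2/14", "7/2/14",
           "8/2/14", "8/29/14", "8/30/14", "8/31/14"),
  Pos = c(1L, 1L, 1L, 3L, 4L, 4L, 2L, 2L, 2L),
  Bid = c(5, 5, 5, 9, 9, 9, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Parse the date strings once, into a helper column (hypothetical name).
df$DateParsed <- as.Date(df$Date, format = "%m/%d/%y")

# Split by Keyword, pick the row with the latest parsed date, and recombine.
latest <- do.call(
  rbind,
  lapply(split(df, df$Keyword), function(x) x[which.max(x$DateParsed), ])
)

# Drop the helper column so the result matches the requested format.
latest$DateParsed <- NULL
latest
```

Because `which.max()` returns a single index, this keeps one row per Keyword even when several rows share the latest date.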

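@jazzurro This is based on your earlier solution.
Yes, I can see that. :) Maybe mutate before grouping?

Try (with plyr):

  library(plyr)
  ddply(df, .(Keyword), function(x) {
                  # parse this group's dates, then keep the row(s) on the latest one
                  Date <- as.Date(x$Date, '%m/%d/%y')
                  x[Date == max(Date), ]})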

  #  Keyword    Date Pos Bid
  #1       a 4/14/14   1   5
  #2       b  8/2/14   4   9
  #3       c 8/31/14   2   3

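Another base R option, assuming ddf holds the same data as df above:

ddf$Date <- as.Date(ddf$Date, format = "%m/%d/%y")
ddf <- ddf[rev(order(ddf$Date)), ]       # newest dates first
ddf <- ddf[!duplicated(ddf$Keyword), ]   # keep the first (newest) row per Keyword
ddf[order(ddf$Keyword), ]
#   Keyword       Date Pos Bid
# 3       a 2014-04-14   1   5
# 6       b 2014-08-02   4   9
# 9       c 2014-08-31   2   3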