R: subset to the rows with the latest date
I have:
Keyword Date Pos Bid
a 4/11/14 1 5.00
a 4/13/14 1 5.00
a 4/14/14 1 5.00
b 6/2/14 3 9.00
b 7/2/14 4 9.00
b 8/2/14 4 9.00
c 8/29/14 2 3.00
c 8/30/14 2 3.00
c 8/31/14 2 3.00
I need to subset so that only the rows with the latest date for each Keyword are kept:
Keyword Date Pos Bid
a 4/14/14 1 5.00
b 8/2/14 4 9.00
c 8/31/14 2 3.00
I tried the following, but these attempts either give me errors or are not what I want. What am I missing?

Thanks.
Latest = subset( x,
Date = max(as.Date(Date, '%m/%d/%y')),
select = c('Identity', 'Date', 'Round.Avg.Pos.', 'Search.Bid')
)
You could try, using dplyr:
library(dplyr)
df %>%
  group_by(Keyword) %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%y")) %>%
  filter(Date == max(Date))
Data:
df <- structure(list(Keyword = c("a", "a", "a", "b", "b", "b", "c",
"c", "c"), Date = c("4/11/14", "4/13/14", "4/14/14", "6/2/14",
"7/2/14", "8/2/14", "8/29/14", "8/30/14", "8/31/14"), Pos = c(1L,
1L, 1L, 3L, 4L, 4L, 2L, 2L, 2L), Bid = c(5, 5, 5, 9, 9, 9, 3,
3, 3)), .Names = c("Keyword", "Date", "Pos", "Bid"), class = "data.frame", row.names = c(NA,
-9L))
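As far as I can tell, the subset() attempt in the question filters nothing: subset()'s row condition is its `subset` argument, so `Date = max(...)` is passed as an unrelated named argument and silently ignored. Even written with `==`, a plain max() would be taken over the whole column rather than within each Keyword. A minimal base R sketch of the intended call, assuming the example data rebuilt below and a helper column `d` (a name introduced here only for illustration), uses ave() to get each group's latest date:

```r
# Example data from the question; Date is still a character column here
df <- data.frame(
  Keyword = rep(c("a", "b", "c"), each = 3),
  Date = c("4/11/14", "4/13/14", "4/14/14", "6/2/14", "7/2/14",
           "8/2/14", "8/29/14", "8/30/14", "8/31/14"),
  Pos = c(1L, 1L, 1L, 3L, 4L, 4L, 2L, 2L, 2L),
  Bid = c(5, 5, 5, 9, 9, 9, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: the parsed date as a day count
df$d <- as.numeric(as.Date(df$Date, format = "%m/%d/%y"))

# ave() gives every row the maximum `d` of its own Keyword group, so the
# comparison (==, not =) is TRUE only on each group's latest row.
# The condition goes in subset()'s `subset` argument; `select = -d` drops
# the helper column again.
latest <- subset(df, d == ave(d, Keyword, FUN = max), select = -d)
latest
```

The Date column keeps its original character format here, because only the helper column is parsed.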
Or using data.table:
library(data.table)
setDT(df)[ ,.SD[which.max(as.Date(Date, format= "%m/%d/%y"))], by = Keyword]
# Keyword Date Pos Bid
# 1: a 4/14/14 1 5
# 2: b 8/2/14 4 9
# 3: c 8/31/14 2 3
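Note: your desired output keeps Date in its original format, so the data.table solution above and the base R solution below call as.Date on every group; best practice would be to convert the column to Date class once and then use the converted column during aggregation.

Here is an additional base R solution using the split-apply-combine approach: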
do.call(rbind, lapply(split(df, df$Keyword),
function(x) x[which.max(as.Date(x$Date, format='%m/%d/%y')), ]))
# Keyword Date Pos Bid
# a a 4/14/14 1 5
# b b 8/2/14 4 9
# c c 8/31/14 2 3
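Along the lines of the best practice mentioned above (convert once, then aggregate), here is a sketch of the split-apply-combine solution with the conversion moved out of the loop. It rebuilds the example data with data.frame() so it runs on its own; the trade-off is that the result shows ISO dates instead of the original m/d/y strings.

```r
# Example data from the question
df <- data.frame(
  Keyword = rep(c("a", "b", "c"), each = 3),
  Date = c("4/11/14", "4/13/14", "4/14/14", "6/2/14", "7/2/14",
           "8/2/14", "8/29/14", "8/30/14", "8/31/14"),
  Pos = c(1L, 1L, 1L, 3L, 4L, 4L, 2L, 2L, 2L),
  Bid = c(5, 5, 5, 9, 9, 9, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Convert the Date column to class "Date" once, up front
df$Date <- as.Date(df$Date, format = "%m/%d/%y")

# Split by Keyword, keep the row with the latest date in each piece,
# then bind the pieces back together (row names become the keywords)
latest <- do.call(rbind, lapply(split(df, df$Keyword),
                                function(x) x[which.max(x$Date), ]))
latest
```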
Or with plyr:
library(plyr)
ddply(df, .(Keyword), function(x) {
  Date <- as.Date(x$Date, '%m/%d/%y')   # parse the dates within each group
  x[Date == max(Date), ]                # keep the group's latest row(s)
})
# Keyword Date Pos Bid
#1 a 4/14/14 1 5
#2 b 8/2/14 4 9
#3 c 8/31/14 2 3
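The answers above differ in one edge case the example data does not exercise: `x[Date == max(Date), ]` keeps every row that ties for the latest date, while `which.max()` keeps only the first such row. A small sketch on hypothetical data with a tie (not from the question) shows the difference:

```r
# Hypothetical data with a tie: Keyword "a" has two rows on its latest date
ties <- data.frame(
  Keyword = c("a", "a", "a", "b"),
  Date = as.Date(c("2014-04-13", "2014-04-14", "2014-04-14", "2014-08-02")),
  Pos = c(1L, 1L, 2L, 4L)
)

# which.max() keeps only the first row holding the maximum in each group
first_only <- do.call(rbind, lapply(split(ties, ties$Keyword),
                                    function(x) x[which.max(x$Date), ]))

# Date == max(Date) keeps every row that ties for the maximum
all_ties <- do.call(rbind, lapply(split(ties, ties$Keyword),
                                  function(x) x[x$Date == max(x$Date), ]))

nrow(first_only)  # 2
nrow(all_ties)    # 3
```

Which behavior is right depends on whether tied rows should all be reported.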
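Another base R option converts Date once, sorts the rows newest first and drops the duplicated keywords:
ddf <- as.data.frame(df)                # assumed: a plain data.frame copy of the example data
ddf$Date <- as.Date(ddf$Date, format = "%m/%d/%y")
ddf <- ddf[rev(order(ddf$Date)), ]      # newest rows first
ddf <- ddf[!duplicated(ddf$Keyword), ]  # keep the first (latest) row per Keyword
ddf[order(ddf$Keyword), ]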
Keyword Date Pos Bid
3 a 2014-04-14 1 5
6 b 2014-08-02 4 9
9 c 2014-08-31 2 3
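The same sort-then-deduplicate idea can be written with a single order() call that sorts by Keyword and by descending date at once, so the result comes out already grouped by Keyword and no final re-sort is needed. A sketch on the example data, rebuilt here with data.frame() so the block runs on its own; negating the numeric date is one way to get a descending key, chosen here as an assumption of convenience:

```r
# Example data from the question
df <- data.frame(
  Keyword = rep(c("a", "b", "c"), each = 3),
  Date = c("4/11/14", "4/13/14", "4/14/14", "6/2/14", "7/2/14",
           "8/2/14", "8/29/14", "8/30/14", "8/31/14"),
  Pos = c(1L, 1L, 1L, 3L, 4L, 4L, 2L, 2L, 2L),
  Bid = c(5, 5, 5, 9, 9, 9, 3, 3, 3),
  stringsAsFactors = FALSE
)
df$Date <- as.Date(df$Date, format = "%m/%d/%y")

# Sort by Keyword ascending and, within each Keyword, by Date descending
# (negating the numeric date flips its sort direction)
sorted <- df[order(df$Keyword, -as.numeric(df$Date)), ]

# The first row of each Keyword is now its latest date; drop the rest
latest <- sorted[!duplicated(sorted$Keyword), ]
latest
```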