R: filter rows that fall below a given row's value within each group
I have a data frame in which each row is a dated event linked to a location. Within each location there is one index event and a series of matched events that may occur before and/or after the index event. For each location, I need to subset all matched events that occurred before the index event. The data look like this:
locid match date score iid
1 index 4/11/2013 15 1
1 matched 1/09/2013 23 2
1 matched 14/04/2013 1 3
1 matched 7/1/2014 21 4
2 index 2/4/2013 12 1
2 matched 1/2/2013 10 2
3 index 1/5/2013 23 1
3 matched 2/5/2013 10 2
4 index 3/3/2013 9 1
4 matched 10/2/2013 32 2
4 matched 1/10/2012 15 3
4 matched 4/3/2013 12 4
4 matched 10/3/2013 10 5
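To make the thread reproducible, here is a minimal sketch that rebuilds the sample table above as a data frame named `df`. The values are copied from the table. Keeping `date` as day/month/year character strings is an assumption chosen so that the `as.Date(date, format = "%d/%m/%Y")` step in the answers below applies unchanged.

```r
# Rebuild the sample data from the question; dates are kept as d/m/Y strings
df <- data.frame(
  locid = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4),
  match = c("index", "matched", "matched", "matched",
            "index", "matched",
            "index", "matched",
            "index", "matched", "matched", "matched", "matched"),
  date  = c("4/11/2013", "1/09/2013", "14/04/2013", "7/1/2014",
            "2/4/2013", "1/2/2013",
            "1/5/2013", "2/5/2013",
            "3/3/2013", "10/2/2013", "1/10/2012", "4/3/2013", "10/3/2013"),
  score = c(15, 23, 1, 21, 12, 10, 23, 10, 9, 32, 15, 12, 10),
  iid   = c(1, 2, 3, 4, 1, 2, 1, 2, 1, 2, 3, 4, 5),
  stringsAsFactors = FALSE  # keep match and date as character, not factor
)
```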
I need to subset the data frame so that, for each location, I only get the rows dated before that location's index event:
locid match date score iid
1 matched 1/09/2013 23 2
1 matched 14/04/2013 1 3
2 matched 1/2/2013 10 2
4 matched 10/2/2013 32 2
4 matched 1/10/2012 15 3
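This is my first time asking a question, so I hope I haven't done anything wrong. I've tried various permutations of solutions in R, but I'm struggling to find the right one.
Here's an approach using dplyr: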
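library(dplyr)
df %>%
  # parse the day/month/year strings into Date objects
  mutate(date = as.Date(date, format = "%d/%m/%Y")) %>%
  group_by(locid) %>%
  # keep matched rows dated before this location's index date
  filter(match == "matched" & date < date[match == "index"])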
#Source: local data frame [5 x 5]
#Groups: locid
#
# locid match date score iid
#1 1 matched 2013-09-01 23 2
#2 1 matched 2013-04-14 1 3
#3 2 matched 2013-02-01 10 2
#4 4 matched 2013-02-10 32 2
#5 4 matched 2012-10-01 15 3
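Each approach in this thread compares a location's dates against `date[match == "index"]`, which assumes every location has exactly one index row. If a location has none (or several), that comparison can error or quietly drop rows, depending on the approach. A minimal base R sketch for checking the assumption first; `has_single_index()` and the toy data frame `d` are hypothetical names used only for illustration:

```r
# Hypothetical helper: TRUE when every location has exactly one "index" row,
# which the per-group comparison date < date[match == "index"] relies on
has_single_index <- function(d) {
  n_index <- tapply(d$match == "index", d$locid, sum)  # index rows per locid
  all(n_index == 1)
}

# Toy illustration: location 2 has no index row
d <- data.frame(locid = c(1, 1, 2),
                match = c("index", "matched", "matched"),
                stringsAsFactors = FALSE)
has_single_index(d)  # FALSE
```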
Here is a data.table possibility (assuming your data is called df):
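library(data.table)
# convert df to a data.table by reference, parse the dates, then filter within each locid
setDT(df)[, date := as.Date(date, format = "%d/%m/%Y")][, .SD[date < date[match == "index"]], by = locid]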
Possible base R solutions:
df <- transform(df, date = as.Date(date, format = "%d/%m/%Y"))
do.call(rbind, by(df, df$locid, FUN = function(x) x[with(x, date < date[match == "index"]), ]))
# locid match date score iid
# 1.2 1 matched 2013-09-01 23 2
# 1.3 1 matched 2013-04-14 1 3
# 2 2 matched 2013-02-01 10 2
# 4.10 4 matched 2013-02-10 32 2
# 4.11 4 matched 2012-10-01 15 3
df <- transform(df, date = as.Date(date, format = "%d/%m/%Y"))
do.call(rbind, lapply(split(df, df$locid), function(x) x[with(x, date < date[match == "index"]), ]))
# locid match date score iid
# 1.2 1 matched 2013-09-01 23 2
# 1.3 1 matched 2013-04-14 1 3
# 2 2 matched 2013-02-01 10 2
# 4.10 4 matched 2013-02-10 32 2
# 4.11 4 matched 2012-10-01 15 3
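As a further base R option, the split/apply step can be swapped for a vectorised lookup: find each location's index date with `match()`, then filter all rows at once. This is a sketch under the same one-index-row-per-location assumption (`match()` uses the first index row it finds); the names `idx`, `index_date` and `res` are illustrative. Wrapping the condition in `which()` drops rows whose location has no index row, where `index_date` would be `NA`. Unlike the `by()`/`split()` versions, the result keeps the original row names (2, 3, 6, 10, 11).

```r
# Sample data from the question (dates as d/m/Y strings)
df <- data.frame(
  locid = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4),
  match = c("index", "matched", "matched", "matched",
            "index", "matched",
            "index", "matched",
            "index", "matched", "matched", "matched", "matched"),
  date  = c("4/11/2013", "1/09/2013", "14/04/2013", "7/1/2014",
            "2/4/2013", "1/2/2013",
            "1/5/2013", "2/5/2013",
            "3/3/2013", "10/2/2013", "1/10/2012", "4/3/2013", "10/3/2013"),
  score = c(15, 23, 1, 21, 12, 10, 23, 10, 9, 32, 15, 12, 10),
  iid   = c(1, 2, 3, 4, 1, 2, 1, 2, 1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)
df$date <- as.Date(df$date, format = "%d/%m/%Y")

# One row per location holding its index date
idx <- df[df$match == "index", c("locid", "date")]
# For every row, look up the index date of its location (NA if none)
index_date <- idx$date[match(df$locid, idx$locid)]

# which() turns the logical condition into row positions, skipping NA
res <- df[which(df$match == "matched" & df$date < index_date), ]
res
```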
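Thanks! This worked a treat. I was trying to do something similar but made an embarrassingly silly mistake in the filter call. Thanks very much; seeing the different options is really helpful. It reminds me that I really need to get to grips with the data.table package.
As I showed in the last two options, you can do this without any packages at all.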