R: Subset a data.table based on another data.table
Suppose I have two data.tables, dm and dn:
library(data.table)
set.seed(12)
# every second day from 2015-09-01 up to 2015-10-31
dates = seq.Date(as.Date('2015-09-01'), as.Date('2015-11-01'), by = 2)
# 10 events for users A-D on random dates
dm = data.table(user = sample(LETTERS[1:4], 10, replace = TRUE),
                time = sample(dates, 10))
# one [start, end] interval for each of 3 distinct users drawn from A-H
dn = data.table(user  = sample(LETTERS[1:8], 3, replace = FALSE),
                start = as.Date(c('2015-09-01', '2015-10-05', '2015-09-14')),
                end   = as.Date(c('2015-10-30', '2015-11-01', '2015-10-20')))
> dm
# user time
# 1: A 2015-09-25
# 2: D 2015-10-19
# 3: D 2015-09-21
# 4: B 2015-10-27
# 5: A 2015-09-15
# 6: A 2015-09-23
# 7: A 2015-10-21
# 8: C 2015-10-31
# 9: A 2015-10-01
# 10: A 2015-09-05
> dn
# user start end
# 1: B 2015-09-01 2015-10-30
# 2: F 2015-10-05 2015-11-01
# 3: A 2015-09-14 2015-10-20
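How can I subset dm based on the columns of dn? That is, for each user in dn, find the matching user in dm and keep the rows (if any) whose time falls within that user's interval [start, end].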
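Put as code, the rule is an inner join on user followed by a closed-interval filter on time. The following is a minimal base R sketch of that rule, not the data.table approach the question is after. It types dm and dn in from the printed output above rather than regenerating them, because the values sample() draws after set.seed(12) can differ between R versions.

```r
# dm and dn typed in from the printed output above (base R, no packages).
dm <- data.frame(
  user = c("A", "D", "D", "B", "A", "A", "A", "C", "A", "A"),
  time = as.Date(c("2015-09-25", "2015-10-19", "2015-09-21", "2015-10-27",
                   "2015-09-15", "2015-09-23", "2015-10-21", "2015-10-31",
                   "2015-10-01", "2015-09-05")),
  stringsAsFactors = FALSE
)
dn <- data.frame(
  user  = c("B", "F", "A"),
  start = as.Date(c("2015-09-01", "2015-10-05", "2015-09-14")),
  end   = as.Date(c("2015-10-30", "2015-11-01", "2015-10-20")),
  stringsAsFactors = FALSE
)

# Inner join on user: users present in only one table (C, D in dm; F in dn) drop out.
joined <- merge(dm, dn, by = "user")

# Keep only rows whose time lies in the closed interval [start, end].
res <- joined[joined$time >= joined$start & joined$time <= joined$end, ]
print(res)
```

res should hold the four A rows dated 2015-09-15, 2015-09-23, 2015-09-25 and 2015-10-01 plus the B row dated 2015-10-27; the A rows dated 2015-09-05 and 2015-10-21 fall outside [2015-09-14, 2015-10-20].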
In this example, the desired result is:
   user       time      start        end
1:    A 2015-09-25 2015-09-14 2015-10-20
5:    A 2015-09-15 2015-09-14 2015-10-20
6:    A 2015-09-23 2015-09-14 2015-10-20
9:    A 2015-10-01 2015-09-14 2015-10-20
4:    B 2015-10-27 2015-09-01 2015-10-30
Row numbers are kept only for illustration; the time order does not matter.

You can try:
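# key dm by user so that dm[dn] joins dn's first column (user) against it
setkey(dm, user)
# keep start <= time <= end; unmatched users (F) get NA time and are dropped
dm[dn][time >= start & time <= end]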
# user time start end
#1: A 2015-09-25 2015-09-14 2015-10-20
#2: A 2015-09-15 2015-09-14 2015-10-20
#3: A 2015-09-23 2015-09-14 2015-10-20
#4: A 2015-10-01 2015-09-14 2015-10-20
#5: B 2015-10-27 2015-09-01 2015-10-30
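With data.table 1.9.8 or later, the interval condition can also be written into the join itself as a non-equi join, so no separate filtering step is needed. This is a sketch under that version assumption, again with dm and dn typed in from the printed output. In a non-equi join the result's join column `time` carries dn's bound values, so dm's own times are taken with the `x.time` prefix.

```r
library(data.table)  # assumes a version with non-equi joins (>= 1.9.8)

# dm and dn typed in from the printed output above.
dm <- data.table(
  user = c("A", "D", "D", "B", "A", "A", "A", "C", "A", "A"),
  time = as.Date(c("2015-09-25", "2015-10-19", "2015-09-21", "2015-10-27",
                   "2015-09-15", "2015-09-23", "2015-10-21", "2015-10-31",
                   "2015-10-01", "2015-09-05"))
)
dn <- data.table(
  user  = c("B", "F", "A"),
  start = as.Date(c("2015-09-01", "2015-10-05", "2015-09-14")),
  end   = as.Date(c("2015-10-30", "2015-11-01", "2015-10-20"))
)

# Match each dn row to the dm rows with the same user and start <= time <= end.
# x.time is dm's own time column; i.start and i.end come from dn.
# nomatch = 0L drops dn rows with no match (user F) instead of adding NA rows.
res <- dm[dn,
          .(user, time = x.time, start = i.start, end = i.end),
          on = .(user, time >= start, time <= end),
          nomatch = 0L]
print(res)
```

The result should contain the same five rows as the output above, possibly in a different order.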