R: Efficiently subsetting a data.table multiple times

I have data in this format:

> library(data.table)
> data = data.table(id = 1:10, date = seq(as.Date("2016-01-01"), by = 1, length = 10))
> data
    id       date
 1:  1 2016-01-01
 2:  2 2016-01-02
 3:  3 2016-01-03
 4:  4 2016-01-04
 5:  5 2016-01-05
 6:  6 2016-01-06
 7:  7 2016-01-07
 8:  8 2016-01-08
 9:  9 2016-01-09
10: 10 2016-01-10
I also have another table holding the queries/subsets I want to perform:

> query = data.table(id = c(1,4,7), date_start = c("2016-01-01", "2016-01-01", "2016-01-01"), date_end = c("2016-01-04", "2016-01-02", "2016-01-03"))
> query
   id date_start   date_end
1:  1 2016-01-01 2016-01-04
2:  4 2016-01-01 2016-01-02
3:  7 2016-01-01 2016-01-03
I would like to do something like this:

subset(data, (id == query$id[1] & date > query$date_start[1] & date < query$date_end[1]) |
       (id == query$id[2] & date > query$date_start[2] & date < query$date_end[2]) |
       (id == query$id[3] & date > query$date_start[3] & date < query$date_end[3]))
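The chained condition above can be generated for any number of query rows. A minimal base-R sketch (not from the thread): build one logical mask per query row and OR the masks together with Reduce, so no per-query subsets are rbind-ed. It assumes the query dates are converted to Date, and, unlike the strict > / < above, it uses inclusive bounds (>= / <=) as the answers below do; with strict bounds no row of this sample data would match.

```r
# Rebuild the OP's tables, storing the query dates as Date instead of character
data <- data.frame(id = 1:10,
                   date = seq(as.Date("2016-01-01"), by = 1, length.out = 10))
query <- data.frame(id = c(1, 4, 7),
                    date_start = as.Date(c("2016-01-01", "2016-01-01", "2016-01-01")),
                    date_end   = as.Date(c("2016-01-04", "2016-01-02", "2016-01-03")))

# One logical vector per query row: same id and date inside [date_start, date_end]
masks <- lapply(seq_len(nrow(query)), function(i) {
  data$id == query$id[i] &
    data$date >= query$date_start[i] &
    data$date <= query$date_end[i]
})

# OR all masks together; this replaces the hand-written chain of "|" clauses
keep <- Reduce(`|`, masks)
result <- data[keep, ]
```

This still iterates over the query rows (inside lapply), so the work grows with nrow(query) times nrow(data); the join-based answers below avoid that.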
Is there a way to generate this subset query automatically, without using a for loop and rbind-ing the results?

Thanks

First, you can join them:

data.full <- merge(data, query, by = "id", all.x = TRUE)
Then, to keep the records that are not referenced in query, plus the referenced records that fall within their date range:

data.final <- data.full[is.na(date_start) | (date >= date_start & date <= date_end),]
data.final
   id       date date_start   date_end
1:  1 2016-01-01 2016-01-01 2016-01-04
2:  2 2016-01-02         NA         NA
3:  3 2016-01-03         NA         NA
4:  5 2016-01-05         NA         NA
5:  6 2016-01-06         NA         NA
6:  8 2016-01-08         NA         NA
7:  9 2016-01-09         NA         NA
8: 10 2016-01-10         NA         NA
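A self-contained sketch of the merge-then-filter approach, assuming the query dates are stored as Date rather than character. The second filter (data.matched) is an added variant, not part of the answer: it keeps only the rows that fall inside a query range, which is what the OP's hand-written condition selects.

```r
library(data.table)

data  <- data.table(id = 1:10,
                    date = seq(as.Date("2016-01-01"), by = 1, length.out = 10))
query <- data.table(id = c(1L, 4L, 7L),
                    date_start = as.Date("2016-01-01"),
                    date_end   = as.Date(c("2016-01-04", "2016-01-02", "2016-01-03")))

# Left join: every row of data is kept; ids absent from query get NA bounds
data.full <- merge(data, query, by = "id", all.x = TRUE)

# Keep unreferenced ids, plus referenced rows whose date is in [date_start, date_end]
data.final <- data.full[is.na(date_start) | (date >= date_start & date <= date_end)]

# Variant: keep only rows that matched a query range (drops unreferenced ids)
data.matched <- data.full[!is.na(date_start) & date >= date_start & date <= date_end]
```

For unreferenced ids the date comparisons yield NA, so is.na(date_start) | NA evaluates to TRUE in the first filter and FALSE & NA to FALSE in the second.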

If we transform the OP's data slightly, to get

library(data.table)
data = setDT(structure(list(id = 1:10, date = structure(16801:16810, class = c("IDate", 
"Date")), date2 = structure(16801:16810, class = c("IDate", "Date"
))), .Names = c("id", "date", "date2"), row.names = c(NA, -10L
), class = c("data.table", "data.frame"), sorted = c("id", 
"date", "date2")))

query = setDT(structure(list(id = c(1, 4, 7), date_start = 
structure(c(16801L, 
16801L, 16801L), class = c("IDate", "Date")), date_end = structure(c(16804L, 
16802L, 16803L), class = c("IDate", "Date"))), .Names = c("id", 
"date_start", "date_end"), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"), sorted = c("id", 
"date_start", "date_end")))
... then we can use foverlaps like this:

foverlaps(data, query, nomatch=0)
#    id date_start   date_end       date      date2
# 1:  1 2016-01-01 2016-01-04 2016-01-01 2016-01-01
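For this approach, I think the following steps are needed before the join:

  • Convert all dates to IDate
  • Create an extra date column in the main data
  • Set keys on both tables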
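The three steps listed above, written out as a sketch. It builds the IDate columns directly instead of through dput(); the table layout mirrors the answer, but this construction code is an assumption rather than the answerer's own.

```r
library(data.table)

# Step 1: store all dates as IDate (integer-backed dates)
data  <- data.table(id = 1:10, date = as.IDate("2016-01-01") + 0:9)
query <- data.table(id = c(1L, 4L, 7L),
                    date_start = as.IDate("2016-01-01"),
                    date_end   = as.IDate(c("2016-01-04", "2016-01-02", "2016-01-03")))

# Step 2: foverlaps() matches intervals, so give each single date a
# zero-width interval [date, date2] by duplicating the column
data[, date2 := date]

# Step 3: key both tables; the last two key columns are interval start and end
setkey(data, id, date, date2)
setkey(query, id, date_start, date_end)

# nomatch = 0 drops rows of data that fall inside no query interval
res <- foverlaps(data, query, nomatch = 0)
```

foverlaps() takes its join columns from the keys, so the column order passed to setkey matters: the id goes first, then the interval start, then the interval end.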
In data.table v1.9.7+, you can perform a non-equi join directly, as follows:

# data.table v1.9.7+
data[query, .(id, x.date), on=.(id, date>=date_start, date<=date_end)]
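A self-contained version of the non-equi join, assuming the query dates are IDate rather than character so that the joined columns have matching types. nomatch = 0 is added here (it is not in the answer) to drop query rows with no matching date; without it, ids 4 and 7 come back as rows with NA dates.

```r
library(data.table)  # non-equi joins need data.table v1.9.7+ (per the answer)

data  <- data.table(id = 1:10, date = as.IDate("2016-01-01") + 0:9)
query <- data.table(id = c(1L, 4L, 7L),
                    date_start = as.IDate("2016-01-01"),
                    date_end   = as.IDate(c("2016-01-04", "2016-01-02", "2016-01-03")))

# For each query row, find rows of data with the same id and
# date_start <= date <= date_end. In a non-equi join the join columns
# carry query's values, so x.date is used to return data's own dates.
res <- data[query, .(id, date = x.date),
            on = .(id, date >= date_start, date <= date_end),
            nomatch = 0]
```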

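Inside [.data.table you don't need data.full$.
@docendodiscimus Oh, that's true. I didn't even think about it being a data.table when I wrote this. Luckily it still works that way, but your comment is right, so I went ahead and updated it.
You can use foverlaps(data, query, nomatch=0) after formatting the data correctly (with IDate columns; see ?foverlaps and ?IDate). This Q&A may help:
I think foverlaps is what I'm looking for. If you post it as an answer, I'll accept it.
Any corrections are welcome; I don't actually use foverlaps much.
Note: is there a way to do the query join and drop the diagonal (rows that match themselves exactly)?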