
R: Cartesian rolling join with data.table


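I have two tables:

  • dat: contains the data

  • dates: contains a table of dates


The result I am after is shown below: a rolling join performed on each individual row of dat, with the results then combined.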

rbind(
dat[1,][dates, roll = 90],
dat[2,][dates, roll = 90],
dat[3,][dates, roll = 90],
...
dat[12,][dates, roll = 90]
)

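My actual dataset is much larger, so listing every row of dat is impractical. Is there a simple way to do the same thing without a loop?

This is not necessarily the best approach, but you can simply write a loop here to iterate over the data: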

df <- data.frame()

for (i in 1:nrow(dat)){
    df <- rbind(df, dat[i,][dates, roll = 90])
}

head(df)

          date country state item value
  1: 2018-01-31     CCC    S6   M2     6
  2: 2018-02-28     CCC    S6   M2     6
  3: 2018-03-31     CCC    S6   M2     6
  4: 2018-04-30    <NA>  <NA> <NA>    NA
  5: 2018-05-31    <NA>  <NA> <NA>    NA
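
A possible variant of the loop above: collecting the per-row joins in a list and binding them once with `rbindlist` avoids growing `df` on every iteration. This is only a minimal sketch on made-up toy tables (the two-row `dat` and four-row `dates` below are assumptions, not the question's data), and it passes `on = "date"` explicitly rather than assuming keys have been set.

```r
library(data.table)

# Toy stand-ins for the question's tables (hypothetical values).
dat <- data.table(
  date  = as.Date(c("2018-01-01", "2018-03-15")),
  value = c(6L, 10L)
)
dates <- data.table(
  date = as.Date(c("2018-01-31", "2018-02-28", "2018-03-31", "2018-04-30"))
)

# One rolling join per row of dat; each piece has one row per entry in dates.
pieces <- lapply(seq_len(nrow(dat)), function(i) {
  dat[i][dates, on = "date", roll = 90]
})

# Bind all pieces in a single step instead of calling rbind inside the loop.
res_loop <- rbindlist(pieces)
res_loop
```

Rows of `dates` that fall before a `dat` row's date, or more than 90 days after it, come back with `NA` in the non-join columns, as in the output shown above.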

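If I understand your intent correctly, you want to roll each record forward 90 days.
I used a cross join and then applied the rolling criterion to subset the result.

Your original tables: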

library(data.table)

dates = structure(list(date = structure(c(17562, 17590, 17621, 17651, 
                                          17682, 17712, 17743, 17774, 17804, 17835, 17865, 17896), class = "Date")), 
                  row.names = c(NA, -12L), class = "data.frame")


dat = structure(list(date = structure(c(17546, 17743, 17778, 17901, 
                                        17536, 17806, 17901, 17981, 17532, 17722, 17969, 18234), class = "Date"), 
                     country = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
                                           3L, 3L, 3L), .Label = c("AAA", "BBB", "CCC"), class = "factor"), 
                     state = structure(c(1L, 1L, 2L, 3L, 4L, 1L, 2L, 5L, 6L, 1L, 
                                         2L, 2L), .Label = c("S1", "S2", "S3", "S4", "S5", "S6"), class = "factor"), 
                     item = structure(c(1L, 2L, 4L, 6L, 3L, 5L, 3L, 2L, 2L, 4L, 
                                        5L, 7L), .Label = c("M1", "M2", "M3", "M4", "M5", "M6", "M7"
                                        ), class = "factor"), value = c(67L, 10L, 50L, 52L, 93L, 
                                                                        50L, 62L, 46L, 6L, 30L, 30L, 14L)), row.names = c(NA, -12L
                                                                        ), class = "data.frame")


dates = data.table(dates)
dat = data.table(dat)
Note that I have not set a key.

I am using the cross join function from the reference:

# Cross join: add a constant column k to both tables, join on it so every
# row of X is paired with every row of Y, then drop k again.
CJ.table.1 <- function(X, Y)
  setkey(X[, c(k = 1, .SD)], k)[Y[, c(k = 1, .SD)], allow.cartesian = TRUE][, k := NULL]

# Keep only the pairs where the dates entry falls 0 to 90 days after the dat entry.
dsn1 <- CJ.table.1(dat, dates)[i.date - date <= 90 & i.date - date >= 0][
  , .(date = i.date, country, state, item, value)][
  order(country, state, item, value, date), ]

Growing an object dynamically in R with rbind or similar is usually inefficient (I think you know that). I don't think the loop is necessarily bad, though I would just initialize df with the right number of rows and the right set of columns, like
iris[rep(NA_integer_, 10), ]
and then assign to each row instead of growing it.

Same idea, I think:
dat[dates[, .(k = seq(1L, nrow(dat))), by = date], roll = 90, allow.cartesian = TRUE][order(k)][, k := NULL][]

That's so neat! @Frank, this is exactly what I was after. Thanks, SatZ and Frank.
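
The row-index idea from the comments can be sketched without a loop roughly as follows. This is an assumption-laden illustration, not the commenter's exact code: the index column `k`, the helper table `grid`, and the toy data are all hypothetical, and the join columns are given through `on =` instead of keys. Each row of `dat` gets its own id, every id is paired with every date, and a single rolling join on `(k, date)` then rolls within each row separately.

```r
library(data.table)

# Toy stand-ins for the question's tables (hypothetical values).
dat <- data.table(
  date  = as.Date(c("2018-01-01", "2018-03-15")),
  value = c(6L, 10L)
)
dates <- data.table(
  date = as.Date(c("2018-01-31", "2018-02-28", "2018-03-31", "2018-04-30"))
)

# Hypothetical row id, so that every row of dat forms its own join group.
dat[, k := .I]

# Every (row id, date) combination, ordered by row id and then by date.
grid <- data.table(
  k    = rep(dat$k, each = nrow(dates)),
  date = rep(dates$date, times = nrow(dat))
)

# Exact match on k, rolling match (up to 90 days forward) on date.
res <- dat[grid, on = .(k, date), roll = 90]
res[, k := NULL]
res
```

Unlike the cross join followed by a filter, this keeps the unmatched dates as `NA` rows, which is closer to the output of binding `dat[i, ][dates, roll = 90]` for every `i`.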