
Finding the last date of the previous week using data.table in R


I have a data.frame as shown below:

structure(list(Start_Date = c("D1", "D2", "D3", "D4", "NA", "NA", 
"D6", "D7"), Week = c("W1", "W1", "W1", "W2", "W2", "W3", "W4", 
"W4"), last_date = c(NA, NA, NA, "D3", "D3", "D4", "D4", "D4"
)), class = "data.frame", row.names = c(NA, -8L))
The expected output is the last_date column.

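What I need: I want to find the last non-NA date of the previous week. If the previous week contains only NA dates, it should look at the week before that and find a non-NA date there. For example, for every row in W2 the last date is D3 (the last non-NA date of the previous week, W1). For W3 it should return D4. For W4, since the last date of W3 is NA, it should fall back to the week before (W2), find the non-NA date there, and return D4.

In short, last_date is the most recent non-NA date that does not come from the current week.

Because my dataset is very large, I am looking for a data.table solution.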

Here is one option (assuming the data is already sorted):

Output:

   Start_Date Week last_date week_nr last_dat2
1:         D1   W1      <NA>       1      <NA>
2:         D2   W1      <NA>       1      <NA>
3:         D3   W1      <NA>       1      <NA>
4:         D4   W2        D3       2        D3
5:       <NA>   W2        D3       2        D3
6:       <NA>   W3        D4       3        D4
7:         D6   W4        D4       4        D4
8:         D7   W4        D4       4        D4
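
The code that produced the week_nr and last_dat2 columns is not shown here. As a minimal sketch only, and not necessarily the answerer's method, one way to arrive at the same two columns is a per-week lookup combined with a rolling join. The helper table `lookup` is a hypothetical name introduced for illustration, and the sketch assumes the rows are already sorted by week.

```r
library(data.table)

# Sample data from the question; missing dates are stored as the string "NA"
DT <- data.table(
  Start_Date = c("D1", "D2", "D3", "D4", "NA", "NA", "D6", "D7"),
  Week       = c("W1", "W1", "W1", "W2", "W2", "W3", "W4", "W4")
)

# Treat the string "NA" as a real missing value
DT[Start_Date == "NA", Start_Date := NA_character_]

# Number the weeks in order of appearance (assumes the data is sorted)
DT[, week_nr := rleid(Week)]

# Last non-missing date of every week that has at least one
# (`lookup` is an illustrative name, not taken from the original answer)
lookup <- DT[!is.na(Start_Date), .(last_dat2 = last(Start_Date)), by = week_nr]

# Move each value one week ahead so that a week never sees its own dates
lookup[, week_nr := week_nr + 1L]

# Rolling join: a week without an entry takes the value of the nearest earlier entry
DT[, last_dat2 := lookup[.SD, on = .(week_nr), roll = Inf, x.last_dat2]]

print(DT)
```

Shifting the lookup key by one week before joining is what keeps a week's own dates out of its result; the rolling join then covers weeks such as W3 whose predecessor supplies the value, and W4 whose predecessor has no valid date at all.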

Here, the lookup table used in the join is created in a different way:

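library(data.table)
library(magrittr) # piping used to improve readability
setDT(DT) # the sample data is a data.frame; convert it to a data.table by reference
# 1. turn the string "NA" into a real NA, 2. carry the last valid date forward (LOCF)
lut <- DT[, .(Week, fifelse(Start_Date == "NA", NA_character_, Start_Date) %>%
                zoo::na.locf(na.rm = FALSE))][ # na.rm = FALSE keeps the column length if the data starts with NA
  , last(V2), by = Week][   # 3. keep the last (filled) date of each week
    , V1 := shift(V1)][]    # 4. lag by one week so each week only sees earlier dates
# update join: write the lagged date to every row of the matching week
DT[lut, on = .(Week),  last_date2 := V1][]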
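The lookup table lut

   Week   V1
1:   W1 <NA>
2:   W2   D3
3:   W3   D4
4:   W4   D4

is created by

  • filling missing Start_Date values with the previous value (LOCF = last observation carried forward),
  • which requires replacing the string "NA" with NA_character_ beforehand,
  • aggregating to the last value of each week,
  • and finally shifting (lagging) the values by one week.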

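Note that the lookup table contains no NA values (except for the first row, of course) and that the last valid Start_Date of week W2, D4, has been carried forward to weeks W3 and W4.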
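
The four steps of the lookup-table construction can also be written out one at a time, which makes the intermediate values easy to inspect. The following is a sketch under assumptions rather than the answer's own code: it swaps zoo::na.locf() for data.table's nafill() applied to row indices (nafill() does not fill character vectors directly), and the column names date_clean, date_filled, last_in_week and prev_week_date are made up for illustration.

```r
library(data.table)

# Sample data from the question; missing dates are stored as the string "NA"
DT <- data.table(
  Start_Date = c("D1", "D2", "D3", "D4", "NA", "NA", "D6", "D7"),
  Week       = c("W1", "W1", "W1", "W2", "W2", "W3", "W4", "W4")
)

# Step 1: replace the string "NA" with a real missing value
DT[, date_clean := fifelse(Start_Date == "NA", NA_character_, Start_Date)]

# Step 2: LOCF. Fill the row index forward, then use it to look up the date
DT[, date_filled := date_clean[
  nafill(replace(seq_len(.N), is.na(date_clean), NA), type = "locf")
]]

# Step 3: aggregate to the last (filled) date of each week
lut <- DT[, .(last_in_week = last(date_filled)), by = Week]

# Step 4: lag by one week so that each week only sees dates from earlier weeks
lut[, prev_week_date := shift(last_in_week)]

print(lut)
```

After step 2 the filled column reads D1, D2, D3, D4, D4, D4, D6, D7, which is why W3 ends up with D4 as its last value even though it has no valid date of its own.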

Here is a base R solution that uses ave() and split():

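# The code below expects real NA values, so convert the string "NA" first
df$Start_Date[df$Start_Date %in% "NA"] <- NA
# Carry the last non-NA date forward, then keep the last carried value of each week
df$last_date <- with(df, ave(na.omit(Start_Date)[cumsum(!is.na(Start_Date))], Week,
                             FUN = function(x) tail(x[!is.na(x)], 1)))
dfs <- split(df, df$Week) # groups are ordered by sorted Week label
dfout <- do.call(rbind, lapply(seq_along(dfs), function(k) {
  # each week receives the value computed for the week before it; the first week has none
  dfs[[k]]$last_date <- if (k == 1) NA else unique(dfs[[k - 1]]$last_date)
  dfs[[k]]
}))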

Another data.table option is a rolling join using roll= and mult=:

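# Recode Week as a run-length id and build a key W that falls just before each week
setDT(DT)[, c("Week", "W") := .(rl <- rleid(Week), rl - 0.1)][, 
    # roll back from W to the last row with a valid date in any earlier week
    last_dat := DT[Start_Date!="NA"][
        .SD, on=.(Week=W), roll=Inf, mult="last", x.Start_Date]
    ]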
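
To see what roll=Inf and mult="last" contribute, a toy join on made-up data may help. This is only an illustrative sketch: the tables `x` and `i` and their column names are hypothetical and not part of the answer. The key 1.9 has no exact match, so it rolls back to key 1, and mult="last" keeps the last of the two rows that share that key.

```r
library(data.table)

# Hypothetical lookup table: two rows share key 1, and key 3 is absent
x <- data.table(week_id = c(1, 1, 2, 4), label = c("a", "b", "c", "d"))

# Hypothetical query keys, each placed just below a whole number
i <- data.table(lookup_key = c(0.9, 1.9, 3.9, 4.9))

# roll = Inf   : without an exact match, fall back to the nearest smaller key
# mult = "last": if several rows of x qualify, keep only the last one
res <- x[i, on = .(week_id = lookup_key), roll = Inf, mult = "last", x.label]

print(res)
# NA "b" "c" "d"
```

The first key, 0.9, has nothing before it and therefore yields NA, which mirrors the first week in the question. The key 3.9 skips the missing key 3 and falls back to key 2, just as W4 falls back to W2.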

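Hi Sindri, thanks for your solution. However, I found a gap in it: it only moves the pointer back to the previous week. If the previous week contains only NA dates, it should look at the week before that and find a non-NA date there. In short: the most recent non-NA date (not from the current week).

I had forgotten mult= in the solution.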

Output of the lookup-table solution (DT after the join, followed by lut):

   Start_Date Week last_date last_date2
1:         D1   W1      <NA>       <NA>
2:         D2   W1      <NA>       <NA>
3:         D3   W1      <NA>       <NA>
4:         D4   W2        D3         D3
5:         NA   W2        D3         D3
6:         NA   W3        D4         D4
7:         D6   W4        D4         D4
8:         D7   W4        D4         D4
lut
   Week   V1
1:   W1 <NA>
2:   W2   D3
3:   W3   D4
4:   W4   D4

Output of the base R solution:

  Start_Date Week last_date
1         D1   W1      <NA>
2         D2   W1      <NA>
3         D3   W1      <NA>
4         D4   W2        D3
5       <NA>   W2        D3
6       <NA>   W3        D4
7         D6   W4        D4
8         D7   W4        D4

Output of the rolling-join solution:

   Start_Date Week last_date   W last_dat
1:         D1    1      <NA> 0.9     <NA>
2:         D2    1      <NA> 0.9     <NA>
3:         D3    1      <NA> 0.9     <NA>
4:         D4    2        D3 1.9       D3
5:         NA    2        D3 1.9       D3
6:         NA    3        D4 2.9       D4
7:         D6    4        D4 3.9       D4
8:         D7    4        D4 3.9       D4

Data:

library(data.table)
DT <- structure(list(Start_Date = c("D1", "D2", "D3", "D4", "NA", "NA", 
    "D6", "D7"), Week = c("W1", "W1", "W1", "W2", "W2", "W3", "W4", 
        "W4"), last_date = c(NA, NA, NA, "D3", "D3", "D4", "D4", "D4"
        )), class = "data.frame", row.names = c(NA, -8L))