Find the last date of the previous week using data.table in R
I have a data.frame as follows:
structure(list(Start_Date = c("D1", "D2", "D3", "D4", "NA", "NA",
"D6", "D7"), Week = c("W1", "W1", "W1", "W2", "W2", "W3", "W4",
"W4"), last_date = c(NA, NA, NA, "D3", "D3", "D4", "D4", "D4"
)), class = "data.frame", row.names = c(NA, -8L))
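The expected output is the last_date column.
What I need: I want to find the last non-NA date of the previous week. If the previous week contains only NA dates, it should look at the week before that one and find a non-NA date there.
For example, for all dates in W2 the last date is D3 (the last non-NA date of the previous week). For W3 it should return D4.
For W4, since the last date of W3 is NA, it should look for a non-NA date in the week before that (i.e. W2) and return D4.
In short, the last date is the most recent non-NA date that does not come from the current week.
Since my dataset is very large, I am looking for a data.table solution.

Here is one option (assuming the data is already sorted):
Output: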
   Start_Date Week last_date week_nr last_dat2
1:         D1   W1      <NA>       1      <NA>
2:         D2   W1      <NA>       1      <NA>
3:         D3   W1      <NA>       1      <NA>
4:         D4   W2        D3       2        D3
5:       <NA>   W2        D3       2        D3
6:       <NA>   W3        D4       3        D4
7:         D6   W4        D4       4        D4
8:         D7   W4        D4       4        D4
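The code that produced the output above is not preserved in this thread. As a hedged sketch only (not the original answer's code), one way to get a week_nr column and a last_dat2 column like these is to take the last valid date per week, carry it forward across empty weeks, and lag it by one week. The helper table wk and its columns last_in_week, carried and prev are names chosen here for illustration; the sketch assumes the data is sorted and that missing dates are stored as the string "NA".

```r
library(data.table)

DT <- data.table(
  Start_Date = c("D1", "D2", "D3", "D4", "NA", "NA", "D6", "D7"),
  Week       = c("W1", "W1", "W1", "W2", "W2", "W3", "W4", "W4")
)

# Number the weeks in order of appearance (assumes sorted data)
DT[, week_nr := rleid(Week)]

# Last valid date within each week; NA if the week has no valid date
wk <- DT[, .(last_in_week = tail(c(NA_character_,
                                   Start_Date[Start_Date != "NA"]), 1L)),
         by = week_nr]

# Carry the most recent valid value forward across weeks without one:
# idx holds, for each week, the position of the latest week with a valid date
idx <- cummax(ifelse(is.na(wk$last_in_week), 0L, seq_along(wk$last_in_week)))
wk[, carried := last_in_week[replace(idx, idx == 0L, NA)]]

# Lag by one week so each week sees only values from earlier weeks
wk[, prev := shift(carried)]

# Update join back onto the original rows
DT[wk, on = "week_nr", last_dat2 := i.prev]
print(DT)
```

Only the small per-week table is scanned with cummax(), so the cost of the carry-forward step depends on the number of weeks rather than the number of rows.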
Here, the lookup table to join with is created in a different way:
library(data.table)
library(magrittr) # piping used to improve readability
lut <- DT[, .(Week, fifelse(Start_Date == "NA", NA_character_, Start_Date) %>% zoo::na.locf())][
, last(V2), by = Week][
, V1 := shift(V1)][]
DT[lut, on = .(Week), last_date2 := V1][]
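The lookup table lut is created by
- replacing the missing values of Start_Date with the previous value (LOCF = last observation carried forward);
- this requires replacing the string "NA" with NA_character_ beforehand;
- aggregating by Week;
- finally, shifting (lagging) the values by one week.
Note that the lookup table does not contain any NA values (except for the first row, of course) and that D4, the last valid Start_Date of week W2, has been carried forward to weeks W3 and W4.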
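To make the steps above easier to follow, here is a small sketch that runs them one at a time and keeps the intermediate results. The variable names x, filled and per_week are illustrative and not part of the answer. One assumption differs from the answer's code: na.rm = FALSE is passed to zoo::na.locf(), because by default it drops leading NA values, which would change the vector length if the data started with a missing date.

```r
library(data.table)

DT <- data.table(
  Start_Date = c("D1", "D2", "D3", "D4", "NA", "NA", "D6", "D7"),
  Week       = c("W1", "W1", "W1", "W2", "W2", "W3", "W4", "W4")
)

# Step 1: turn the string "NA" into a real missing value
x <- fifelse(DT$Start_Date == "NA", NA_character_, DT$Start_Date)

# Step 2: last observation carried forward (keep length with na.rm = FALSE)
filled <- zoo::na.locf(x, na.rm = FALSE)

# Step 3: aggregate by week, keeping the last carried-forward value
per_week <- data.table(Week = DT$Week, filled = filled)[
  , .(last_val = last(filled)), by = Week]

# Step 4: lag by one week so each week gets the previous week's value
per_week[, prev := shift(last_val)]
print(per_week)
```

The prev column corresponds to V1 in lut; joining it back on Week gives the last_date2 column.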
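Here is a base R solution that uses ave() and split(). It relies on is.na(), so it assumes the "NA" strings in Start_Date have already been converted to real NA values: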
# Step 1: carry dates forward, then take the last value within each week
df$last_date <- with(df, ave(na.omit(Start_Date)[cumsum(!is.na(Start_Date))], Week,
                             FUN = function(x) tail(x[!is.na(x)], 1)))
# Step 2: split by week and give each week the value of the week before it
dfout <- Reduce(rbind,
  lapply(seq(dfs <- split(df, df$Week)),
    function(k) {
      dfs[[k]]$last_date <- ifelse(k == 1, NA, unique(dfs[[k - 1]]$last_date))
      dfs[[k]]
    }))
Another data.table option is to use roll= and mult=:
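# Week becomes a run-length id; W sits just before the start of each week
setDT(DT)[, c("Week", "W") := .(rl <- rleid(Week), rl - 0.1)][,
    # rolling join against the rows with a valid date only
    last_dat := DT[Start_Date!="NA"][
        .SD, on=.(Week=W), roll=Inf, mult="last", x.Start_Date]
]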
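The rolling join is the part of this option that handles weeks with no valid date, so a reduced sketch may help. The tables valid and probe and the column key_val are made up for illustration, and week is stored as a double here to sidestep any integer-to-double coercion in the join. Each probe value sits just below a week number, so roll = Inf falls back to the nearest earlier week that has a valid row, however far back that is.

```r
library(data.table)

# Rows that have a valid date; week 3 has none and is therefore absent
valid <- data.table(week = c(1, 1, 2),
                    Start_Date = c("D2", "D3", "D4"))

# One probe per week, placed just before the week starts
probe <- data.table(week_nr = 1:4, key_val = (1:4) - 0.1)

# For each probe, roll back to the latest earlier week with a valid row
# and return the last Start_Date of that week (NA if there is none)
res <- valid[probe, on = .(week = key_val), roll = Inf, mult = "last",
             x.Start_Date]
print(res)
```

The probe for week 4 (3.9) finds nothing in week 3 and rolls back to week 2, which is the behaviour asked for in the question.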
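Comments:
Hi Sindri, thanks for your solution. However, I found a gap in it: it only moves the pointer to the previous week. If the previous week has only NA dates, it should look at the week before that and find a non-NA date. In short, the most recent non-NA date (not from the current week).
I forgot mult= in the solution.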
Output of the lookup-table join:
   Start_Date Week last_date last_date2
1:         D1   W1      <NA>       <NA>
2:         D2   W1      <NA>       <NA>
3:         D3   W1      <NA>       <NA>
4:         D4   W2        D3         D3
5:         NA   W2        D3         D3
6:         NA   W3        D4         D4
7:         D6   W4        D4         D4
8:         D7   W4        D4         D4
The lookup table itself:
lut
   Week   V1
1:   W1 <NA>
2:   W2   D3
3:   W3   D4
4:   W4   D4
Output of the base R solution:
  Start_Date Week last_date
1         D1   W1      <NA>
2         D2   W1      <NA>
3         D3   W1      <NA>
4         D4   W2        D3
5       <NA>   W2        D3
6       <NA>   W3        D4
7         D6   W4        D4
8         D7   W4        D4
Output of the roll= and mult= option:
   Start_Date Week last_date   W last_dat
1:         D1    1      <NA> 0.9     <NA>
2:         D2    1      <NA> 0.9     <NA>
3:         D3    1      <NA> 0.9     <NA>
4:         D4    2        D3 1.9       D3
5:         NA    2        D3 1.9       D3
6:         NA    3        D4 2.9       D4
7:         D6    4        D4 3.9       D4
8:         D7    4        D4 3.9       D4
Data:
library(data.table)
DT <- structure(list(Start_Date = c("D1", "D2", "D3", "D4", "NA", "NA",
"D6", "D7"), Week = c("W1", "W1", "W1", "W2", "W2", "W3", "W4",
"W4"), last_date = c(NA, NA, NA, "D3", "D3", "D4", "D4", "D4"
)), class = "data.frame", row.names = c(NA, -8L))
setDT(DT)  # convert to a data.table by reference so that DT[...] uses data.table syntax