R: Compute the values of all numeric variables for a missing row from the previous and next rows with the same id

Tags: r, loops, time-series, interpolation, apply


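I have a very large data.frame (thousands of variables) in which each row has an identifier and a year. A row may appear for several years, or only partway through the data. One year (1997) is missing, and I want to interpolate the values of all numeric variables for it:

  • Duplicate every row whose identifier exists in both the previous year (1996) and the following year (1998).
  • For every numeric variable, compute the mean of its previous-year and next-year values, taken from the two rows that share the same identifier.

Because this is a very large dataset, I am desperate to avoid loops.

Sample data: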

    test_df <- data.frame(id = c(1,2,3,1,3), year = c(96,96,96,98,98), 
                          state = c("MA","MD","NY","MA", "NY"),
                          num1 = c(10,11,22,9,27), num2 = c(11566,32340,97555,14200,100025))
    > test_df
      id year state num1   num2
    1  1   96    MA   10  11566
    2  2   96    MD   11  32340
    3  3   96    NY   22  97555
    4  1   98    MA    9  14200
    5  3   98    NY   27 100025
    
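What I have done so far is subset the previous-year rows whose id also appears in the following year, and pick out the numeric variables. After the calculation I will bind the new rows to the main data.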

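    # Rows from the earlier year (the sample data codes the years as 96 and 98)
    common_ids <- test_df[test_df$year == 96, ]
    # Keep only ids that also appear in the later year
    common_ids <- common_ids[common_ids$id %in% test_df$id[test_df$year == 98], ]
    numeric_vars <- sapply(common_ids, is.numeric)
    
    # Open question: what should the function body be?
    common_ids[, numeric_vars] <- lapply(common_ids[, numeric_vars], function(x) ???)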
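
One possible way to fill in the placeholder above without a row loop, sketched in base R. It assumes each id appears at most once per year and that the years are coded 96 and 98 as in the sample data. The names prev, nxt, interp, and result are hypothetical and not part of the original question.

```r
test_df <- data.frame(id = c(1, 2, 3, 1, 3), year = c(96, 96, 96, 98, 98),
                      state = c("MA", "MD", "NY", "MA", "NY"),
                      num1 = c(10, 11, 22, 9, 27),
                      num2 = c(11566, 32340, 97555, 14200, 100025))

prev <- test_df[test_df$year == 96, ]
nxt  <- test_df[test_df$year == 98, ]

# Keep only ids present in both years, and align nxt to the row order of prev
prev <- prev[prev$id %in% nxt$id, ]
nxt  <- nxt[match(prev$id, nxt$id), ]

# Numeric columns to average, excluding the key columns
num_cols <- setdiff(names(test_df)[sapply(test_df, is.numeric)], c("id", "year"))

# New rows for year 97: non-numeric columns are copied from the previous year,
# numeric columns are the element-wise mean of the two aligned rows
interp <- prev
interp$year <- 97
interp[num_cols] <- (prev[num_cols] + nxt[num_cols]) / 2

result <- rbind(test_df, interp)
result <- result[order(result$year, result$id), ]
```

The averaging is a single vectorized operation over all numeric columns at once, so no column needs to be named explicitly.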
    

Using data.table and zoo, you can start with the following:

    library(data.table)
    library(zoo)
    
    test_df <- data.table(id = c(1,2,3,1,3), year = c(96,96,96,98,98), 
                      state = c("MA","MD","NY","MA", "NY"),
                      num1 = c(10,11,22,9,27), num2 = c(11566,32340,97555,14200,100025))
    
    # Sort so that each id's rows are in chronological order
    test_df <- test_df[order(id, year)]
    
    # ids with more than one row, i.e. present in both 96 and 98 in this sample
    missing.ids <- test_df[, c(NA, id[-.N]), by = id][!is.na(V1),V1]
    
    # Placeholder rows for year 97, with typed NAs so the columns match test_df
    temp_df <- data.table(id = missing.ids, year = rep(97, length(missing.ids)),
                          state = NA_character_, num1 = NA_real_, num2 = NA_real_)
    
    new.test_df <- rbind(test_df, temp_df)[order(id, year)]
    
    # Within each id: carry state forward, then linearly interpolate the numeric columns
    new.test_df[, state := na.locf(state, na.rm = FALSE), by = id]
    new.test_df[, `:=` (num1 = na.approx(num1, na.rm = FALSE), num2 = na.approx(num2, na.rm = FALSE)), by = id]
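
As a small illustration of why this yields the requested mean: with a single NA between two known values, linear interpolation returns their midpoint. The vectors below are made up from the id 1 rows of the sample data.

```r
library(zoo)

# num1 for id 1 in years 96, 97 (missing), 98
num1_id1 <- c(10, NA, 9)
filled_num <- na.approx(num1_id1, na.rm = FALSE)
filled_num
# 10.0  9.5  9.0

# state for id 1: the last observed value is carried forward into the gap
state_id1 <- c("MA", NA, "MA")
filled_chr <- na.locf(state_id1, na.rm = FALSE)
filled_chr
# "MA" "MA" "MA"
```

Passing na.rm = FALSE keeps the output the same length as the input, which is what the grouped assignment with := needs.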
    
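Can you do this without naming the specific variables (there are thousands of them)?

You can create a vector of column names and loop over it using data.table's set. I don't have time today, but maybe someone can update my answer.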
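
A minimal sketch of what that suggestion might look like, assuming the placeholder rows for year 97 have already been added. The table dt and the names num_cols and filled are hypothetical. The loop runs over column names, not rows, and set() updates each column by reference.

```r
library(data.table)
library(zoo)

# Hypothetical table that already contains NA placeholder rows for year 97
dt <- data.table(id   = c(1, 1, 1, 3, 3, 3),
                 year = c(96, 97, 98, 96, 97, 98),
                 num1 = c(10, NA, 9, 22, NA, 27),
                 num2 = c(11566, NA, 14200, 97555, NA, 100025))

# Every column except the keys is treated as a value column here
num_cols <- setdiff(names(dt), c("id", "year"))

for (col in num_cols) {
  # ave() applies the function within each id and keeps the original row order
  filled <- ave(dt[[col]], dt$id, FUN = function(v) na.approx(v, na.rm = FALSE))
  set(dt, j = col, value = filled)
}
```

With thousands of columns this is still one iteration per column, which is usually far cheaper than iterating over rows.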
    library(data.table)
    library(zoo)
    
    test_df <- data.table(id = c(1,2,3,1,3), year = c(96,96,96,98,98), 
                      state = c("MA","MD","NY","MA", "NY"),
                      num1 = c(10,11,22,9,27), num2 = c(11566,32340,97555,14200,100025))
    
    # Sort so that each id's rows are in chronological order
    test_df <- test_df[order(id, year)]
    
    # Columns to interpolate: everything except the keys and the character column
    mynum.cols <- setdiff(names(test_df), c("id", "year", "state"))
    # ids with more than one row, i.e. present in both 96 and 98 in this sample
    missing.ids <- test_df[, c(NA, id[-.N]), by = id][!is.na(V1),V1]
    
    # Placeholder rows for year 97, with NA in every column to be interpolated
    temp_df <- data.table(id = missing.ids, year = rep(97, length(missing.ids)), state = NA_character_)
    temp_df[, (mynum.cols) := NA_real_]
    
    new.test_df <- rbind(test_df, temp_df, use.names = TRUE)[order(id, year)]
    
    # Carry state forward within each id
    new.test_df[, state := na.locf(state, na.rm = FALSE), by = id]
    
    # Linearly interpolate every value column within each id, without naming any of them
    new.test_df[, (mynum.cols) := lapply(.SD, function(x) na.approx(x, na.rm = FALSE)), by = id, .SDcols = mynum.cols]
    
    new.test_df <- new.test_df[order(year, id)]
    new.test_df
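
With thousands of columns of mixed types, hard-coding the excluded names may not scale. As a hedged sketch, the column groups could instead be detected by type. The names key_cols, num_cols, and chr_cols are hypothetical, and the example uses a plain data.frame, although sapply() works the same way on a data.table.

```r
df <- data.frame(id = c(1, 2), year = c(96, 96),
                 state = c("MA", "MD"),
                 num1 = c(10, 11), num2 = c(11566, 32340))

key_cols <- c("id", "year")

# Numeric columns other than the keys: candidates for interpolation
num_cols <- setdiff(names(df)[sapply(df, is.numeric)], key_cols)

# Everything else: candidates for carrying the last value forward
chr_cols <- setdiff(names(df), c(key_cols, num_cols))
```

Vectors built this way can be passed as .SDcols, so the carry-forward and interpolation steps apply to whole groups of columns at once.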