R: column with previous results

r database dataframe


I am working with R.

What I have:

      ID_1     ID_2      Date        x_1        y_2     
1      12       3     2011-12-21       15        10     
2      12       13    2011-12-22       50        40     
3      3        12    2011-12-22       20        30     
4      15       13    2011-12-23       30        20     
...
and so on

Goal:

      ID_1     ID_2      Date        x_1        y_2     XX_1      YY_2
1      12       3     2011-12-21       15        10      0         0
2      12       13    2011-12-22       50        40      15        0
3      3        12    2011-12-22       20        30      10        50
4      15       13    2011-12-23       30        20      0         40
...
and so on

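In XX_1 and YY_2 I want to see the previous x_1 or y_2 value recorded for the ID that appears in ID_1 and ID_2 respectively, or 0 if no value is available before that date. I do not know how to handle the fact that the same ID can show up in either ID_1 or ID_2 (such as IDs 3 and 12 in the example).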

@Ekatef ID1 and ID2 (look for matches of the whole ID row, even if the order of the IDs is switched):

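If I understand correctly, the target ID should be looked up in all rows strictly above the row of the given ID value, from left to right and from bottom to top. I would write a function that finds the coordinates of the preceding ID: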

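    # Find the indices of the preceding occurrence of an ID value.
    # @id_matrix == your_data_frame[, c("ID_1", "ID_2")]
    #   (as a matrix, e.g. via as.matrix(), so that length() counts cells)
    # [@i_of_row, @i_of_col] are the coordinates of the considered ID
    # i_of_row > 1
    FindPreviousID <- function(id_matrix, i_of_row, i_of_col) {
        # keep only the rows strictly above the considered row
        shorten_matrix <- id_matrix[1:(i_of_row - 1), , drop = FALSE]
        # position of the ID in the reversed, row-wise flattened matrix
        # (uses the id_matrix argument rather than a global `ids` object)
        rev_ind <- match(table = rev(t(shorten_matrix)),
                         x = id_matrix[i_of_row, i_of_col],
                         nomatch = NA_real_)
        # convert the "linear" position back to a row/column pair;
        # 2 is the number of ID columns
        n_row_found <- floor((length(shorten_matrix) - rev_ind) / 2) + 1
        n_col_found <- (length(shorten_matrix) - rev_ind) %% ncol(shorten_matrix) + 1
        return(c(row = n_row_found, col = n_col_found))
    }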
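The index arithmetic inside FindPreviousID can be hard to follow, so here is a small worked example of the same steps on the sample data. It is only a sketch: the objects `ids`, `vars`, `above`, `pos`, `row_found` and `col_found` are names chosen for this illustration, and integer division (`%/%`) is used in place of `floor(... / 2)`, which gives the same result for two ID columns.

```r
# ID columns and value columns from the question, as matrices (not data frames)
ids <- matrix(c(12,  3,
                12, 13,
                 3, 12,
                15, 13),
              ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("ID_1", "ID_2")))
vars <- matrix(c(15, 10,
                 50, 40,
                 20, 30,
                 30, 20),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("x_1", "y_2")))

# We look for the ID in ids[3, 1] (the value 3) among the rows above row 3
above <- ids[1:2, , drop = FALSE]

# t() flattens the matrix row by row; rev() then starts from the last cell
reversed <- rev(t(above))              # 13 12 3 12
rev_ind  <- match(ids[3, 1], reversed) # first hit when scanning upwards

# 0-based position of the hit when counting cells row by row from the top left
pos <- length(above) - rev_ind

row_found <- pos %/% ncol(above) + 1   # integer division gives the row
col_found <- pos %%  ncol(above) + 1   # remainder gives the column

c(row = row_found, col = col_found)    # row 1, column 2
vars[row_found, col_found]             # 10, the value expected in XX_1 of row 3
```

ID 3 was last seen in row 1 under ID_2, so the value carried forward is the y_2 entry of that row.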

I will give it a try this week, thanks in advance.

It works like a charm! Thanks a lot, now I know how to approach this kind of task. I had never considered writing a function for it. +1

I tried to work through your code, but some parts make no sense to me. I hope you can clear up my doubts about how it works. 1) n_row_found and n_col_found: why do you use floor, the division by 2 and the + 1? I do not understand what you are doing there. 2) Suppose I want to use the same script for another purpose: do the same thing only when the elements of ID_1 and ID_2 (same row) appear above, and of course also when the same elements appear in the ID_1 and ID_2 columns.

Thanks for the feedback! 1) The purpose of computing n_row_found and n_col_found is to convert rev_ind, the "linear" index of the element found in the transposed shorten_matrix, into a pair of indices in the original shorten_matrix. The trick is to use the different subsetting methods R offers. Section 3 of Wickham's "Advanced R" was very helpful for me in getting a grip on these issues. The formulas for converting between a "linear" index and a pair of indices have been discussed before. 2) I do not quite understand what you want to change. Is it the correspondence between ID_1/ID_2 and x_1/y_2? Could you give an example?

For the variant that matches a whole pair of IDs, the columns are filled as follows (this fragment relies on FindPreviousIDsPair, ids and vars, whose definitions are not shown):

    indices_of_vars <- sapply(FUN = function(i) FindPreviousIDsPair(id_matrix = ids, i),
                              X = seq(along.with = ids[, 1])[-1])
    indices_XX <- indices_of_vars[1:2, ]
    indices_YY <- indices_of_vars[c(1, 3), ]
    XX_column <- c(NA, vars[t(indices_XX)])
    XX_column[is.na(XX_column)] <- 0
    YY_column <- c(NA, vars[t(indices_YY)])
    YY_column[is.na(YY_column)] <- 0

The OP has requested to copy the previous value of an ID (if any) to the corresponding new column. This can be solved by reshaping multiple columns simultaneously from wide to long format, finding the previous value by shifting / lagging, and reshaping back to wide format:

    library(data.table)
    setDT(DF)[, rn := .I]
    long <- melt(DF, id.vars = c("rn", "Date"),
                 measure.vars = patterns("^ID", "^x|y"),
                 value.name = c("ID", "value"))
    long[order(Date), previous := shift(value, fill = 0), by = ID]
    dcast(long, rn + Date ~ variable, value.var = c("ID", "value", "previous"))

       rn       Date ID_1 ID_2 value_1 value_2 previous_1 previous_2
    1:  1 2011-12-21   12    3      15      10          0          0
    2:  2 2011-12-22   12   13      50      40         15          0
    3:  3 2011-12-22    3   12      20      30         10         50
    4:  4 2011-12-23   15   13      30      20          0         40

Alternatively, the final call to dcast() can be replaced by an update while joining:

    DF[long, on = .(rn),
       c("XX_1", "YY_2") := .(previous[variable == 1L], previous[variable == 2L])][
       , rn := NULL]
    DF

       ID_1 ID_2       Date x_1 y_2 XX_1 YY_2
    1:   12    3 2011-12-21  15  10    0    0
    2:   12   13 2011-12-22  50  40   15    0
    3:    3   12 2011-12-22  20  30   10   50
    4:   15   13 2011-12-23  30  20    0   40

This reproduces the OP's expected result exactly.

Data:

    library(data.table)
    DF <- fread(
    "i      ID_1     ID_2      Date        x_1        y_2
    1      12       3     2011-12-21       15        10
    2      12       13    2011-12-22       50        40
    3      3        12    2011-12-22       20        30
    4      15       13    2011-12-23       30        20
    ", drop = 1L
    )

Comments on the question:

I have read this at least 4 times and I am struggling to understand what you are asking. I tried to work out how XX_1 and YY_2 are generated, without success. You need to explain what you mean in more detail. This is really confusing. It may just be me not getting it.

In row 2, XX_1 is 15 because the "last" result for ID 12 (in ID_1) was 15 (x_1). YY_2 is 0 because there is no earlier record for ID 13. In row 3, ID_1 is 3, and the previous value for ID 3 is 10 (y_2), so XX_1 is 10. And so on. The numbers in the column names refer to the ID position.
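As a cross-check that needs no extra packages, the "previous value per ID" rule can also be sketched as a plain base R loop. This is an illustrative alternative, not the method used in either answer; `last_seen` and `prev_or_zero` are hypothetical helper names, and the loop assumes the rows are already in chronological order.

```r
DF <- data.frame(
  ID_1 = c(12, 12, 3, 15),
  ID_2 = c(3, 13, 12, 13),
  Date = as.Date(c("2011-12-21", "2011-12-22", "2011-12-22", "2011-12-23")),
  x_1  = c(15, 50, 20, 30),
  y_2  = c(10, 40, 30, 20)
)

# last_seen maps an ID (as a character key) to the most recent value recorded for it
last_seen <- list()

# return the stored value for an ID, or 0 if the ID has not been seen yet
prev_or_zero <- function(id) {
  v <- last_seen[[as.character(id)]]
  if (is.null(v)) 0 else v
}

DF$XX_1 <- 0
DF$YY_2 <- 0
for (i in seq_len(nrow(DF))) {   # assumption: rows are sorted by Date
  # look up both IDs first, so a row never sees its own values
  DF$XX_1[i] <- prev_or_zero(DF$ID_1[i])
  DF$YY_2[i] <- prev_or_zero(DF$ID_2[i])
  # then record this row's values for the rows below
  last_seen[[as.character(DF$ID_1[i])]] <- DF$x_1[i]
  last_seen[[as.character(DF$ID_2[i])]] <- DF$y_2[i]
}

DF[, c("XX_1", "YY_2")]
```

On the four sample rows this gives XX_1 = 0, 15, 10, 0 and YY_2 = 0, 0, 50, 40, the same values as in the goal table. A row-by-row loop is easy to reason about but will be slower than the reshaping approach on large tables.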