R data.table: conditional sum by row


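I have the columns "colA", "colB" and "group". Within each "group", I want to sum "colB" from the bottom up until "colA" is "E". The expected result is shown in the "want" column.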

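Based on the expected "want", we create a run-length ID column "grp" by checking whether the value in "colA" is "E". Then, after grouping by "grp" and "group", we create "want1" as the cumulative sum of "colB", get the row indices "i1" of the elements of "colA" that are duplicated and also equal to "E", and assign the "colB" values at those rows to "want1".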

> tempDT <- data.table(colA = c("E","E","A","C","E","C","E","C","E"), colB = c(20,30,40,30,30,40,30,20,10), group = c(1,1,1,1,2,2,2,2,2), want = c(NA, 30, 40, 70,NA,40,70,20,30))
> tempDT
   colA colB group want
1:    E   20     1   NA
2:    E   30     1   30
3:    A   40     1   40
4:    C   30     1   70
5:    E   30     2   NA
6:    C   40     2   40
7:    E   30     2   70
8:    C   20     2   20
9:    E   10     2   30
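
The run-ID plus cumulative-sum idea can be sketched as follows. This is a minimal variant written for illustration, not the answerer's original code: the helper column `block` and the table name `dt` are hypothetical, and it assumes that a new block starts on the row after each "E" and that the first row of every group should be NA, as in "want".

```r
library(data.table)

dt <- data.table(
  colA  = c("E", "E", "A", "C", "E", "C", "E", "C", "E"),
  colB  = c(20, 30, 40, 30, 30, 40, 30, 20, 10),
  group = c(1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# 'block' (hypothetical helper): within each group, start a new block
# on the row that follows an "E"; the first row always opens a block
dt[, block := cumsum(shift(colA == "E", fill = TRUE)), by = group]

# running total of colB inside each (group, block) pair
dt[, want1 := cumsum(colB), by = .(group, block)]

# the first row of each group has nothing above it, so set it to NA
dt[, want1 := replace(want1, 1L, NA_real_), by = group]

# drop the helper column
dt[, block := NULL]

dt$want1
# NA 30 40 70 NA 40 70 20 30
```

Because everything is expressed as grouped vector operations, no per-row function call is needed.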

Hope this helps.

library(dplyr)

df %>%
  group_by(group) %>%
  mutate(row_num = n():1) %>%
  group_by(group) %>%
  mutate(sum_colB = sum(colB[row_num < row_num[which(colA == 'E')]]),
         flag = ifelse(row_num >= row_num[which(colA == 'E')], 0, 1)) %>%
  mutate(sum_colB = ifelse(flag == 1 & row_num == 1, sum_colB, ifelse(flag == 0, NA, colB))) %>%
  select(-flag, -row_num) %>%
  data.frame()

The output is:

  colA colB group want sum_colB
1    E   20     1   NA       NA
2    E   30     1   30       NA
3    A   40     1   40       40
4    C   30     1   70       70
5    E   30     2   NA       NA
6    C   30     2   30       30

Sample data:

df <- structure(list(colA = structure(c(3L, 3L, 1L, 2L, 3L, 2L), .Label = c("A", 
"C", "E"), class = "factor"), colB = c(20, 30, 40, 30, 30, 30
), group = c(1, 1, 1, 1, 2, 2), want = c(NA, 30, 40, 70, NA, 
30)), .Names = c("colA", "colB", "group", "want"), row.names = c(NA, 
-6L), class = "data.frame")

Comments:

Your condition is unclear. Based on "want", if a group can have more "E" values after other characters, please provide your data.

@akrun The sample data has been changed. Looking forward to your solution.

Thanks @Prem. Is there a way to make your output have exactly the same structure as the dataset "tempDT" above?

Please refer to the updated answer. By the way, I am not sure about the logic by which the "want" column has 30 in the second row but NA in the fifth row. Also, the logic in your updated sample data is not clear to me: how do you want the "want" column to be set in the output?

The sample dataset has been changed.

Here is one approach: row reference + sum

library(data.table)

# input data
tempDT <- data.table(colA = c("E","E","A","C","E","C","E","C","E"), colB = c(20,30,40,30,30,40,30,20,10), group = c(1,1,1,1,2,2,2,2,2), want = c(NA, 30, 40, 70,NA,40,70,20,30))
tempDT

# for each row, find the index of the last earlier row where colA is "E"
lastEpos <- function(i) tail(which(tempDT$colA[1:(i-1)] == "E"), 1)
tempDT[, rowRef := sapply(.I, lastEpos), by = "group"]

# sum colB from the row after that reference up to the current row
sumEpos <- function(i) {
  valTEMP <- tempDT$rowRef[i]
  outputTEMP <- sum(tempDT$colB[(valTEMP+1):i])  # sum
  return(outputTEMP)
}
tempDT[, want1 := sapply(.I, sumEpos), by = "group"]

# deal with first row in every group
tempDT[, want1 := c(NA, want1[-1]), by = "group"]

# clean output
tempDT[, rowRef := NULL]
tempDT
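
Note that lastEpos above searches all earlier rows of tempDT, not only rows in the same group; it gives the expected result here because every group starts with "E" and the first row of each group is overwritten with NA. A group-aware version of the same row-reference idea might look like the sketch below. The function name `sum_since_last_E` and the table name `dt2` are hypothetical, and the sketch assumes that when a group has no earlier "E", the sum starts at the first row of that group.

```r
library(data.table)

# Hypothetical helper: for each position i in one group, sum colB from the
# row after the most recent earlier "E" up to i. The first row is NA.
sum_since_last_E <- function(colA, colB) {
  n <- length(colB)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)[-1]) {
    prev_E <- which(colA[seq_len(i - 1)] == "E")
    # assumption: with no earlier "E", start from the first row of the group
    start <- if (length(prev_E) > 0) max(prev_E) + 1 else 1
    out[i] <- sum(colB[start:i])
  }
  out
}

dt2 <- data.table(
  colA  = c("E", "E", "A", "C", "E", "C", "E", "C", "E"),
  colB  = c(20, 30, 40, 30, 30, 40, 30, 20, 10),
  group = c(1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# the helper only ever sees the rows of one group at a time
dt2[, want1 := sum_since_last_E(colA, colB), by = group]
dt2
```

Passing the columns through `by = group` means the helper never sees rows from another group, so no separate clean-up of a row-reference column is needed.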