Computing sums of a variable across data frames stored in a list in R, under conditions that depend on other variables
Hi everybody, I am working with a list of data frames in R. Lists in R are great, but I would like to solve the following problem. I have a list named global that holds five data frames: f1, f2, f3, f4 and f5. Each data frame has a main variable named CreditValue plus variables that act as flags. For example, f1 has CreditValue and one flag variable, b1, with value 1; f2 has two flag variables, b1 with value 1 and b2 with value 2; f3 has three flag variables, b1 with value 1, b2 with value 2 and b3 with value 3; f4 has four flag variables, b1 with value 1, b2 with value 2, b3 with value 3 and b4 with value 4; f5 has five flag variables, b1 with value 1, b2 with value 2, b3 with value 3, b4 with value 4 and b5 with value 5.

In every data frame the flag variables start at column 3. I want to compute the sum of CreditValue in each data frame under different conditions on the flag variables. The structure of my list is shown by the dput version included in the final part of this question.
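I have used the llply() function from the plyr package to work with lists in R, but I do not know how to define a function that does this. I currently compute the sums with the code below, which becomes very unwieldy if I have more data frames. I would also like to save these values in a new data frame or matrix, with one column per flag variable (5). The sums look like this: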
sum(f1$CreditValue[f1[,3]==1])
[1] 45
sum(f2$CreditValue[f2[,3]==1],na.rm=TRUE)
[1] 36
sum(f3$CreditValue[f3[,3]==1],na.rm=TRUE)
[1] 36
sum(f4$CreditValue[f4[,3]==1],na.rm=TRUE)
[1] 97
sum(f5$CreditValue[f5[,3]==1],na.rm=TRUE)
[1] 97
In all data frames, these sums are computed with formulas that consider the b1 variable.
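The five b1 formulas above differ only in the data frame they use, so they can be collapsed into a single call over the list. The following is a minimal sketch, not the asker's code: demo is a hypothetical stand-in for global that rebuilds only f1 and f2 from the dput values so that the example runs on its own.

```r
# Compact rebuild of f1 and f2 from the dput values in the question.
# KeyID is kept so that the flag variables still start at column 3.
f1 <- data.frame(KeyID = sprintf("%03d", c(1:7, 9:10)),
                 CreditValue = 1:9,
                 b1 = 1)
f2 <- data.frame(KeyID = sprintf("%03d", c(1:7, 9:12)),
                 CreditValue = 1:11,
                 b1 = c(1, 1, NA, NA, NA, 1, 1, NA, 1, NA, 1),
                 b2 = 2)
demo <- list(f1 = f1, f2 = f2)  # hypothetical stand-in for global

# Same formula as in the question, applied to every data frame in the list
b1_sums <- sapply(demo, function(df) {
  sum(df$CreditValue[df[, 3] == 1], na.rm = TRUE)
})
b1_sums
# f1 f2
# 45 36
```

With the full list, replacing demo by global should give the five b1 sums listed above.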
sum(f2$CreditValue[is.na(f2[,3]) & f2[,4]==2] ,na.rm=TRUE)
[1] 30
sum(f3$CreditValue[is.na(f3[,3]) & f3[,4]==2] ,na.rm=TRUE)
[1] 22
sum(f4$CreditValue[is.na(f4[,3]) & f4[,4]==2] ,na.rm=TRUE)
[1] 3
sum(f5$CreditValue[is.na(f5[,3]) & f5[,4]==2] ,na.rm=TRUE)
[1] 0
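These sums are computed with formulas that consider the values of both b2 and b1 in all data frames. Here there is a condition on the value of b1 (column 3): it must be NA.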
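The sums for b3 are computed in all data frames with formulas that consider the values of b3, b2 and b1. Now there is a condition on the values of b1 and b2 (columns 3 and 4).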
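The b3 formulas themselves are not shown here. As a sketch of what one would look like if it follows the same pattern as the b2 formulas (that is an assumption), the code below rebuilds f3 from the dput values and requires b1 and b2 to be NA while b3 equals 3.

```r
# Compact rebuild of f3 from the dput values in the question
f3 <- data.frame(
  KeyID = sprintf("%03d", c(1:7, 9:14)),
  CreditValue = c(1:11, 11, 11),
  b1 = c(1, 1, 1, 1, NA, NA, 1, 1, NA, NA, NA, 1, NA),
  b2 = c(2, 2, 2, 2, 2, 2, 2, 2, NA, NA, 2, 2, NA),
  b3 = 3
)

# b1 and b2 (columns 3 and 4) must be NA, and b3 (column 5) must equal 3
b3_sum <- sum(f3$CreditValue[is.na(f3[, 3]) & is.na(f3[, 4]) & f3[, 5] == 3],
              na.rm = TRUE)
b3_sum
# [1] 30
```

Only rows 9, 10 and 13 satisfy the condition, and their CreditValue entries (9, 10 and 11) add up to 30, which agrees with the f3 entry in the third column of the desired result matrix.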
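The sums for b4 are computed with formulas that consider the values of b4, b3, b2 and b1 in all data frames. Now there is a condition on the values of b1, b2 and b3 (columns 3, 4 and 5).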
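The sum for b5 is computed with the same kind of formula, considering the values of b5, b4, b3, b2 and b1 in all data frames. Now there is a condition on the values of b1, b2, b3 and b4 (columns 3, 4, 5 and 6).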
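The sums shown come from many separate lines of code, but I would like to create a function that runs over the flag variables (b1, b2, b3, b4, b5) and computes the sums. I do not know whether this can be done with for, or with a function passed to llply or lapply. I tried to shorten the code. For example, instead of this code: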
sum(f5$CreditValue[is.na(f5[,3]) & is.na(f5[,4]) & is.na(f5[,5]) & is.na(f5[,6]) & f5[,7]==5] ,na.rm=TRUE)
I tried this shorter version:
sum(f5$CreditValue[is.na(f5[,3,4,5,6]) & f5[,7]==5] ,na.rm=TRUE)
But it does not work: the original condition selects only specific rows in each data frame, and the shortened code does not do that. I would like to save the results of the sums in a new data frame or matrix like this:
   f1 f2 f3 f4 f5
f1 45  0  0  0  0
f2 36 30  0  0  0
f3 36 22 30  0  0
f4 97  3  0 12  0
f5 97  0  5 28 10
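On the shortened attempt above: f5[,3,4,5,6] is not a valid way to select several columns of a data frame, because the columns have to be passed as one vector, as in f5[, 3:6]. One possible way to write the all-NA condition compactly is to count the NA values in each row. The sketch below rebuilds f5 from the dput values so that it runs on its own; the names all_na, short_sum and long_sum are illustrative only.

```r
# Compact rebuild of f5 from the dput values in the question
f5 <- data.frame(
  KeyID = sprintf("%03d", c(1:7, 9:18)),
  CreditValue = c(1:11, 11, 11, 12, 12, 14, 14),
  b1 = c(1, 1, 1, 1, NA, 1, 1, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA),
  b2 = c(2, 2, 2, 2, NA, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, NA, NA),
  b3 = c(3, 3, 3, 3, 3, 3, 3, 3, 3, NA, 3, 3, 3, 3, 3, NA, NA),
  b4 = c(4, 4, 4, 4, 4, 4, 4, 4, 4, NA, 4, 4, 4, 4, 4, 4, 4),
  b5 = 5
)

# TRUE for the rows in which every one of columns 3 to 6 (b1..b4) is NA
all_na <- rowSums(is.na(f5[, 3:6])) == length(3:6)

# Compact version of the condition
short_sum <- sum(f5$CreditValue[all_na & f5[, 7] == 5], na.rm = TRUE)

# Long version from the question, for comparison
long_sum <- sum(f5$CreditValue[is.na(f5[, 3]) & is.na(f5[, 4]) &
                               is.na(f5[, 5]) & is.na(f5[, 6]) &
                               f5[, 7] == 5], na.rm = TRUE)

c(short = short_sum, long = long_sum)
# short  long
#    10    10
```

Both versions pick out the same row (the tenth), so they agree with the f5 entry in the last column of the matrix.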
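The zeros in the result matrix arise because not every data frame has all the flag variables; for example, f1 has only b1 and lacks the b2, b3, b4 and b5 that f5 has. The dput version of my list is the following: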
structure(list(f1 = structure(list(KeyID = c("001", "002", "003",
"004", "005", "006", "007", "009", "010"), CreditValue = c(1,
2, 3, 4, 5, 6, 7, 8, 9), b1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("KeyID",
"CreditValue", "b1"), row.names = c(NA, 9L), class = "data.frame"),
f2 = structure(list(KeyID = c("001", "002", "003", "004",
"005", "006", "007", "009", "010", "011", "012"), CreditValue = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11), b1 = c(1, 1, NA, NA, NA,
1, 1, NA, 1, NA, 1), b2 = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2)), .Names = c("KeyID", "CreditValue", "b1", "b2"), row.names = c(NA,
11L), class = "data.frame"), f3 = structure(list(KeyID = c("001",
"002", "003", "004", "005", "006", "007", "009", "010", "011",
"012", "013", "014"), CreditValue = c(1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 11, 11), b1 = c(1, 1, 1, 1, NA, NA, 1, 1, NA,
NA, NA, 1, NA), b2 = c(2, 2, 2, 2, 2, 2, 2, 2, NA, NA, 2,
2, NA), b3 = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Names = c("KeyID",
"CreditValue", "b1", "b2", "b3"), row.names = c(NA, 13L), class = "data.frame"),
f4 = structure(list(KeyID = c("001", "002", "003", "004",
"005", "006", "007", "009", "010", "011", "012", "013", "014",
"015", "016"), CreditValue = c(1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 11, 11, 12, 12), b1 = c(NA, NA, NA, NA, NA, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1), b2 = c(2, 2, NA, NA, NA, 2, 2, 2,
2, 2, 2, 2, 2, NA, NA), b3 = c(3, 3, NA, NA, NA, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3), b4 = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4)), .Names = c("KeyID", "CreditValue", "b1",
"b2", "b3", "b4"), row.names = c(NA, 15L), class = "data.frame"),
f5 = structure(list(KeyID = c("001", "002", "003", "004",
"005", "006", "007", "009", "010", "011", "012", "013", "014",
"015", "016", "017", "018"), CreditValue = c(1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 11, 11, 12, 12, 14, 14), b1 = c(1,
1, 1, 1, NA, 1, 1, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA), b2 = c(2,
2, 2, 2, NA, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, NA, NA), b3 = c(3,
3, 3, 3, 3, 3, 3, 3, 3, NA, 3, 3, 3, 3, 3, NA, NA), b4 = c(4,
4, 4, 4, 4, 4, 4, 4, 4, NA, 4, 4, 4, 4, 4, 4, 4), b5 = c(5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5)), .Names = c("KeyID",
"CreditValue", "b1", "b2", "b3", "b4", "b5"), row.names = c(NA,
17L), class = "data.frame")), .Names = c("f1", "f2", "f3",
"f4", "f5"))
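I hope you can help me. Building a function to compute these sums is quite complicated for me, and if I keep writing the code by hand I will run into trouble with lists that contain more data frames. Thanks for your help.

You can use lapply and call a function that builds the rows of your output data frame: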
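# Build one row of the output: one sum per flag variable b1..b5
get.sums = function(df) {
    sapply(1:5, function(y) {
        # Columns of the earlier flags b1..b(y-1), which must all be NA
        if (y > 1) {
            na.col = 3:(y+1)
        } else {
            na.col = NULL
        }
        if (paste0("b", y) %in% names(df)) {
            # Keep rows where every earlier flag is NA and flag y equals y
            return(sum(df$CreditValue[rowSums(!is.na(df[, na.col, drop=FALSE])) == 0 & df[, (y+2)] == y], na.rm=TRUE))
        } else {
            # This data frame does not have flag y
            return(0)
        }
    })
}
rows = lapply(global, get.sums)
sums = do.call(rbind, rows)
sums
#    [,1] [,2] [,3] [,4] [,5]
# f1   45    0    0    0    0
# f2   36   30    0    0    0
# f3   36   22   30    0    0
# f4   97    3    0   12    0
# f5   97    0    5   28   10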
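The function above hard-codes five flags and returns a matrix without column names. The sketch below is one possible variation, not part of the original answer: it derives the number of flags from the list and labels the result. It assumes, as the code above does, that flag k is named b plus k and sits in column k + 2. The names demo, flag_sums, n_flags and sums_df are hypothetical, and demo rebuilds only f1 to f3 from the dput values so that the example runs on its own.

```r
# Compact rebuild of f1, f2 and f3 from the dput values in the question
f1 <- data.frame(KeyID = sprintf("%03d", c(1:7, 9:10)),
                 CreditValue = 1:9, b1 = 1)
f2 <- data.frame(KeyID = sprintf("%03d", c(1:7, 9:12)),
                 CreditValue = 1:11,
                 b1 = c(1, 1, NA, NA, NA, 1, 1, NA, 1, NA, 1), b2 = 2)
f3 <- data.frame(KeyID = sprintf("%03d", c(1:7, 9:14)),
                 CreditValue = c(1:11, 11, 11),
                 b1 = c(1, 1, 1, 1, NA, NA, 1, 1, NA, NA, NA, 1, NA),
                 b2 = c(2, 2, 2, 2, 2, 2, 2, 2, NA, NA, 2, 2, NA), b3 = 3)
demo <- list(f1 = f1, f2 = f2, f3 = f3)  # hypothetical stand-in for global

# Largest number of flag columns in any data frame (flags start at column 3)
n_flags <- max(sapply(demo, ncol)) - 2

flag_sums <- function(df, n_flags) {
  sapply(seq_len(n_flags), function(k) {
    # A data frame without flag k contributes 0
    if (!(paste0("b", k) %in% names(df))) return(0)
    # Earlier flags b1..b(k-1); a zero-column data frame when k = 1
    earlier <- df[, seq_len(k - 1) + 2, drop = FALSE]
    # Rows where every earlier flag is NA and flag k equals k
    keep <- rowSums(!is.na(earlier)) == 0 & df[[k + 2]] == k
    sum(df$CreditValue[keep], na.rm = TRUE)
  })
}

sums <- do.call(rbind, lapply(demo, flag_sums, n_flags = n_flags))
colnames(sums) <- paste0("b", seq_len(n_flags))
sums_df <- as.data.frame(sums)  # data frame version of the same table
sums_df
#    b1 b2 b3
# f1 45  0  0
# f2 36 30  0
# f3 36 22 30
```

The three rows match the first three rows and columns of the matrix printed above; running the same code on the full global list should extend it to five rows and five columns.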