R: Summing across rows based on the values of other columns


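I am a new R user and am looking for someone to point me in the right direction about which function to use to achieve the following.

I have the data frame below, output using the dput command: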

structure(list(ID = 4701:4702, Date.1 = structure(c(5L, 5L), .Label = c("01/02/2013", 
"01/03/2013", "01/05/2013", "02/05/2013", "04/02/2013", "04/03/2013", 
"05/02/2013", "05/03/2013", "06/02/2013", "06/03/2013", "07/02/2013", 
"07/03/2013", "08/02/2013", "08/07/2013", "12/12/2012", "13/12/2012", 
"14/01/2013", "14/12/2012", "15/01/2013", "16/01/2013", "17/01/2013", 
"17/12/2012", "18/01/2013", "18/04/2013", "18/12/2012", "19/04/2013", 
"23/01/2013", "24/01/2013", "25/01/2013", "26/04/2013", "28/01/2013", 
"29/01/2013", "29/04/2013", "30/04/2013", "31/01/2013"), class = "factor"), 
 Day.of.Week.1 = structure(c(2L, 2L), .Label = c("Friday", 
"Monday", "Thursday", "Tuesday", "Wednesday"), class = "factor"), 
Sedentary.1 = c(511.5, 405.5), Light.1 = c(133.666666666667, 
119.166666666667), Moderate.1 = c(12.1666666666667, 13.1666666666667
), Vigorous.1 = c(4.33333333333333, 3.5), Axis.1.Counts.1 = c(157124L, 
126177L), Axis.1.CPM.1 = c(237.5, 233.1), Time.1 = c(661.67, 
541.33), Day.of.Week.2 = structure(c(1L, 4L), .Label = c("Friday", 
"Monday", "Thursday", "Tuesday", "Wednesday"), class = "factor"), 
Sedentary.2 = c(370.166666666667, 601.833333333333), Light.2 = c(113, 
162.5), Moderate.2 = c(12, 13), Vigorous.2 = c(4, 10), Axis.1.Counts.2 = c(141593L, 
201373L), Axis.1.CPM.2 = c(283.7, 255.8), Number.of.Epochs.2 = c(2995L, 
4724L), Time.2 = c(499.17, 787.33), Day.of.Week.3 = structure(c(NA, 
5L), .Label = c("Friday", "Monday", "Thursday", "Tuesday", 
"Wednesday"), class = "factor"), Sedentary.3 = c(NA, 463), 
Light.3 = c(NA, 121.666666666667), Moderate.3 = c(NA, 14.5
), Vigorous.3 = c(NA, 11.5), Axis.1.Counts.3 = c(NA, 196192L
), Axis.1.CPM.3 = c(NA, 321.3), Number.of.Epochs.3 = c(NA, 
3664L), Time.3 = c(NA, 610.67), Day.of.Week.4 = structure(c(NA, 
3L), .Label = c("Friday", "Monday", "Thursday", "Tuesday", 
"Wednesday"), class = "factor"), Sedentary.4 = c(NA, 472.333333333333
), Light.4 = c(NA, 149.166666666667), Moderate.4 = c(NA, 
11.3333333333333), Vigorous.4 = c(NA, 14.1666666666667), 
Axis.1.Counts.4 = c(NA, 218895L), Axis.1.CPM.4 = c(NA, 338.3
), Number.of.Epochs.4 = c(NA, 3882L), Time.4 = c(NA, 647), 
Day.of.Week.5 = structure(c(NA, 1L), .Label = c("Friday", 
"Monday", "Thursday", "Tuesday", "Wednesday"), class = "factor"), 
Sedentary.5 = c(NA, 383.166666666667), Light.5 = c(NA, 106.5
), Moderate.5 = c(NA, 8), Vigorous.5 = c(NA, 0.5), Axis.1.Counts.5 = c(NA, 
92163L), Axis.1.CPM.5 = c(NA, 185), Number.of.Epochs.5 = c(NA, 
2989L), Time.5 = c(NA, 498.17)), .Names = c("ID", "Date.1", 
"Day.of.Week.1", "Sedentary.1", "Light.1", "Moderate.1", "Vigorous.1", 
"Axis.1.Counts.1", "Axis.1.CPM.1", "Time.1", "Day.of.Week.2", 
"Sedentary.2", "Light.2", "Moderate.2", "Vigorous.2", "Axis.1.Counts.2", 
"Axis.1.CPM.2", "Number.of.Epochs.2", "Time.2", "Day.of.Week.3", 
"Sedentary.3", "Light.3", "Moderate.3", "Vigorous.3", "Axis.1.Counts.3", 
"Axis.1.CPM.3", "Number.of.Epochs.3", "Time.3", "Day.of.Week.4",  
"Sedentary.4", "Light.4", "Moderate.4", "Vigorous.4", "Axis.1.Counts.4", 
"Axis.1.CPM.4", "Number.of.Epochs.4", "Time.4", "Day.of.Week.5", 
"Sedentary.5", "Light.5", "Moderate.5", "Vigorous.5", "Axis.1.Counts.5", 
"Axis.1.CPM.5", "Number.of.Epochs.5", "Time.5"), reshapeWide = structure(list(
v.names = NULL, timevar = "ID2", idvar = "ID", times = 1:5, 
varying = structure(c("Filename.1", "Epoch.1", "Weight..kg..1", 
"Age.1", "Gender.1", "Date.1", "Day.of.Week.1", "Day.of.Week.Num.1", 
"Sedentary.1", "Light.1", "Moderate.1", "Vigorous.1", "Axis.1.Counts.1", 
"Axis.1.Average.Counts.1", "Axis.1.CPM.1", "Number.of.Epochs.1", 
"Time.1", "Calendar.Days.1", "Filename.2", "Epoch.2", "Weight..kg..2", 
"Age.2", "Gender.2", "Date.2", "Day.of.Week.2", "Day.of.Week.Num.2", 
"Sedentary.2", "Light.2", "Moderate.2", "Vigorous.2", "Axis.1.Counts.2", 
"Axis.1.Average.Counts.2", "Axis.1.CPM.2", "Number.of.Epochs.2", 
"Time.2", "Calendar.Days.2", "Filename.3", "Epoch.3", "Weight..kg..3", 
"Age.3", "Gender.3", "Date.3", "Day.of.Week.3", "Day.of.Week.Num.3", 
"Sedentary.3", "Light.3", "Moderate.3", "Vigorous.3", "Axis.1.Counts.3", 
"Axis.1.Average.Counts.3", "Axis.1.CPM.3", "Number.of.Epochs.3", 
"Time.3", "Calendar.Days.3", "Filename.4", "Epoch.4", "Weight..kg..4", 
"Age.4", "Gender.4", "Date.4", "Day.of.Week.4", "Day.of.Week.Num.4", 
"Sedentary.4", "Light.4", "Moderate.4", "Vigorous.4", "Axis.1.Counts.4", 
"Axis.1.Average.Counts.4", "Axis.1.CPM.4", "Number.of.Epochs.4", 
"Time.4", "Calendar.Days.4", "Filename.5", "Epoch.5", "Weight..kg..5", 
"Age.5", "Gender.5", "Date.5", "Day.of.Week.5", "Day.of.Week.Num.5", 
"Sedentary.5", "Light.5", "Moderate.5", "Vigorous.5", "Axis.1.Counts.5", 
"Axis.1.Average.Counts.5", "Axis.1.CPM.5", "Number.of.Epochs.5", 
"Time.5", "Calendar.Days.5"), .Dim = c(18L, 5L))), .Names = c("v.names", 
"timevar", "idvar", "times", "varying")), row.names = c(1L, 3L
), class = "data.frame")
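I want to sum the Sedentary.1, Sedentary.2, Sedentary.3, Sedentary.4 and Sedentary.5 columns for each row. However, I only want each column to be included in the calculation when another column meets a certain criterion.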

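That is, include the following columns:

- Sedentary.1 if the value of Time.1 >= 377
- Sedentary.2 if the value of Time.2 >= 377
- Sedentary.3 if the value of Time.3 >= 377
- Sedentary.4 if the value of Time.4 >= 377
- Sedentary.5 if the value of Time.5 >= 377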

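I can do this in Excel with the SumIf function, but I don't know where to start in R. If you could point me to a function I can read up on, I would be grateful.

Many thanks,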


Ash

Indexing on the other column will get you started:

sum(df$Sedentary.1[df$Time.1 >= 377])
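One caveat with the indexing approach: the sample data contains NAs, and a logical index that is NA yields an NA element, so the plain sum comes back as NA. The sketch below uses a two-row toy data frame (the Sedentary.3 and Time.3 values are copied from the question; the variable names keep, plain, safe and alt are made up for illustration) to show two ways around that. Note that this sums down a column, not across a row.

```r
# Toy data: the Sedentary.3 / Time.3 values from the question
df <- data.frame(Sedentary.3 = c(NA, 463), Time.3 = c(NA, 610.67))

keep <- df$Time.3 >= 377                          # logical vector: NA, TRUE

plain <- sum(df$Sedentary.3[keep])                # NA, because of the NA index
safe  <- sum(df$Sedentary.3[keep], na.rm = TRUE)  # drop NAs while summing
alt   <- sum(df$Sedentary.3[which(keep)])         # which() keeps only TRUE positions
```

Both safe and alt give the conditional column total; which() is handy when the summing function has no na.rm argument.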
The plyr package is a good way to get the sums for several columns at once:

library(plyr)

df2 <- ddply(df, .(), summarise, Sedentary.1 = sum(Sedentary.1[Time.1 >= 377], na.rm = TRUE), 
             Sedentary.2 = sum(Sedentary.2[Time.2 >= 377], na.rm = TRUE))

   .id Sedentary.1 Sedentary.2
1 <NA>         917         972
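For comparison, here is a minimal base R sketch of the same per-column conditional totals, assuming the columns follow the Sedentary.N / Time.N naming used in the question. It is an alternative to the plyr call, not a reproduction of it, and the names sed_cols, time_cols and col_sums are illustrative. The toy data frame holds only the relevant columns from the question.

```r
# Relevant columns from the question's data frame
df <- data.frame(
  Sedentary.1 = c(511.5, 405.5),
  Sedentary.2 = c(370.166666666667, 601.833333333333),
  Sedentary.3 = c(NA, 463),
  Time.1 = c(661.67, 541.33),
  Time.2 = c(499.17, 787.33),
  Time.3 = c(NA, 610.67)
)

sed_cols  <- paste0("Sedentary.", 1:3)
time_cols <- paste0("Time.", 1:3)

# Walk over the Sedentary/Time column pairs and sum each Sedentary
# column over the rows where the paired Time value is >= 377
col_sums <- mapply(
  function(s, t) sum(s[t >= 377], na.rm = TRUE),
  df[sed_cols], df[time_cols]
)
```

The result is a named numeric vector with one total per Sedentary column, so adding more days only means widening the 1:3 range.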

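There may be a more efficient and/or cleaner way, but here I find which Time columns are not NA and meet your criterion, multiply the Sedentary columns by that result, and then compute the row sums. TRUE is treated as 1 and FALSE as 0, so the result is the sum of only the values that meet the condition, because the unwanted Sedentary values are multiplied by 0 before summing.

x is the name of the data frame you provided.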

rowSums(x[c("Sedentary.1","Sedentary.2","Sedentary.3","Sedentary.4","Sedentary.5")] * (!is.na(x[,c("Time.1","Time.2","Time.3","Time.4","Time.5")]) & x[,c("Time.1","Time.2","Time.3","Time.4","Time.5")] >= 377), na.rm=TRUE)
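The same masking idea can be written with the column names generated rather than typed out. This is only a sketch under the assumption that the columns are named Sedentary.1 to Sedentary.5 and Time.1 to Time.5 as in the question; the helper names days, sed, time, valid and row_totals are made up here.

```r
# Relevant columns from the question's data frame
x <- data.frame(
  Sedentary.1 = c(511.5, 405.5),
  Sedentary.2 = c(370.166666666667, 601.833333333333),
  Sedentary.3 = c(NA, 463),
  Sedentary.4 = c(NA, 472.333333333333),
  Sedentary.5 = c(NA, 383.166666666667),
  Time.1 = c(661.67, 541.33),
  Time.2 = c(499.17, 787.33),
  Time.3 = c(NA, 610.67),
  Time.4 = c(NA, 647),
  Time.5 = c(NA, 498.17)
)

days <- 1:5
sed  <- as.matrix(x[paste0("Sedentary.", days)])
time <- as.matrix(x[paste0("Time.", days)])

# TRUE where the day has a Time value and that value is >= 377
valid <- !is.na(time) & time >= 377

# FALSE counts as 0, so days that fail the test drop out of the sum
row_totals <- rowSums(sed * valid, na.rm = TRUE)
```

Converting to matrices first keeps the element-wise multiplication explicit and makes it easy to reuse valid for other measures such as Light or Moderate.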
Edit, in response to the question in the comments:

Something like this should work:

# make TRUE/FALSE table
TF = !is.na(x[,c("Time.1","Time.2","Time.3","Time.4","Time.5")]) & x[,c("Time.1","Time.2","Time.3","Time.4","Time.5")] >= 377

# take rowSums of Sedentary.x when TF rowSums are greater than or equal to 3
rowSums(x[rowSums(TF) >= 3,c("Sedentary.1","Sedentary.2","Sedentary.3","Sedentary.4","Sedentary.5")] * TF[rowSums(TF) >= 3,], na.rm=TRUE)
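A variation on the filter above, in case the result needs to stay aligned with the original rows: instead of dropping rows with too few valid days, set their totals to NA. This is a sketch rather than the answerer's code; min_days, n_valid and totals are hypothetical names, and the data frame is a cut-down copy of the question's data.

```r
# Relevant columns from the question's data frame
x <- data.frame(
  Sedentary.1 = c(511.5, 405.5),
  Sedentary.2 = c(370.166666666667, 601.833333333333),
  Sedentary.3 = c(NA, 463),
  Sedentary.4 = c(NA, 472.333333333333),
  Sedentary.5 = c(NA, 383.166666666667),
  Time.1 = c(661.67, 541.33),
  Time.2 = c(499.17, 787.33),
  Time.3 = c(NA, 610.67),
  Time.4 = c(NA, 647),
  Time.5 = c(NA, 498.17)
)

sed  <- as.matrix(x[paste0("Sedentary.", 1:5)])
time <- as.matrix(x[paste0("Time.", 1:5)])
TF   <- !is.na(time) & time >= 377

min_days <- 3                 # assumed threshold, taken from the comment thread
n_valid  <- rowSums(TF)       # number of qualifying days per row

totals <- rowSums(sed * TF, na.rm = TRUE)
totals[n_valid < min_days] <- NA   # keep the row, but mark it as not usable
```

With the sample data the first row has only two qualifying days and becomes NA, while the second row keeps its total. The vector has one entry per original row, so it can be assigned straight back as a new column.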

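It could be made into a one-liner if needed, but I have split it into stages and saved the TRUE/FALSE table as "TF" for readability.

Here is how I did it. First, I find which Time* columns have values >= 377, then multiply that by a data.frame that is a subset containing only the Sedentary* columns. R treats TRUE as 1 and FALSE as 0, so values where the test is FALSE become 0. Where there is an NA, the value stays NA, which is why the final sum uses na.rm = TRUE.

This code assumes that the Time and Sedentary columns are listed in the same order.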

sub.time <- mydf[, names(mydf)[grepl("Time", names(mydf))], ]
sumif <- sub.time >= 377
sub.sed <- mydf[, names(mydf)[grepl("Sedentary", names(mydf))], ]
apply(sub.sed * sumif, MARGIN = 1, sum, na.rm = TRUE)

        1         3 
 881.6667 2325.8333
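Because this approach relies on the Time and Sedentary columns lining up, it can be worth checking that assumption explicitly. The sketch below is one possible way to do so, not part of the original answer: it anchors the patterns so that only Time.N and Sedentary.N columns match, and stops if the numeric suffixes differ. The suffix helper and the other variable names are illustrative.

```r
# Relevant columns from the question's data frame
mydf <- data.frame(
  Sedentary.1 = c(511.5, 405.5),
  Time.1 = c(661.67, 541.33),
  Sedentary.2 = c(370.166666666667, 601.833333333333),
  Time.2 = c(499.17, 787.33),
  Sedentary.3 = c(NA, 463),
  Time.3 = c(NA, 610.67)
)

# Anchored patterns avoid picking up unrelated columns that merely contain "Time"
time_cols <- grep("^Time\\.[0-9]+$", names(mydf), value = TRUE)
sed_cols  <- grep("^Sedentary\\.[0-9]+$", names(mydf), value = TRUE)

# Compare the trailing day numbers to confirm the two sets are in the same order
suffix <- function(nm) sub("^.*\\.", "", nm)
stopifnot(identical(suffix(time_cols), suffix(sed_cols)))

sumif  <- as.matrix(mydf[time_cols]) >= 377    # NA where Time is missing
totals <- rowSums(as.matrix(mydf[sed_cols]) * sumif, na.rm = TRUE)
```

If the columns were ever reordered, the stopifnot() call would fail loudly instead of silently pairing the wrong days.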

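What about the rest of sub.time, 1:5?

This should get the user started. The rest should be easy to work out.

The sample data includes NAs, so this answer alone will not work for all the other columns, e.g. column 5.

Roman and Ping, I have indexed the other columns based on your first piece of code. Looking back at my question, I don't think I made clear what I want to do. I am trying to find the sum or mean of these columns for each row. Can I do this with ifelse and rowSums? Thank you very much for taking the time to reply. I will edit my question accordingly.

You want the sum for each row, right?

That is correct, Roman. Thanks. I am trying something like: df = rowSums(acceleration2[, c(Sedentary.1[Time.1 >= 377], Sedentary.2[Time.2 >= 377], Sedentary.3[Time.3 >= 377], Sedentary.4[Time.4 >= 377], Sedentary.5[Time.5 >= 377], na.rm = TRUE)

This works perfectly, Ping. I prefer this kind of code because I can easily see how it achieves what I want and apply it to other parts of my script. Many thanks.

I just remembered that you mentioned finding the mean in a comment on the other answer. Here you cannot simply replace rowSums with rowMeans, because the 0s would be included in the mean calculation. If you want to keep the same approach, you can take the row sums and divide by the row sums of the TRUE/FALSE table.

Thanks, Ping. I did this by converting the TRUE/FALSE table to numeric and to a data frame, and then taking its row sums. I was wondering whether there is a way to add an extra condition to the code you provided. That is, to compute the row sums only for those rows where the row sum of the new TRUE/FALSE data frame is greater than 3 (possibly 5), if you follow?

You don't actually need to convert to numeric, because that happens automatically when you use TRUE and FALSE in arithmetic (although it does no harm). I have added a solution above based on what I think you mean.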
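The row-mean suggestion from the comments (row sums divided by the row sums of the TRUE/FALSE table) can be sketched as follows. The data frame is a cut-down copy of the question's data, and the names sed, time, TF and row_means are illustrative.

```r
# Relevant columns from the question's data frame
x <- data.frame(
  Sedentary.1 = c(511.5, 405.5),
  Sedentary.2 = c(370.166666666667, 601.833333333333),
  Sedentary.3 = c(NA, 463),
  Sedentary.4 = c(NA, 472.333333333333),
  Sedentary.5 = c(NA, 383.166666666667),
  Time.1 = c(661.67, 541.33),
  Time.2 = c(499.17, 787.33),
  Time.3 = c(NA, 610.67),
  Time.4 = c(NA, 647),
  Time.5 = c(NA, 498.17)
)

sed  <- as.matrix(x[paste0("Sedentary.", 1:5)])
time <- as.matrix(x[paste0("Time.", 1:5)])
TF   <- !is.na(time) & time >= 377

# Sum of the qualifying values divided by the number of qualifying days,
# so the zeros introduced by the mask do not drag the mean down
row_means <- rowSums(sed * TF, na.rm = TRUE) / rowSums(TF)
```

A row with no qualifying days divides 0 by 0 and yields NaN, which may be a reasonable way to flag it. This sketch also assumes that Sedentary is never missing on a day that has a qualifying Time value; otherwise the denominator would count a day that contributes nothing to the sum.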