Calculating the failure rate in R with a specific row window
I have a data frame like this:
ID <- c("ID300","ID301","ID302","ID303","ID304","ID305","ID306","ID307","ID308","ID309")
Measurement <- c("Length","Length","Length","Length","Length","Length","Length","Length","Length","Length")
PASSFAIL <- c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL")
df1 <- data.frame(ID,Measurement,PASSFAIL)
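Part 1
For each ID, I want the failure rate (FR) over a window of 5 rows: that row and the next four. Any value other than PASS (including FAIL#Pts) counts as a failure, and rows with fewer than 5 rows remaining get NA. My desired output is:
      ID Measurement PASSFAIL  FR
1  ID300      Length     FAIL 0.6
2  ID301      Length     PASS 0.4
3  ID302      Length     FAIL 0.4
4  ID303      Length FAIL#Pts 0.2
5  ID304      Length     PASS 0.0
6  ID305      Length     PASS 0.2
7  ID306      Length     PASS  NA
8  ID307      Length     PASS  NA
9  ID308      Length     PASS  NA
10 ID309      Length     FAIL  NA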
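Part 2
Once that is done, I need to recalculate the failure rate for every new ID that is added, using the same window of 5. For example, my desired output is: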
      ID Measurement PASSFAIL  FR
1  ID296      Length     PASS 0.4
2  ID297      Length     FAIL 0.6
3  ID298      Length     PASS 0.6
4  ID299      Length     FAIL 0.6
5  ID300      Length     FAIL 0.8
6  ID301      Length     FAIL 0.6
7  ID302      Length     PASS  NA
8  ID303      Length     FAIL  NA
9  ID304      Length FAIL#Pts  NA
10 ID305      Length     PASS  NA
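I currently calculate the failure rate as follows, which gives a single rate for the whole data frame. I do not know how to use a loop to calculate it sequentially for each ID with a window size of 5.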
library(data.table)
setDT(df1)
# aggregate: one failure rate over all rows (this overwrites df1 with a single value)
df1 <- df1[, .( FR = (sum(PASSFAIL != "PASS")/.N))]
You might want to try the sapply function. For good measure, I also declare df1 without factors:
df1 <- data.frame(ID,Measurement,PASSFAIL,stringsAsFactors = FALSE)
df1$FR <- sapply(df1$ID,FUN = function(x) {
  start_ID <- which(df1$ID == x)
  if(start_ID > nrow(df1)-4){
    # fewer than 5 rows remain, so the window is incomplete
    return(NA_real_)
  }else{
    end_ID <- start_ID + 4
    # share of the 5 rows whose PASSFAIL contains "FAIL" (this also matches "FAIL#Pts")
    return(sum(grepl("FAIL",df1[start_ID:end_ID,"PASSFAIL"]))/5)
  }
})
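The sapply call above looks each row up by its ID, so it relies on the IDs being unique. A minimal sketch of an index-based variant that avoids that assumption and takes the window size as an argument; the helper name forward_fail_rate and its window parameter are illustrative and not part of the original answer:

```r
# Hypothetical helper (not from the original answer): forward-looking failure
# rate over `window` rows, NA where fewer than `window` rows remain.
forward_fail_rate <- function(passfail, window = 5) {
  is_fail <- grepl("FAIL", passfail)  # TRUE for "FAIL" and "FAIL#Pts"
  n <- length(is_fail)
  vapply(seq_len(n), function(i) {
    if (i + window - 1 > n) return(NA_real_)  # incomplete window
    mean(is_fail[i:(i + window - 1)])         # share of failures in the window
  }, numeric(1))
}

PASSFAIL <- c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL")
fr <- forward_fail_rate(PASSFAIL)
print(fr)  # values: 0.6 0.4 0.4 0.2 0.0 0.2 NA NA NA NA
```

Working on row positions instead of IDs also means the function can be applied to any character vector, not only to a column of df1.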
I am lost on your Part 2, but Part 1 can be handled with stats::filter and a grepl call that matches every value containing "FAIL". The code is at the end of this post.
If you want to get fancy, I suggest you look at filter, or at rollapply from the zoo package. For example: filter(grepl("FAIL",df1$PASSFAIL),rep(1,5)/5,sides=1)
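The comment above names rollapply but only shows filter. A minimal sketch of what the rollapply version might look like, assuming the zoo package is installed; the variable names is_fail and fr_zoo are illustrative:

```r
library(zoo)  # assumption: the zoo package is installed

PASSFAIL <- c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL")
is_fail <- as.numeric(grepl("FAIL", PASSFAIL))  # 1 for "FAIL" and "FAIL#Pts"

# align = "left" starts each window at the current row;
# fill = NA pads the last 4 rows, where the window is incomplete
fr_zoo <- rollapply(is_fail, width = 5, FUN = mean, align = "left", fill = NA)
print(fr_zoo)
```

With align = "left" the result lines up with the rows directly, so no shifting or reversing should be needed afterwards.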
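Also note that data.table accepts a by= argument, which runs the function within the groups defined by the by= variable.

This is nice, but it ignores "FAIL#Pts" and only works when the values in the PASSFAIL column are PASS or FAIL. Can you modify it so that "FAIL#Pts" also counts as a failure?

How can you be so lazy, man :-) Just replace my == with a grepl call.

Thanks for this solution, but I get an error: Error in UseMethod("filter"): no applicable method for 'filter' applied to an object of class "logical". How do I get rid of it? Is it because of dplyr?

I just restarted my R session and ran it again without dplyr, and it worked fine. Great solution, thank you very much. But I may use dplyr later in my code; will that be a problem? My Part 2 is a "moving failure rate". Since this is really time series data, I want the solution to compute the rate for every new incoming data point that is added to the data frame. I think your solution can do that. It may be a really silly question, but I will test it and let you know.

@Sharath - if you have dplyr loaded, you can call stats::filter() explicitly rather than just filter.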
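The by= argument noted above could be combined with a rolling mean so that each Measurement group gets its own window. A minimal sketch, assuming data.table 1.12.0 or later for frollmean; the sample data has only one group ("Length"), so here the grouping is illustrative only:

```r
library(data.table)  # assumption: data.table >= 1.12.0, which provides frollmean()

dt <- data.table(
  ID = sprintf("ID%d", 300:309),
  Measurement = "Length",
  PASSFAIL = c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL")
)

# Forward-looking 5-row failure rate, computed separately within each Measurement.
# align = "left" starts the window at the current row; incomplete windows give NA.
dt[, FR := frollmean(as.numeric(grepl("FAIL", PASSFAIL)), n = 5, align = "left"),
   by = Measurement]
print(dt)
```

Because the window is applied per group, rows from one Measurement would not leak into the failure rate of another.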
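# Part 1 with stats::filter() and grepl(); stats:: is explicit in case dplyr masks filter()
df1$FR <- NA
# trailing 5-row mean of the FAIL indicator; the first 4 values are NA and are dropped
vals <- na.omit(stats::filter(grepl("FAIL",df1$PASSFAIL), rep(1,5)/5, sides=1))
# move the remaining values to the top so that each row looks forward over 5 rows
df1$FR[seq(1,length(vals))] <- vals
df1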
#       ID Measurement PASSFAIL  FR
#1   ID300      Length     FAIL 0.6
#2   ID301      Length     PASS 0.4
#3   ID302      Length     FAIL 0.4
#4   ID303      Length FAIL#Pts 0.2
#5   ID304      Length     PASS 0.0
#6   ID305      Length     PASS 0.2
#7   ID306      Length     PASS  NA
#8   ID307      Length     PASS  NA
#9   ID308      Length     PASS  NA
#10  ID309      Length     FAIL  NA
# the same values in one step: reverse, take the trailing mean, reverse back
rev(stats::filter(grepl("FAIL",rev(df1$PASSFAIL)), rep(1,5)/5, sides=1))
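For the "moving failure rate" in Part 2, one option might be to wrap the one-liner above in a function and rerun it whenever a row is appended. A minimal sketch, assuming the data frame always has at least as many rows as the window; the function name add_fr, the new row ID310 and its PASSFAIL value are made up for illustration:

```r
# Hypothetical wrapper (not from the original answer): recompute the
# forward-looking failure rate for every row of the data frame.
add_fr <- function(df, window = 5) {
  is_fail <- grepl("FAIL", rev(df$PASSFAIL))
  trailing <- stats::filter(is_fail, rep(1, window) / window, sides = 1)
  df$FR <- rev(as.numeric(trailing))  # reverse back: windows now look forward
  df
}

df1 <- data.frame(
  ID = sprintf("ID%d", 300:309),
  Measurement = "Length",
  PASSFAIL = c("FAIL","PASS","FAIL","FAIL#Pts","PASS","PASS","PASS","PASS","PASS","FAIL"),
  stringsAsFactors = FALSE
)
df1 <- add_fr(df1)

# Made-up incoming observation, appended and then recomputed
new_row <- data.frame(ID = "ID310", Measurement = "Length", PASSFAIL = "FAIL",
                      FR = NA_real_, stringsAsFactors = FALSE)
df1 <- add_fr(rbind(df1, new_row))
print(df1)
```

After the new row arrives, ID306 has a complete window of 5 rows and receives a value, while the last four rows stay NA. Recomputing the whole column is simple but does repeated work; for long series, updating only the row whose window has just become complete may be preferable.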