如何在R中执行pslda,显示列表中几个元素的错误率?
我在R中对10个数据帧(10个研究区域的数据)执行splsda模型,存储为列表(如何在R中执行pslda,显示列表中几个元素的错误率?,r,list,loops,lapply,R,List,Loops,Lapply,我在R中对10个数据帧(10个研究区域的数据)执行splsda模型,存储为列表(datalist)。所有这些数据帧都是相似的,具有相同的变量,但只是不同的值。 我使用micromics库来做这件事 这是第一个研究区域的负责人。它根据不同范围的TPI值,比较是否存在湿地(因子变量-wetl或无wetl) > head(datalist[[1]]) OID POINTID WETLAND TPI200 TPI350 TPI500 TPI700 TPI900 TPI1000 TPI2000
datalist
)。所有这些数据帧都是相似的,具有相同的变量,但只是不同的值。
我使用micromics库来做这件事
这是第一个研究区域的负责人。它根据不同范围的TPI值,比较是否存在湿地(因子变量-wetl或无wetl)
> head(datalist[[1]])
OID POINTID WETLAND TPI200 TPI350 TPI500 TPI700 TPI900 TPI1000 TPI2000 TPI3000 TPI4000 TPI5000 TPI2500
1 -1 1 no wetl 70 67 55 50 48 46 53 47 49 63 48
2 -1 2 no wetl 37 42 35 29 32 16 17 35 49 63 26
3 -1 3 no wetl 45 55 45 39 41 41 53 47 49 63 48
4 -1 4 no wetl 46 58 51 43 46 36 54 47 49 62 49
5 -1 5 no wetl 58 55 53 49 47 46 54 47 49 62 49
6 -1 6 no wetl 56 53 51 49 46 46 54 47 49 61 49
我使用以下代码完成了交叉验证步骤:
library(mixOmics)
for (i in 1: length(model_list))
{
myperf_plsda <- perf(model_list[[i]], validation = "Mfold", folds = 10,
progressBar = FALSE, nrepeat = 10, auc = TRUE)
save(myperf_plsda, file="performancePLSDA.RData")
}
因此,首先,我试图绘制prinipal组件函数中的错误(=plot,上面一个研究区域的第一个代码)。结果会像我想要的pdf格式一样
第二,我想知道总体错误率和每门课的错误率,上面提到的一个研究领域的代码就是从中得出的。一个研究区域的结果如下所示,例如:
- 总错误率:
- 每类错误率:
for
循环中,或使用lappy
,以获得10个研究区域的这些结果
,例如:
### To see how many PCs is best ###
pdf('overallerrorrate_wetlall_small.pdf')
for (i in 1:length(myperf_plsda))
{
plot(model_list[[i]], col = color.mixo(5:7), sd = TRUE,
legend.position = "horizontal")
}
dev.off()
或
或
或
但所有这些代码都不起作用!如何为列表中的多个元素运行代码?非常感谢 根据您的输出,您必须创建一个新列表并将结果保存在其中。使用myperf_plsda可以覆盖循环中的每个步骤。另外,您需要的大多数度量值都是列表,因此我添加了一些处理函数来访问数据帧。我使用了下一个虚拟数据:
library(mixOmics)
#Function
custom_splsda <- function(datalist, ncomp, keepX, ..., Xcols, Ycol){
Y <- datalist[[Ycol]]
X <- datalist[Xcols]
res <- splsda(X, Y, ncomp = ncomp, keepX = keepX, ...)
res
}
#Data
datalist <- list(df1 = structure(list(OID = c(-1, -1, -1, -1, -1, -1), POINTID = c(1,
2, 3, 4, 5, 6), WETLAND = c("no wetl", "no wetl", "no wetl",
"wetl", "wetl", "wetl"), TPI200 = c(70, 37, 45, 46, 58, 56),
TPI350 = c(67, 42, 55, 58, 55, 53), TPI500 = c(55, 35, 45,
51, 53, 51), TPI700 = c(50, 29, 39, 43, 49, 49), TPI900 = c(48,
32, 41, 46, 47, 46), TPI1000 = c(46, 16, 41, 36, 46, 46),
TPI2000 = c(53, 17, 53, 54, 54, 54), TPI3000 = c(47, 35,
47, 47, 47, 47), TPI4000 = c(49, 49, 49, 49, 49, 49), TPI5000 = c(63,
63, 63, 62, 62, 61), TPI2500 = c(48, 26, 48, 49, 49, 49)), row.names = c(NA,
6L), class = "data.frame"), df2 = structure(list(OID = c(-1,
-1, -1, -1, -1, -1), POINTID = c(1, 2, 3, 4, 5, 6), WETLAND = c("no wetl",
"no wetl", "no wetl", "wetl", "wetl", "wetl"), TPI200 = c(70,
37, 45, 46, 58, 56), TPI350 = c(67, 42, 55, 58, 55, 53), TPI500 = c(55,
35, 45, 51, 53, 51), TPI700 = c(50, 29, 39, 43, 49, 49), TPI900 = c(48,
32, 41, 46, 47, 46), TPI1000 = c(46, 16, 41, 36, 46, 46), TPI2000 = c(53,
17, 53, 54, 54, 54), TPI3000 = c(47, 35, 47, 47, 47, 47), TPI4000 = c(49,
49, 49, 49, 49, 49), TPI5000 = c(63, 63, 63, 62, 62, 61), TPI2500 = c(48,
26, 48, 49, 49, 49)), row.names = c(NA, 6L), class = "data.frame"))
在前面的代码中,您将得到三个数据帧,其中包含根据型号列表
的名称识别的值,您可以通过varsid1
、id2
和genid
导航以查看度量值、组件和数据集:
error.rate.df
max.dist centroids.dist mahalanobis.dist id1 id2 genid
1 0.2222222 0.2222222 0.2222222 overall comp1 df1
2 0.2777778 0.3888889 0.2777778 overall comp2 df1
3 0.2222222 0.2222222 0.2222222 BER comp1 df1
4 0.2777778 0.3888889 0.2777778 BER comp2 df1
5 0.2222222 0.2222222 0.2222222 overall comp1 df2
6 0.2777778 0.3333333 0.2777778 overall comp2 df2
7 0.2222222 0.2222222 0.2222222 BER comp1 df2
8 0.2777778 0.3333333 0.2777778 BER comp2 df2
error.rate.class.df
comp1 comp2 id1 id2 genid
1 0.3333333 0.3333333 max_dist no wetl df1
2 0.1111111 0.2222222 max_dist wetl df1
3 0.3333333 0.6666667 centroids_dist no wetl df1
4 0.1111111 0.1111111 centroids_dist wetl df1
5 0.3333333 0.3333333 mahalanobis_dist no wetl df1
6 0.1111111 0.2222222 mahalanobis_dist wetl df1
7 0.3333333 0.3333333 max_dist no wetl df2
8 0.1111111 0.2222222 max_dist wetl df2
9 0.3333333 0.5555556 centroids_dist no wetl df2
10 0.1111111 0.1111111 centroids_dist wetl df2
11 0.3333333 0.3333333 mahalanobis_dist no wetl df2
12 0.1111111 0.2222222 mahalanobis_dist wetl df2
auc.df
x id1 id2 genid
1 0.62966667 AUC_mean comp1 df1
2 0.06414361 AUC_sd comp1 df1
3 0.81483333 AUC_mean comp2 df1
4 0.06414361 AUC_sd comp2 df1
5 0.62966667 AUC_mean comp1 df2
6 0.06414361 AUC_sd comp1 df2
7 0.77780000 AUC_mean comp2 df2
8 0.11110000 AUC_sd comp2 df2
最后,对于绘图,您可以使用下一个代码(我已将数据集的名称指定给x标签,以便您可以在绘图中识别它):
#打印并保存
#指定名称
姓名(myperf_plsda)你能说说什么不起作用吗?在我看来,你似乎在试图策划。你认为结果会是什么样子?@RomanLuštrik,我在帖子中添加了关于预期结果的额外信息。现在更清楚了吗?嗨,鸭子,这有帮助!这已经是你第三次帮我了!非常感谢你,英雄!!
for (i in 1:length(myperf_plsda))
{myperf_plsda[[1]]error.rate
myperf_plsda[[1]]error.rate.class
myperf_plsda[[i]]auc
}
lapply(myperf_plsda, [[, 'error.rate')`
library(mixOmics)
#Function
custom_splsda <- function(datalist, ncomp, keepX, ..., Xcols, Ycol){
Y <- datalist[[Ycol]]
X <- datalist[Xcols]
res <- splsda(X, Y, ncomp = ncomp, keepX = keepX, ...)
res
}
#Data
datalist <- list(df1 = structure(list(OID = c(-1, -1, -1, -1, -1, -1), POINTID = c(1,
2, 3, 4, 5, 6), WETLAND = c("no wetl", "no wetl", "no wetl",
"wetl", "wetl", "wetl"), TPI200 = c(70, 37, 45, 46, 58, 56),
TPI350 = c(67, 42, 55, 58, 55, 53), TPI500 = c(55, 35, 45,
51, 53, 51), TPI700 = c(50, 29, 39, 43, 49, 49), TPI900 = c(48,
32, 41, 46, 47, 46), TPI1000 = c(46, 16, 41, 36, 46, 46),
TPI2000 = c(53, 17, 53, 54, 54, 54), TPI3000 = c(47, 35,
47, 47, 47, 47), TPI4000 = c(49, 49, 49, 49, 49, 49), TPI5000 = c(63,
63, 63, 62, 62, 61), TPI2500 = c(48, 26, 48, 49, 49, 49)), row.names = c(NA,
6L), class = "data.frame"), df2 = structure(list(OID = c(-1,
-1, -1, -1, -1, -1), POINTID = c(1, 2, 3, 4, 5, 6), WETLAND = c("no wetl",
"no wetl", "no wetl", "wetl", "wetl", "wetl"), TPI200 = c(70,
37, 45, 46, 58, 56), TPI350 = c(67, 42, 55, 58, 55, 53), TPI500 = c(55,
35, 45, 51, 53, 51), TPI700 = c(50, 29, 39, 43, 49, 49), TPI900 = c(48,
32, 41, 46, 47, 46), TPI1000 = c(46, 16, 41, 36, 46, 46), TPI2000 = c(53,
17, 53, 54, 54, 54), TPI3000 = c(47, 35, 47, 47, 47, 47), TPI4000 = c(49,
49, 49, 49, 49, 49), TPI5000 = c(63, 63, 63, 62, 62, 61), TPI2500 = c(48,
26, 48, 49, 49, 49)), row.names = c(NA, 6L), class = "data.frame"))
#Create model_list, you must have the object created
model_list <- lapply(datalist, custom_splsda,
ncomp = 2, keepX = c(5, 5),
Xcols = 4:8, Ycol = "WETLAND")
#Create empty list
myperf_plsda <- list()
#Loop for objects and saving
for (i in 1: length(model_list))
{
myperf_plsda[[i]] <- perf(model_list[[i]], validation = "Mfold", folds = 3,
progressBar = FALSE, nrepeat = 3, auc = TRUE)
object <- myperf_plsda[[i]]
save(object,file = paste0("performancePLSDA.",i,".RData"))
}
#Process the object myperf_plsda
#First function to get elements
extract1 <- function(x)
{
#Object
error.rate <- x$error.rate
error.rate <- lapply(error.rate, as.data.frame)
#Process
O1 <- do.call(rbind,error.rate)
#Separate vars
O1$id <- rownames(O1)
rownames(O1) <- NULL
O1$id1 <- gsub("\\..*","", O1$id )
O1$id2 <- gsub(".*\\.","", O1$id )
O1$id <- NULL
return(O1)
}
#Function 2
extract2 <- function(x)
{
#Object
error.rate.class <- x$error.rate.class
names(error.rate.class) <- gsub('.','_',names(error.rate.class),fixed = T)
error.rate.class <- lapply(error.rate.class, as.data.frame)
#Process
O2 <- do.call(rbind,error.rate.class)
#Separate vars
O2$id <- rownames(O2)
rownames(O2) <- NULL
O2$id1 <- gsub("\\..*","", O2$id )
O2$id2 <- gsub(".*\\.","", O2$id )
O2$id <- NULL
return(O2)
}
#Function 3
extract3 <- function(x)
{
#Object
auc <- x$auc
#Modify for dataframe
change <- function(x)
{
y <- as.data.frame(x)
y$id1 <- rownames(y)
rownames(y)<-NULL
y$id1 <- gsub('.','_',y$id1,fixed = T)
return(y)
}
auc <- lapply(auc, change)
#Process
O3 <- do.call(rbind,auc)
#Separate vars
O3$id2 <- rownames(O3)
rownames(O3) <- NULL
O3$id2 <- gsub("\\..*","", O3$id2 )
return(O3)
}
#Apply functions and save in lists for late process
L1 <- lapply(myperf_plsda,extract1)
L2 <- lapply(myperf_plsda,extract2)
L3 <- lapply(myperf_plsda,extract3)
#Assign the same names from model_list
names(L1) <- names(model_list)
names(L2) <- names(model_list)
names(L3) <- names(model_list)
#Bind the data
#Error rate
error.rate.df <- do.call(rbind,L1)
error.rate.df$genid <- gsub("\\..*","", rownames(error.rate.df) )
rownames(error.rate.df) <- NULL
#Error rate class
error.rate.class.df <- do.call(rbind,L2)
error.rate.class.df$genid <- gsub("\\..*","", rownames(error.rate.class.df) )
rownames(error.rate.class.df) <- NULL
#Auc
auc.df <- do.call(rbind,L3)
auc.df$genid <- gsub("\\..*","", rownames(auc.df) )
rownames(auc.df) <- NULL
error.rate.df
max.dist centroids.dist mahalanobis.dist id1 id2 genid
1 0.2222222 0.2222222 0.2222222 overall comp1 df1
2 0.2777778 0.3888889 0.2777778 overall comp2 df1
3 0.2222222 0.2222222 0.2222222 BER comp1 df1
4 0.2777778 0.3888889 0.2777778 BER comp2 df1
5 0.2222222 0.2222222 0.2222222 overall comp1 df2
6 0.2777778 0.3333333 0.2777778 overall comp2 df2
7 0.2222222 0.2222222 0.2222222 BER comp1 df2
8 0.2777778 0.3333333 0.2777778 BER comp2 df2
error.rate.class.df
comp1 comp2 id1 id2 genid
1 0.3333333 0.3333333 max_dist no wetl df1
2 0.1111111 0.2222222 max_dist wetl df1
3 0.3333333 0.6666667 centroids_dist no wetl df1
4 0.1111111 0.1111111 centroids_dist wetl df1
5 0.3333333 0.3333333 mahalanobis_dist no wetl df1
6 0.1111111 0.2222222 mahalanobis_dist wetl df1
7 0.3333333 0.3333333 max_dist no wetl df2
8 0.1111111 0.2222222 max_dist wetl df2
9 0.3333333 0.5555556 centroids_dist no wetl df2
10 0.1111111 0.1111111 centroids_dist wetl df2
11 0.3333333 0.3333333 mahalanobis_dist no wetl df2
12 0.1111111 0.2222222 mahalanobis_dist wetl df2
auc.df
x id1 id2 genid
1 0.62966667 AUC_mean comp1 df1
2 0.06414361 AUC_sd comp1 df1
3 0.81483333 AUC_mean comp2 df1
4 0.06414361 AUC_sd comp2 df1
5 0.62966667 AUC_mean comp1 df2
6 0.06414361 AUC_sd comp1 df2
7 0.77780000 AUC_mean comp2 df2
8 0.11110000 AUC_sd comp2 df2
#Plot and save
#Assign names
names(myperf_plsda) <- names(model_list)
pdf('example.pdf')
for (i in 1:length(myperf_plsda))
{
plot(myperf_plsda[[i]], col = color.mixo(5:7), sd = TRUE,
legend.position = "horizontal",xlab = paste0(names(myperf_plsda)[i],' (Comp)'))
}
dev.off()