R caret package: findCorrelation warning "Combination row and column is above the cut-off"; no values returned

I am currently trying to filter variables by their correlation using R's caret package in RStudio on my Mac. So far I can compute and print the correlations of my dataset. However, as soon as I apply findCorrelation, no data is returned. I only get the following warning:

Combination row and column is above the cut-off, value = Flagging column

library(caret)
...

Answer:
I think the problem is your correlation matrix:
> class(na.omit(descrCor))
[1] "matrix"
> dim(na.omit(descrCor))
[1] 0 153
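To see why `na.omit()` can leave zero rows, here is a minimal sketch with made-up data (the data frame `df` and its columns are hypothetical, not the poster's `data.train`). With `cor()`'s default `use = "everything"`, a single mostly-missing column puts an NA into every row of the correlation matrix, and `na.omit()` then drops every row.

```r
# Hypothetical toy data: column `c` is missing in 18 of its 20 values
set.seed(1)
df <- data.frame(a = rnorm(20), b = rnorm(20), c = c(rnorm(2), rep(NA, 18)))

# Default use = "everything": every correlation involving `c`,
# including its own diagonal entry, is NA
m <- cor(df)

# na.omit() removes each row containing at least one NA; column `c` is NA
# in every row of the matrix, so nothing survives
m_omit <- na.omit(m)
dim(m_omit)  # 0 rows, 3 columns
```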
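The data contain many columns with missing values; judging by the median, at least half of the columns are about 98% NA: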
> pct_na <- unlist(lapply(data.train, function(x) mean(is.na(x))))
> summary(pct_na)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.9793 0.6401 0.9793 0.9793
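The per-column missing fraction can also be computed in one vectorised call. This is a small sketch on a hypothetical toy data frame; it checks that `colMeans(is.na(df))` matches the `lapply()`/`unlist()` version used above and then keeps the columns that are at most 10% missing, the same threshold the answer applies.

```r
# Hypothetical toy data frame: `b` is mostly missing, `c` has one NA
df <- data.frame(a = 1:20, b = c(1:6, rep(NA, 14)), c = c(NA, 2:20))

# is.na() on a data frame returns a logical matrix, so colMeans()
# gives the fraction of missing values per column
pct_na <- colMeans(is.na(df))
print(pct_na)

# Same result as the lapply()/unlist() approach
pct_na_loop <- unlist(lapply(df, function(x) mean(is.na(x))))
stopifnot(isTRUE(all.equal(pct_na, pct_na_loop)))

# Keep only columns with at most 10% missing values
keep <- names(which(pct_na <= 0.1))
print(keep)
```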
Keep only the columns with at most 10% missing values and recompute the correlation matrix:
> sum(pct_na > .1)
[1] 100
> keepers <- data.train[, names(which(pct_na <= .1))]
> descrCor <- cor(keepers, use = "complete.obs")
> summary(descrCor[upper.tri(descrCor)])
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.992000 -0.108800 0.001911 0.001667 0.088680 0.980900
Now run the filter:
> highlyCorDescr <- findCorrelation(descrCor, cutoff = .9, verbose = TRUE, names = FALSE)
Compare row 10 and column 1 with corr 0.992
Means: 0.266 vs 0.164 so flagging column 10
Compare row 1 and column 9 with corr 0.925
Means: 0.247 vs 0.161 so flagging column 1
Compare row 9 and column 4 with corr 0.928
Means: 0.229 vs 0.158 so flagging column 9
Compare row 8 and column 2 with corr 0.966
Means: 0.24 vs 0.154 so flagging column 8
Compare row 19 and column 18 with corr 0.918
Means: 0.089 vs 0.155 so flagging column 18
Compare row 46 and column 31 with corr 0.914
Means: 0.099 vs 0.158 so flagging column 31
Compare row 46 and column 33 with corr 0.933
Means: 0.081 vs 0.161 so flagging column 33
All correlations <= 0.9
> keep_these <- names(data.train)[!(names(data.train) %in% colnames(descrCor)[highlyCorDescr])]
> data.train.subset <- data.train[, keep_these]

Comments:
Have you tried lowering the cutoff? It looks like none of the combinations has a correlation >= 0.9.
Yes, I tried values as low as 0.1, but it made no difference. I can also see data in descrCor, but there are some NAs that na.omit does not remove. Could that be the cause?
Could you post a minimal sample of your data? If na.omit isn't working, I would suspect that some of your NAs are not actually recognized as missing, but without a reproducible example there is no way to tell.
There is still no link to your data, but it looks like the creator of caret has answered your question, so I think you should be fine!
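The verbose log in the answer ("Means: 0.266 vs 0.164 so flagging column 10") suggests that, for each highly correlated pair, findCorrelation flags the member with the larger mean absolute correlation. The following is a simplified base-R sketch of that idea only; `flag_correlated` is a hypothetical helper, not caret's actual implementation, and caret's own function handles more cases.

```r
# Simplified, hypothetical sketch of the pairwise-removal idea behind
# caret::findCorrelation(); this is NOT caret's actual code.
flag_correlated <- function(cor_mat, cutoff = 0.9) {
  # Assumes NA-bearing columns were already removed, as in the answer
  if (anyNA(cor_mat)) stop("correlation matrix contains NA values")
  abs_cor <- abs(cor_mat)
  diag(abs_cor) <- 0                       # ignore self-correlation
  keep <- seq_len(ncol(abs_cor))
  flagged <- integer(0)
  repeat {
    sub <- abs_cor[keep, keep, drop = FALSE]
    if (max(sub) <= cutoff) break          # all remaining correlations <= cutoff
    # Locate one pair with the largest remaining absolute correlation
    idx <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    i <- keep[idx[1]]
    j <- keep[idx[2]]
    # Flag the member of the pair with the larger mean absolute
    # correlation (diagonal zeroed) among the remaining columns
    mean_i <- mean(sub[idx[1], ])
    mean_j <- mean(sub[idx[2], ])
    to_drop <- if (mean_i > mean_j) i else j
    flagged <- c(flagged, to_drop)
    keep <- setdiff(keep, to_drop)
  }
  flagged
}

# Hypothetical toy correlation matrix: columns 1 and 2 are highly correlated
cm <- matrix(c(1,    0.95, 0.2,
               0.95, 1,    0.1,
               0.2,  0.1,  1), nrow = 3, byrow = TRUE)
flag_correlated(cm, cutoff = 0.9)  # column 1 has the larger mean, so it is flagged
```

Like the real function with `names = FALSE`, the sketch returns column indices, which can be mapped back to names with `colnames()` as the answer does.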