regsubsets column names differ from the data's column names when selecting the best variables
I am trying to get the important variables (column names) from regsubsets. I would like to get the important variables one by one so that I can analyze them. Here is the program:
library(leaps)
library(ISLR)
data(Hitters)
reg_fit=regsubsets(Salary~., data = Hitters, nvmax = 10, method = "forward")
The problem is that the column names in reg_fit differ from the column names in the Hitters data.
Here is the output for the original data:
names(Hitters)
## [1] "AtBat" "Hits" "HmRun" "Runs" "RBI"
## [6] "Walks" "Years" "CAtBat" "CHits" "CHmRun"
## [11] "CRuns" "CRBI" "CWalks" "League" "Division"
## [16] "PutOuts" "Assists" "Errors" "Salary" "NewLeague"
Here is the output extracted from reg_fit:
colnames(summary(reg_fit)$which)
## [1] "(Intercept)" "AtBat" "Hits" "HmRun" "Runs"
## [6] "RBI" "Walks" "Years" "CAtBat" "CHits"
## [11] "CHmRun" "CRuns" "CRBI" "CWalks" "LeagueN"
## [16] "DivisionW" "PutOuts" "Assists" "Errors" "NewLeagueN"
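Note: League became LeagueN and Division became DivisionW (and likewise NewLeague became NewLeagueN). Is this a bug, or is there an easy way to get the original column names from reg_fit?
It's not a bug. regsubsets expands each categorical (factor) variable into an indicator (dummy) variable for use in the regression, and the new name tells you which level the indicator marks: LeagueN is 1 when League is "N" and 0 when it is the baseline level "A". If you want to avoid this, you can preprocess the data. Here is an example for the variable League: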
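League <- rep(0, nrow(Hitters))      # one 0 per row (Hitters has 322 rows)
League[Hitters$League == "N"] <- 1   # 1 marks level "N", 0 the baseline level "A"
Hitters$League <- League             # League is already numeric, so no conversion is needed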
reg_fit=regsubsets(Salary~., data = Hitters, nvmax = 10, method = "forward")
colnames(summary(reg_fit)$which)
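The recoding above only touches League; Division and NewLeague would still come back as DivisionW and NewLeagueN. A minimal base-R sketch that applies the same idea to every two-level factor column follows. The helper name factors_to_indicators is hypothetical, and it assumes 1 should mark the second factor level (the level that would otherwise be appended to the name, e.g. "N" for League). Factors with more than two levels are left untouched, because they cannot be collapsed into a single 0/1 column.

```r
# Hypothetical helper (not part of leaps or ISLR): recode every two-level
# factor column to a 0/1 numeric column, where 1 marks the second level.
factors_to_indicators <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.factor(x) && nlevels(x) == 2) {
      # Same encoding the dummy column would use: 1 for levels(x)[2], 0 for the baseline
      df[[col]] <- as.numeric(x == levels(x)[2])
    }
  }
  df
}

# Small made-up example: g has two levels and is recoded, h has three and is kept
toy <- data.frame(
  y = c(1, 2, 3),
  g = factor(c("A", "N", "A")),
  h = factor(c("x", "y", "z"))
)
factors_to_indicators(toy)
```

With Hitters loaded, running Hitters <- factors_to_indicators(Hitters) before regsubsets should make colnames(summary(reg_fit)$which) match names(Hitters), apart from "(Intercept)" and the response Salary.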
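I'll accept this answer. Your suggestion is to preprocess the categorical columns into numbers so that the column names match. However, I prefer to use model.matrix because it is less cumbersome. Thanks for the solution. – @MaheshYadav Glad to help. Yes, model.matrix works very well.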
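Since model.matrix came up as the preferred route, here is a minimal base-R sketch of how it produces the same dummy names and how its "assign" attribute can map each dummy column back to the variable it came from. It uses a small made-up data frame in place of Hitters so it runs without leaps or ISLR; the values are invented for illustration.

```r
# Made-up stand-in for Hitters: one numeric predictor and two factors
df <- data.frame(
  Salary   = c(100, 250, 300, 480, 520, 610),
  Hits     = c(50, 80, 95, 120, 130, 150),
  League   = factor(c("A", "N", "A", "N", "A", "N")),
  Division = factor(c("E", "E", "W", "W", "E", "W"))
)

# Factors are expanded with treatment contrasts: the baseline level is dropped
# and the other level is appended to the variable name
mm <- model.matrix(Salary ~ ., data = df)
colnames(mm)  # "(Intercept)" "Hits" "LeagueN" "DivisionW"

# "assign" gives, for each column of mm, the index of the term it came from
# (0 = intercept), so it can be used to look up the original variable name
term_labels <- attr(terms(Salary ~ ., data = df), "term.labels")
original_names <- c("(Intercept)", term_labels)[attr(mm, "assign") + 1]
names(original_names) <- colnames(mm)
original_names
```

Assuming regsubsets builds its design matrix with model.matrix (its column names follow the same pattern), building this lookup from model.matrix(Salary ~ ., data = Hitters) and indexing it with names(which(summary(reg_fit)$which[k, ])) should give the original variable names selected in the model with k predictors.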