R caret: dummy variables that exclude the target

How can I use dummy variables in caret without breaking the target variable?

library(caret)

set.seed(5)
data <- ISLR::OJ
data <- na.omit(data)

dummies <- dummyVars(Purchase ~ ., data = data)
data2 <- predict(dummies, newdata = data)
split_factor <- 0.5
n_samples <- nrow(data2)
train_idx <- sample(seq_len(n_samples), size = floor(split_factor * n_samples))
train <- data2[train_idx, ]
test <- data2[-train_idx, ]
modelFit <- train(Purchase ~ ., method = 'lda', preProcess = c('scale', 'center'), data = train)
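The example code has at least a few issues, as noted in the code comments further down. To answer your questions:

  • The result of ifelse is an integer vector, not a factor, so the train function defaults to regression.
  • Passing the dummyVars output directly to the function is done by using train(x = , y = , ...) instead of a formula.

To avoid these problems, check your objects carefully, in particular whether the target variable is still a factor.

Note that the preProcess option in train() applies the preprocessing to all numeric variables, including the dummies. Option 2 below avoids this by standardizing the data before calling train().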
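The first bullet can be checked without caret. The following is a minimal base R sketch, assuming a toy vector named purchase (hypothetical, not taken from the question's data): recoding a factor with ifelse() yields a plain numeric vector, and wrapping the result in factor() gives back a factor that can serve as a classification target.

```r
# Toy target (hypothetical values, for illustration only)
purchase <- factor(c("CH", "MM", "CH", "MM"))

# ifelse() drops the factor class and returns a plain numeric vector
recoded <- ifelse(purchase == "CH", 1, 0)
is.factor(recoded)

# Convert back to a factor before using it as a classification target
target <- factor(recoded, levels = c(0, 1), labels = c("MM", "CH"))
is.factor(target)
levels(target)
```

Checking is.factor() (or class()) on the target right before the call to train() is a cheap way to catch this kind of silent type change.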

Comments:

You can do data2$Purchase …

I tried that, but it seems to distort the resulting matrix. Is it possible to feed dummyVars from caret directly into train? As a pipeline?

Are you sure the preprocessing is not also applied to the categorical variables (now 1/0 dummy variables)?

@PepitoDeMallorca That is a valid concern, although it is not part of the OP's question. I have updated Option 2 to provide a solution that avoids this.

set.seed(5)
data <- ISLR::OJ
data <- na.omit(data)

# Make sure that all variables that should be a factor are defined as such
newFactorIndex <- c("StoreID","SpecialCH","SpecialMM","STORE")
data[, newFactorIndex] <- lapply(data[,newFactorIndex], factor)

library(caret)
# See help for dummyVars. The function does not take a dependent variable and predict will give an error
# I don't include the target variable here, so predicting dummies on new data will drop unknown columns
# including the target variable
dummies <- dummyVars(~., data = data[,-1])
# I don't change the data yet to apply standardization to the numeric variables, 
# before turning the categorical variables into dummies

split_factor <- 0.5
n_samples <- nrow(data)
train_idx <- sample(seq_len(n_samples), size = floor(split_factor * n_samples))

# Option 1 (as asked): Specify independent and dependent variables separately
# Note that dummy variables will be standardized by preProcess as per the original code

# Turn the categorical variables into (unstandardized) dummies
# The output of predict is a matrix, so convert it to a data frame
data2 <- data.frame(predict(dummies, newdata = data))

modelFit <- train(y = data[train_idx, "Purchase"], x = data2[train_idx, ],
                  method = 'lda', preProcess = c('scale', 'center'))

# Option 2: Append dependent variable to the independent variables (needs to be a data frame to allow factor and numeric)
# Note that I also move the preprocessing out of train() to
# avoid standardizing the dummy variables

train <- data[train_idx, ]
test <- data[-train_idx, ]

preprocessor <- preProcess(train[!sapply(train, is.factor)], method = c('center', 'scale'))
train <- predict(preprocessor, train)
test <- predict(preprocessor, test)

# Turn the categorical variables into (unstandardized) dummies
# The output of predict is a matrix, so convert it to a data frame
train <- data.frame(predict(dummies, newdata = train))
test <- data.frame(predict(dummies, newdata = test))

# Reattach the target variable, which predict(dummies, ...) dropped,
# to the training data
train$Purchase <- data$Purchase[train_idx]
modelFit <- train(Purchase ~ ., data = train, method = 'lda')
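
The idea behind Option 2 (standardize the numeric predictors first, create the dummies afterwards, keep the target out of both steps) can also be sketched in base R alone. This is an illustration under assumptions, not the answer's method: the toy data frame and its column names are hypothetical, and model.matrix() stands in for caret's dummyVars().

```r
# Toy data (hypothetical): one numeric predictor, one factor predictor,
# and a factor target
toy <- data.frame(
  price = c(1.5, 2.0, 2.5, 3.0),
  store = factor(c("a", "b", "a", "c")),
  y     = factor(c("CH", "MM", "CH", "MM"))
)

# Standardize only the numeric predictors; factors are left untouched
num_cols <- sapply(toy, is.numeric)
toy[num_cols] <- lapply(toy[num_cols], function(x) as.numeric(scale(x)))

# Expand the factor predictor into 0/1 indicator columns.
# The target column is deliberately not passed to model.matrix().
x <- model.matrix(~ . - 1, data = toy[, c("price", "store")])

# Reattach the unchanged factor target
out <- data.frame(x, y = toy$y)
str(out)
```

With a real train/test split, the centering and scaling values would be estimated on the training rows only and then applied to the test rows, as the preProcess() and predict() calls in Option 2 do.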