Question: Appending a column to a data frame as a factor creates NA in the appended column in R

r dataframe machine-learning

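I'm a newbie trying to learn ML with caret.

Problem: after creating dummies and removing the NZV variables, when I add Y (the predicted variable) back to the df as a factor, it creates NA in that column (steps 5-6 of the question). So how can I keep the Y variable as a factor in the final df?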

1. Data (bank marketing response data from UCI/Kaggle)

2. Saved the X & Y variables

Y = subset(data, select = y)
X = subset(data, select = -y)

dim(X)
dim(Y)
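
One way to see what step 2 actually produces is to check the class of each object, not just its dimensions. The sketch below uses the built-in iris data set as a stand-in for the bank marketing data (an assumption, since that data is not included here). With `select =`, `subset()` keeps the data-frame structure even when only one column is selected.

```r
# Stand-in data: iris replaces the bank marketing data (assumption)
X <- subset(iris, select = -Species)
Y <- subset(iris, select = Species)

class(X)   # a data frame of predictors
class(Y)   # also a data frame, not a vector
dim(Y)     # one column, but still two-dimensional

# The factor itself is the column stored inside Y
y_vec <- Y$Species      # equivalently Y[["Species"]] or Y[, 1]
class(y_vec)
length(y_vec)
```

Having `dim(Y)` return two numbers is already a hint: a plain vector has no `dim` attribute at all.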

3. Created dummies

pp_dummy <- dummyVars(y ~ ., data = data)
data <- predict(pp_dummy, newdata = data)
data <- data.frame(data)
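
To illustrate why Y has to be added back after this step, here is a small sketch on a made-up data frame. The name `toy` and its columns are hypothetical and only loosely modelled on the bank data, and the code assumes the caret package is installed. As in the question's own `str()` output, the predicted result holds predictor columns only, so the outcome must be appended separately.

```r
library(caret)

# Hypothetical toy data: one numeric predictor, one factor predictor, one outcome
toy <- data.frame(
  age     = c(30, 39, 25, 38),
  marital = factor(c("married", "single", "married", "divorced")),
  y       = factor(c("no", "yes", "no", "no"))
)

pp <- dummyVars(y ~ ., data = toy)
dummies <- data.frame(predict(pp, newdata = toy))

names(dummies)            # predictor columns only
"y" %in% names(dummies)   # the outcome is not part of the result

# Append the outcome as a plain factor vector, not as a data frame
dummies_with_y <- dummies
dummies_with_y$y <- toy$y
str(dummies_with_y)
```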
4. Removed the NZV variables

nzv_list <- nearZeroVar(data) %>% as.vector()
data <- data[, -nzv_list]
str(data)

'data.frame': 4119 obs. of 44 variables:
 $ age                          : num 30 39 25 38 47 32 32 41 31 35 ...
 $ job.admin.                   : num 0 0 0 0 1 0 1 0 0 0 ...
 $ job.blue.collar              : num 1 0 0 0 0 0 0 0 0 1 ...
 $ job.management               : num 0 0 0 0 0 0 0 0 0 0 ...
 $ job.services                 : num 0 1 1 1 0 1 0 0 1 0 ...
 $ job.technician               : num 0 0 0 0 0 0 0 0 0 0 ...
 $ marital.divorced             : num 0 0 0 0 0 0 0 0 1 0 ...
 $ marital.married              : num 1 0 1 1 1 0 0 1 0 1 ...
 $ marital.single               : num 0 1 0 0 0 1 1 0 0 0 ...
 $ education.basic.4y           : num 0 0 0 0 0 0 0 0 0 0 ...
 $ education.basic.6y           : num 0 0 0 0 0 0 0 0 0 0 ...
 $ education.basic.9y           : num 1 0 0 1 0 0 0 0 0 1 ...
 $ education.high.school        : num 0 1 1 0 0 0 0 0 0 0 ...
 $ education.professional.course: num 0 0 0 0 0 0 0 0 1 0 ...
 $ education.university.degree  : num 0 0 0 0 1 1 1 1 0 0 ...
 $ default.no                   : num 1 1 1 1 1 1 1 0 1 0 ...
 $ default.unknown              : num 0 0 0 0 0 0 0 1 0 1 ...
 $ housing.no                   : num 0 1 0 0 0 1 0 0 1 1 ...
 $ housing.yes                  : num 1 0 1 0 1 0 1 1 0 0 ...
 $ loan.no                      : num 1 1 1 0 1 1 1 1 1 1 ...
 $ loan.yes                     : num 0 0 0 0 0 0 0 0 0 0 ...
 $ contact.cellular             : num 1 0 0 0 1 1 1 1 1 0 ...
 $ contact.telephone            : num 0 1 1 1 0 0 0 0 0 1 ...
 $ month.apr                    : num 0 0 0 0 0 0 0 0 0 0 ...
 $ month.aug                    : num 0 0 0 0 0 0 0 0 0 0 ...
 $ month.jul                    : num 0 0 0 0 0 0 0 0 0 0 ...
 $ month.jun                    : num 0 0 1 1 0 0 0 0 0 0 ...
 $ month.may                    : num 1 1 0 0 0 0 0 0 0 1 ...
 $ month.nov                    : num 0 0 0 0 1 0 0 1 1 0 ...
 $ day_of_week.fri              : num 1 1 0 1 0 0 0 0 0 0 ...
 $ day_of_week.mon              : num 0 0 0 0 1 0 1 1 0 0 ...
 $ day_of_week.thu              : num 0 0 0 0 0 1 0 0 0 1 ...
 $ day_of_week.tue              : num 0 0 0 0 0 0 0 0 1 0 ...
 $ day_of_week.wed              : num 0 0 1 0 0 0 0 0 0 0 ...
 $ duration                     : num 487 346 227 17 58 128 290 44 68 170 ...
 $ campaign                     : num 2 4 1 3 1 3 4 2 1 1 ...
 $ previous                     : num 0 0 0 0 0 2 0 0 1 0 ...
 $ poutcome.failure             : num 0 0 0 0 0 1 0 0 1 0 ...
 $ poutcome.nonexistent         : num 1 1 1 1 1 0 1 1 0 1 ...
 $ emp.var.rate                 : num -1.8 1.1 1.4 1.4 -0.1 -1.1 -1.1 -0.1 -0.1 1.1 ...
 $ cons.price.idx               : num 92.9 94 94.5 94.5 93.2 ...
 $ cons.conf.idx                : num -46.2 -36.4 -41.8 -41.8 -42 -37.5 -37.5 -42 -42 -36.4 ...
 $ euribor3m                    : num 1.31 4.86 4.96 4.96 4.19 ...
 $ nr.employed                  : num 5099 5191 5228 5228 5196 ...

5. Problem: appending Y to data as a factor produces NA in the column

data$y <- as.factor(Y)
str(data)

'data.frame': 4119 obs. of 45 variables:
 (the first 44 columns are identical to the step 4 output)
 $ y                            : Factor w/ 1 level "1:2": NA NA NA NA NA NA NA NA NA NA ...

6. If I append Y as is, it does not create NA right away, but when I convert it to a factor it gives NA

data$y <- Y # as.factor(Y)
data <- data %>% mutate(y = as.factor(y))
str(data)

subsets <- c(7, 10, 12, 15, 20)

control <- rfeControl(functions = rfFuncs, method = "cv", verbose = FALSE)

system.time(
  RFE_res <- rfe(x = data[, 1:44], # subset(train, select = -y)
                 y = pull(data$y),
                 sizes = subsets,
                 rfeControl = control)
)

How can I avoid using pull(data$y) and just use data$y?

It has nothing to do with pull(). You cannot convert a data.frame to a vector with as.factor(), even if it has only one column:

X = subset(iris,select=-Species)
Y = subset(iris,select=Species)

as.factor(Y)
Species
   <NA>
Levels: 1:3

.valid.factor(Y)
[1] "factor levels must be \"character\""

levels(Y)
NULL

Use the column inside Y instead:

X$y = as.factor(Y$Species)
# or X %>% mutate(y = as.factor(Y$Species))

> str(X)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ y           : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Yes, you are right @StupidWolf. I don't know why I am not in the habit of distinguishing between vectors and data frames in R. Every time I take a single column I start treating it as a series/vector when it is actually a data frame. I think I am mixing some Python habits into R. Thanks for correcting and helping me :)
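
The difference between steps 5 and 6 and the fix from the answer can be sketched side by side. This is only an illustration on iris (the same stand-in the answer uses), in base R; the names `nested` and `fixed` are made up for the example.

```r
X <- subset(iris, select = -Species)
Y <- subset(iris, select = Species)   # one-column data frame

# Step 6 pattern: assigning the whole data frame stores it as a nested
# data-frame column, which is why pull() was still able to extract it
nested <- X
nested$y <- Y
is.data.frame(nested$y)

# Fix: append the column vector itself
fixed <- X
fixed$y <- as.factor(Y[[1]])          # same as Y$Species or Y[, 1]
is.factor(fixed$y)
anyNA(fixed$y)
levels(fixed$y)
```

With `fixed`, `fixed$y` is already a plain factor, so it could be passed as the `y` argument directly without `pull()`. An alternative that avoids the issue from the start would be to save the outcome as a vector in step 2 (for example `Y <- data$y`) rather than with `subset(..., select = y)`.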