Issue: appending a column to a data frame as a factor creates NAs in the appended column in R

Tags: r, dataframe, machine-learning, r-caret

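I am new to R and trying to learn ML with caret.

Issue: after creating dummies and removing the NZV variables, when I add Y (the predicted variable) back to the df as a factor, it creates NAs in that column (steps 5-6 of the question). So how can I keep the Y variable as a factor in the final df?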

1. Data (bank marketing response data from UCI/Kaggle)

2. Saved the X & Y variables

Y = subset(data, select = y)
X = subset(data, select = -y)

dim(X)
dim(Y)

3. Created dummies

pp_dummy <- dummyVars(y ~ ., data = data)

data <- predict(pp_dummy, newdata = data)

data <- data.frame(data)

5. Issue: appending y to data as a factor produces NAs in the column

data$y <- as.factor(Y)

str(data)

6. If I append Y as it is, it does not create NAs right away, but when I convert it to a factor it gives NAs

data$y <- Y # as.factor(Y)
data <- data %>% mutate(y = as.factor(y))

str(data)

How can I avoid using pull(data$y) and just use data$y?

It has nothing to do with pull().

You cannot convert a data.frame to a vector with as.factor(), even if it has only one column:

X = subset(iris,select=-Species)
Y = subset(iris,select=Species)

as.factor(Y)
Species 
   <NA> 
Levels: 1:3

.valid.factor(Y)
[1] "factor levels must be \"character\""

levels(Y)
NULL
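
A minimal sketch of the distinction, using the built-in iris data; the names Y_df and Y_vec are illustrative and not from the original post. subset() with select = keeps the data frame wrapper, while $ (or [[ ]]) returns the column itself:

```r
# subset() with select= returns a data frame, even for a single column.
Y_df <- subset(iris, select = Species)
class(Y_df)        # "data.frame"
is.factor(Y_df)    # FALSE

# Extracting the column gives the underlying factor vector.
Y_vec <- Y_df$Species   # equivalent to Y_df[["Species"]]
class(Y_vec)       # "factor"
levels(Y_vec)      # "setosa" "versicolor" "virginica"
```

Anything that expects a vector, such as as.factor() or levels(), should be given Y_vec rather than Y_df.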

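Yes, you are right @StupidWolf. I don't know why I have not got into the habit of distinguishing between vectors and data frames in R. Every time I take a single column I start treating it as a series/vector when it is actually a data frame. I think I am mixing some Python habits with R. Thanks for correcting and helping me :)
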
4. Removed the NZV variables

nzv_list <- nearZeroVar(data) %>%
            as.vector()

data <- data[, -nzv_list ]

str(data)
'data.frame':   4119 obs. of  44 variables:
 $ age                          : num  30 39 25 38 47 32 32 41 31 35 ...
 $ job.admin.                   : num  0 0 0 0 1 0 1 0 0 0 ...
 $ job.blue.collar              : num  1 0 0 0 0 0 0 0 0 1 ...
 $ job.management               : num  0 0 0 0 0 0 0 0 0 0 ...
 $ job.services                 : num  0 1 1 1 0 1 0 0 1 0 ...
 $ job.technician               : num  0 0 0 0 0 0 0 0 0 0 ...
 $ marital.divorced             : num  0 0 0 0 0 0 0 0 1 0 ...
 $ marital.married              : num  1 0 1 1 1 0 0 1 0 1 ...
 $ marital.single               : num  0 1 0 0 0 1 1 0 0 0 ...
 $ education.basic.4y           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ education.basic.6y           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ education.basic.9y           : num  1 0 0 1 0 0 0 0 0 1 ...
 $ education.high.school        : num  0 1 1 0 0 0 0 0 0 0 ...
 $ education.professional.course: num  0 0 0 0 0 0 0 0 1 0 ...
 $ education.university.degree  : num  0 0 0 0 1 1 1 1 0 0 ...
 $ default.no                   : num  1 1 1 1 1 1 1 0 1 0 ...
 $ default.unknown              : num  0 0 0 0 0 0 0 1 0 1 ...
 $ housing.no                   : num  0 1 0 0 0 1 0 0 1 1 ...
 $ housing.yes                  : num  1 0 1 0 1 0 1 1 0 0 ...
 $ loan.no                      : num  1 1 1 0 1 1 1 1 1 1 ...
 $ loan.yes                     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ contact.cellular             : num  1 0 0 0 1 1 1 1 1 0 ...
 $ contact.telephone            : num  0 1 1 1 0 0 0 0 0 1 ...
 $ month.apr                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ month.aug                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ month.jul                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ month.jun                    : num  0 0 1 1 0 0 0 0 0 0 ...
 $ month.may                    : num  1 1 0 0 0 0 0 0 0 1 ...
 $ month.nov                    : num  0 0 0 0 1 0 0 1 1 0 ...
 $ day_of_week.fri              : num  1 1 0 1 0 0 0 0 0 0 ...
 $ day_of_week.mon              : num  0 0 0 0 1 0 1 1 0 0 ...
 $ day_of_week.thu              : num  0 0 0 0 0 1 0 0 0 1 ...
 $ day_of_week.tue              : num  0 0 0 0 0 0 0 0 1 0 ...
 $ day_of_week.wed              : num  0 0 1 0 0 0 0 0 0 0 ...
 $ duration                     : num  487 346 227 17 58 128 290 44 68 170 ...
 $ campaign                     : num  2 4 1 3 1 3 4 2 1 1 ...
 $ previous                     : num  0 0 0 0 0 2 0 0 1 0 ...
 $ poutcome.failure             : num  0 0 0 0 0 1 0 0 1 0 ...
 $ poutcome.nonexistent         : num  1 1 1 1 1 0 1 1 0 1 ...
 $ emp.var.rate                 : num  -1.8 1.1 1.4 1.4 -0.1 -1.1 -1.1 -0.1 -0.1 1.1 ...
 $ cons.price.idx               : num  92.9 94 94.5 94.5 93.2 ...
 $ cons.conf.idx                : num  -46.2 -36.4 -41.8 -41.8 -42 -37.5 -37.5 -42 -42 -36.4 ...
 $ euribor3m                    : num  1.31 4.86 4.96 4.96 4.19 ...
 $ nr.employed                  : num  5099 5191 5228 5228 5196 ...
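
One caveat with the data[, -nzv_list] step above, sketched on toy data: if no column is flagged, the index vector is empty, and negative indexing with an empty vector keeps no columns at all. The example assumes nearZeroVar() returned an empty integer vector of column positions; df, nzv_idx and drop_cols are hypothetical names, not part of the original post.

```r
# Toy data frame standing in for the dummy-encoded data.
df <- data.frame(a = 1:5, b = c(0, 0, 0, 0, 1), c = 5:1)

# Hypothetical result of nearZeroVar(): no columns flagged.
nzv_idx <- integer(0)

# Pitfall: -integer(0) is still integer(0), so zero columns are selected.
ncol(df[, -nzv_idx])           # 0

# Guarded version: drop columns only when something was flagged.
drop_cols <- function(d, idx) {
  if (length(idx) > 0) d[, -idx, drop = FALSE] else d
}

ncol(drop_cols(df, nzv_idx))   # 3
ncol(drop_cols(df, 2L))        # 2
```

With the bank data some columns are flagged, so the original line works there; the guard only matters for data sets where nothing is near-zero variance.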
data$y <- as.factor(Y)

str(data)
'data.frame':   4119 obs. of  45 variables:
 $ age                          : num  30 39 25 38 47 32 32 41 31 35 ...
 $ job.admin.                   : num  0 0 0 0 1 0 1 0 0 0 ...
 $ job.blue.collar              : num  1 0 0 0 0 0 0 0 0 1 ...
 $ job.management               : num  0 0 0 0 0 0 0 0 0 0 ...
 $ job.services                 : num  0 1 1 1 0 1 0 0 1 0 ...
 $ job.technician               : num  0 0 0 0 0 0 0 0 0 0 ...
 $ marital.divorced             : num  0 0 0 0 0 0 0 0 1 0 ...
 $ marital.married              : num  1 0 1 1 1 0 0 1 0 1 ...
 $ marital.single               : num  0 1 0 0 0 1 1 0 0 0 ...
 $ education.basic.4y           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ education.basic.6y           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ education.basic.9y           : num  1 0 0 1 0 0 0 0 0 1 ...
 $ education.high.school        : num  0 1 1 0 0 0 0 0 0 0 ...
 $ education.professional.course: num  0 0 0 0 0 0 0 0 1 0 ...
 $ education.university.degree  : num  0 0 0 0 1 1 1 1 0 0 ...
 $ default.no                   : num  1 1 1 1 1 1 1 0 1 0 ...
 $ default.unknown              : num  0 0 0 0 0 0 0 1 0 1 ...
 $ housing.no                   : num  0 1 0 0 0 1 0 0 1 1 ...
 $ housing.yes                  : num  1 0 1 0 1 0 1 1 0 0 ...
 $ loan.no                      : num  1 1 1 0 1 1 1 1 1 1 ...
 $ loan.yes                     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ contact.cellular             : num  1 0 0 0 1 1 1 1 1 0 ...
 $ contact.telephone            : num  0 1 1 1 0 0 0 0 0 1 ...
 $ month.apr                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ month.aug                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ month.jul                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ month.jun                    : num  0 0 1 1 0 0 0 0 0 0 ...
 $ month.may                    : num  1 1 0 0 0 0 0 0 0 1 ...
 $ month.nov                    : num  0 0 0 0 1 0 0 1 1 0 ...
 $ day_of_week.fri              : num  1 1 0 1 0 0 0 0 0 0 ...
 $ day_of_week.mon              : num  0 0 0 0 1 0 1 1 0 0 ...
 $ day_of_week.thu              : num  0 0 0 0 0 1 0 0 0 1 ...
 $ day_of_week.tue              : num  0 0 0 0 0 0 0 0 1 0 ...
 $ day_of_week.wed              : num  0 0 1 0 0 0 0 0 0 0 ...
 $ duration                     : num  487 346 227 17 58 128 290 44 68 170 ...
 $ campaign                     : num  2 4 1 3 1 3 4 2 1 1 ...
 $ previous                     : num  0 0 0 0 0 2 0 0 1 0 ...
 $ poutcome.failure             : num  0 0 0 0 0 1 0 0 1 0 ...
 $ poutcome.nonexistent         : num  1 1 1 1 1 0 1 1 0 1 ...
 $ emp.var.rate                 : num  -1.8 1.1 1.4 1.4 -0.1 -1.1 -1.1 -0.1 -0.1 1.1 ...
 $ cons.price.idx               : num  92.9 94 94.5 94.5 93.2 ...
 $ cons.conf.idx                : num  -46.2 -36.4 -41.8 -41.8 -42 -37.5 -37.5 -42 -42 -36.4 ...
 $ euribor3m                    : num  1.31 4.86 4.96 4.96 4.19 ...
 $ nr.employed                  : num  5099 5191 5228 5228 5196 ...
 $ y                            : Factor w/ 1 level "1:2": NA NA NA NA NA NA NA NA NA NA ...

subsets <- c(7, 10, 12, 15, 20)

control <- rfeControl(functions = rfFuncs, method = "cv", verbose = FALSE)


system.time(
  RFE_res <- rfe(x = data[, 1:44],    # subset(train, select = -y)
                 y = pull(data$y),
                 sizes = subsets,
                 rfeControl = control)
)
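
The reason pull() seemed necessary here can be sketched on toy data (the values and the names nested and flat are made up for illustration): assigning the one-column data frame Y to data$y stores a data frame inside the column, whereas assigning Y$y stores the vector.

```r
# Toy stand-ins for the encoded predictors and the saved outcome.
data <- data.frame(age = c(30, 39, 25, 38))
Y <- data.frame(y = c("no", "no", "yes", "no"))

# Assigning the data frame itself nests it inside column y.
nested <- data
nested$y <- Y
class(nested$y)    # "data.frame"

# Assigning the extracted column stores a plain factor vector.
flat <- data
flat$y <- factor(Y$y)
class(flat$y)      # "factor"
levels(flat$y)     # "no" "yes"
```

In the nested case data$y is itself a data frame, which is why an extra extraction step was needed before the column could be used as a vector.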

Extract the column from the data frame before converting it to a factor:

X$y = as.factor(Y$Species)
# or X %>% mutate(y = as.factor(Y$Species))

str(X)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ y           : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
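
Applied to the workflow in the question, the same fix might look like the sketch below. The toy values and the names encoded, predictors and outcome are assumptions for illustration; with the real data, predictors and outcome would be the objects passed as x and y to rfe().

```r
# Toy stand-ins for the dummy-encoded predictors and the saved outcome.
encoded <- data.frame(age = c(30, 39, 25, 38),
                      housing.yes = c(1, 0, 1, 0))
Y <- data.frame(y = c("no", "no", "yes", "no"))

# Append the outcome as a factor by extracting the column first.
encoded$y <- as.factor(Y$y)

# Select predictors by name rather than by position, and take the
# outcome directly from the data frame; no pull() is needed.
predictors <- encoded[, setdiff(names(encoded), "y"), drop = FALSE]
outcome <- encoded$y

str(outcome)
anyNA(outcome)     # FALSE
```

Selecting predictors with setdiff() avoids hard-coding a range such as 1:44, which would silently go out of date if the number of encoded columns changed.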