Error in randomForest: NA, missing values in object


When I try:

marketing.rf <- randomForest(formula = as.numeric(y) ~ ., data = marketing.train, importance = TRUE)

And when I try:

y.val <- ifelse(marketing.train$y == "yes", 1, 0)
marketing.rf <- randomForest(formula = as.numeric(y.val) ~ ., data = marketing.train, importance = TRUE)

I also tried using as.factor(y), but it shows a similar error. I used dput(marketing.test$y) to look at the values, but could not find any NA or invalid values in them.

I am very new to R. Can anyone help me fix this? Thanks.

Here is a sample of the training data:

age job             marital     edu         default   balance  housing   loan   y
58  management      married     tertiary    no        2143     yes       no     no
33  entrepreneur    married     secondary   no        2        yes       yes    no
33  unknown         single      unknown     no        1        no        no     no
42  entrepreneur    divorced    tertiary    yes       2        yes       no     no

Here is a complete example with reprex data. Without your data I cannot give a perfect answer, but if you follow this logic you should be fine.

library(randomForest)
library(dplyr)

# Generate some fake data
fake_data <- data.frame(
  age = runif(500, 30, 65),
  marital = sample(c("single", "married", "divorced"), 500, TRUE),
  default = sample(c("yes", "no"), 500, TRUE),
  balance = runif(500, 0, 2100),
  housing = sample(c("yes", "no"), 500, TRUE),
  loan = sample(c("yes", "no"), 500, TRUE),
  stringsAsFactors = FALSE
)

# Add some missing data for the example
fake_data[sample(x = 1:500, size = 5), "loan"] <- NA

# Drop the rows where loan is NA
fake_data_2 <- fake_data[!is.na(fake_data$loan), ]

cat("You have removed", nrow(fake_data) - nrow(fake_data_2), "records\n")

# Add the target and make sure it is a factor
# (y is just a copy of loan here, purely for illustration)
fake_data_2$y <- as.factor(fake_data_2$loan)

# Make character columns into factors
fake_data_2 <- fake_data_2 %>%
  mutate_if(is.character, as.factor)

fit <- randomForest(y ~ ., data = fake_data_2)
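
The example above filters on a single column only. As a complementary sketch (not part of the original answer), the base R calls below count the NAs in every column and keep only the complete rows. The data frame `train` and its values are made up for illustration. The formula interface of randomForest also takes an `na.action` argument whose default, `na.fail`, is what stops on missing values, so passing `na.action = na.omit` is another option.

```r
# Hypothetical training data with NAs in more than one column
train <- data.frame(
  age     = c(58, 33, NA, 42, 29),
  balance = c(2143, 2, 1, NA, 510),
  y       = c("no", "no", "yes", "no", NA),
  stringsAsFactors = FALSE
)

# Count missing values per column to see where the NAs are
na_counts <- colSums(is.na(train))
print(na_counts)

# Keep only the rows that have no missing value in any column
train_complete <- train[complete.cases(train), ]
train_complete$y <- as.factor(train_complete$y)

cat("Removed", nrow(train) - nrow(train_complete), "rows\n")
```

Checking every column matters here because `dput()` on `y` alone would not reveal an NA sitting in one of the predictors.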
There are missing values in the "y" column, so it does not know how to train on those rows. Use train_dat...

Thanks @MDEWITT! But now it shows this error: Error in randomForest.default(m, y, ...) : length of response must be the same as predictors

@Andrea Glad to hear it! If it works, please accept the answer so that others know it solved your problem (the small green check mark below the up/down arrows on this post).
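
The code that produced the "length of response" error is not shown, so the following is only a guess at one common cause: the response (for example `y.val`) was built from the unfiltered data frame and lives outside it, so once rows with NAs are dropped it no longer lines up with the predictors. A minimal base R sketch with invented data, keeping the response as a column so it is filtered together with the predictors:

```r
# Hypothetical data; the names mirror the question but the values are invented
marketing.train <- data.frame(
  age     = c(58, 33, 33, 42, 51, 27),
  balance = c(2143, 2, NA, 2, 760, 15),
  y       = c("no", "no", "no", "yes", "yes", NA),
  stringsAsFactors = FALSE
)

# Risky: a free-standing response built before any rows are dropped
y.val <- ifelse(marketing.train$y == "yes", 1, 0)
train_dat <- na.omit(marketing.train)
length(y.val) == nrow(train_dat)   # FALSE: 6 values vs 4 rows

# Safer: drop incomplete rows first, then convert y inside the data frame
train_dat$y <- factor(train_dat$y, levels = c("no", "yes"))
stopifnot(length(train_dat$y) == nrow(train_dat))

# With the randomForest package installed, the model call would then be:
# fit <- randomForest::randomForest(y ~ ., data = train_dat, importance = TRUE)
```

Because `y` is a factor here, randomForest fits a classification forest. A numeric response such as `as.numeric(y)` would be treated as regression instead.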