R: Why do I keep getting an error when I try to undersample my data?

I am trying to undersample my data, but I keep getting the following error:

Error in sample.int(length(x), size, replace, prob) : 
  cannot take a sample larger than the population when 'replace = FALSE'
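Essentially, I have an imbalanced dataset in which each part either passed or failed a test. Only about 1% of the parts fail, and I want to fit some models to predict those failures. I want to balance the data first, and this is what I have tried: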

# Split the data into training and test
library(caTools)
passes <- nn[grepl(0, nn$nfail), ] # get all passes
fails <- nn[!grepl(0, nn$nfail), ] # get all fails
splitp <- sample.split(passes$nfail, SplitRatio = 0.75)
splitf <- sample.split(fails$nfail, SplitRatio = 0.75)
trainp <- subset(passes, splitp == TRUE) # get 75% of passes for training
testp <- subset(passes, splitp == FALSE) # get 25% of passes for testing
trainf <- subset(fails, splitf == TRUE) # get 75% of fails for training
testf <- subset(fails, splitf == FALSE) # get 25% of fails for testing
train <- rbind(trainp, trainf) # combine training passes and fails
test <- rbind(testp, testf) # combine testing passes and fails

library(ROSE)
# Undersampling
data_balanced_under <- ovun.sample(nfail ~ ., data = train, method = "under", N = 2*nrow(trainf), seed = 1)$data
table(data_balanced_under$nfail)

# Oversampling
data_balanced_over <- ovun.sample(nfail ~ ., data = train, method = "over", N = 2 * nrow(trainp), seed = 1)$data
table(data_balanced_over$nfail)
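
As far as I can tell, the message itself comes from base R's `sample()`, which refuses to draw more items without replacement than it is given. The sketch below reproduces the same message without ROSE and shows a draw that does succeed. The data frame `toy`, its column `x`, and the 990/10 split are made up for illustration and are not my real data.

```r
# Hypothetical stand-in for the real data: 990 passes, 10 fails (about 1%).
set.seed(1)
toy <- data.frame(x = rnorm(1000), nfail = c(rep(0, 990), rep(1, 10)))

# Asking for 20 rows from a class that only has 10, without replacement,
# triggers the error; tryCatch() captures its message instead of stopping.
msg <- tryCatch(
  sample(which(toy$nfail == 1), size = 20),
  error = function(e) conditionMessage(e)
)
print(msg)

# Drawing no more rows than a class holds works: keep every fail and
# sample an equal number of passes (manual undersampling).
fail_rows <- which(toy$nfail == 1)
pass_rows <- sample(which(toy$nfail == 0), size = length(fail_rows))
toy_under <- toy[c(pass_rows, fail_rows), ]
print(table(toy_under$nfail))
```

So somewhere in my call the requested size must be larger than the set of rows being sampled from. Comparing `table(train$nfail)` with the value passed as `N` looks like the first thing to check.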