Can't classify data using R


r model


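I am trying to categorize my data so that I can build a logistic regression model. I am new to R and am learning it for my studies. I have used code that I have seen in multiple examples, but nothing takes effect and the values stay unchanged. No error is given either.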

ds <- read.csv("adult.csv")
colnames(ds)<- c("age","workclass","responsenum","education","education_years","marital_status","occupation","familyrole", "race","sex", "capital_gain", "capital_loss", "hours_per_week","country", "income") 

ds$workclass <- as.character(ds$workclass)
ds$workclass[ds$workclass == "Without-pay" | ds$workclass == "Never-worked"] <- "Jobless"
ds$workclass[ds$workclass == "State-gov" | ds$workclass == "Local-gov"]  <- "govt" 
ds$workclass[ds$workclass == "Self-emp-inc" | ds$workclass == "Self-emp-not-inc"]  <- "Self-employed"

ds

Your data has leading spaces, so " Self-emp-not-inc" will never match "Self-emp-not-inc".
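A quick way to see the mismatch, as a minimal sketch using a value copied from the posted data (the variable name `x` is only for illustration):

```r
# Value as it appears in the data, including the leading space
x <- " Self-emp-not-inc"

x == "Self-emp-not-inc"          # FALSE: the leading space is part of the string
trimws(x) == "Self-emp-not-inc"  # TRUE once the whitespace is trimmed
```

Because the comparison returns FALSE for every row, the assignment in the question selects nothing, which is why the column stays unchanged and no error is raised.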
Thoughts:

You can trim leading/trailing whitespace from all string-like columns.
str(ds, list.len=4)
# 'data.frame':	6 obs. of  15 variables:
#  $ X39      : int  50 38 53 28 37 49
#  $ State.gov: chr  " Self-emp-not-inc" " Private" " Private" " Private" ...
#  $ X77516   : int  83311 215646 234721 338409 284582 160187
#  $ Bachelors: chr  " Bachelors" " HS-grad" " 11th" " Bachelors" ...
#   [list output truncated]
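One possible way to do the trimming is sketched below. The helper name `is_chr` and the three-row data frame are illustrative assumptions, not the answer's original code; only base R functions (`vapply`, `lapply`, `trimws`) are used.

```r
# Small stand-in for the real data; values copied from the question's output
ds <- data.frame(
  age = c(50L, 38L, 53L),
  workclass = c(" Self-emp-not-inc", " Private", " Private"),
  education = c(" Bachelors", " HS-grad", " 11th"),
  stringsAsFactors = FALSE
)

# Identify the character columns, then trim whitespace in each of them
is_chr <- vapply(ds, is.character, logical(1))
ds[is_chr] <- lapply(ds[is_chr], trimws)

# The comparison from the question now finds its target
ds$workclass[ds$workclass == "Self-emp-inc" | ds$workclass == "Self-emp-not-inc"] <- "Self-employed"

ds$workclass
```

If the file's text fields are unquoted, passing `strip.white = TRUE` to `read.csv()` should remove the whitespace at read time instead, so no later cleanup is needed.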
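ischr

Can you provide the output from dput(head(ds))? If the first few rows of the frame are not very informative, then perhaps dput(ds[c(1,3,5,7), 1:3]) (where the first vector selects the interesting rows, and 1:3 is because we only need a few columns).

I agree with @r2evans. I tested your code on a simulated data frame and it worked fine. We need to see some of your data to understand what is happening and to reproduce it on our systems. Please add it to your question.

structure(list(X39 = c(50L, 53L, 37L, 52L), State.gov = c(" Self-emp-not-inc", " Private", " Private", " Self-emp-not-inc"), X77516 = c(83311L, 234721L, 284582L, 209642L)), row.names = c(1L, 3L, 5L, 7L), class = "data.frame") is the output when I use dput(ds[c(1,3,5,7), 1:3])

I don't see workclass in the data, but ... all the strings there have leading spaces. You can fix it globally with something like the ischr approach.

I have used colnames() to create workclass as its own header: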
structure(list(age = c(50L, 38L, 53L, 28L, 37L, 49L), workclass = c(" Self-emp-not-inc", 
" Private", " Private", " Private", " Private", " Private"), 
    responsenum = c(83311L, 215646L, 234721L, 338409L, 284582L, 
    160187L), education = c(" Bachelors", " HS-grad", " 11th", 
    " Bachelors", " Masters", " 9th"), education_years = c(13L, 
    9L, 7L, 13L, 14L, 5L), marital_status = c(" Married-civ-spouse", 
    " Divorced", " Married-civ-spouse", " Married-civ-spouse", 
    " Married-civ-spouse", " Married-spouse-absent"), occupation = c(" Exec-managerial", 
    " Handlers-cleaners", " Handlers-cleaners", " Prof-specialty", 
    " Exec-managerial", " Other-service"), familyrole = c(" Husband", 
    " Not-in-family", " Husband", " Wife", " Wife", " Not-in-family"
    ), race = c(" White", " White", " Black", " Black", " White", 
    " Black"), sex = c(" Male", " Male", " Male", " Female", 
    " Female", " Female"), capital_gain = c(0L, 0L, 0L, 0L, 0L, 
    0L), capital_loss = c(0L, 0L, 0L, 0L, 0L, 0L), hours_per_week = c(13L, 
    40L, 40L, 40L, 40L, 16L), country = c(" United-States", " United-States", 
    " United-States", " Cuba", " United-States", " Jamaica"), 
    income = c(" <=50K", " <=50K", " <=50K", " <=50K", " <=50K", 
    " <=50K")), row.names = c(NA, 6L), class = "data.frame")