Can't categorize data using R

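I am trying to categorize my data in order to build a logistic regression model. I am very new to R and am learning it for my studies. I have used code that I have seen in multiple examples, but nothing goes through and the data stays the same. No errors are given either.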

ds <- read.csv("adult.csv")
colnames(ds)<- c("age","workclass","responsenum","education","education_years","marital_status","occupation","familyrole", "race","sex", "capital_gain", "capital_loss", "hours_per_week","country", "income") 

ds$workclass <- as.character(ds$workclass)
ds$workclass[ds$workclass == "Without-pay" | ds$workclass == "Never-worked"] <- "Jobless"
ds$workclass[ds$workclass == "State-gov" | ds$workclass == "Local-gov"]  <- "govt" 
ds$workclass[ds$workclass == "Self-emp-inc" | ds$workclass == "Self-emp-not-inc"]  <- "Self-employed" 

ds

Your data has leading spaces, so "Self-emp-not-inc" will never match " Self-emp-not-inc".
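The mismatch can be reproduced in isolation. This is a minimal sketch, assuming the values are stored with a single leading space, as they appear in the dput output in this thread; the vector `x` is made up for illustration.

```r
# Values as stored in the data frame: note the leading space
x <- c(" Self-emp-not-inc", " Private")

# An exact comparison never matches because of the leading space
x == "Self-emp-not-inc"
# [1] FALSE FALSE

# trimws() (base R) strips leading/trailing whitespace before comparing
trimws(x) == "Self-emp-not-inc"
# [1]  TRUE FALSE
```

Because the comparison returns FALSE everywhere, the assignment in the question selects zero rows, which is why nothing changes and no error is raised.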

Thoughts:

  • You can trim leading/trailing whitespace from all string-like columns

    str(ds, list.len=4)
    # 'data.frame':	6 obs. of  15 variables:
    #  $ X39      : int  50 38 53 28 37 49
    #  $ State.gov: chr  " Self-emp-not-inc" " Private" " Private" " Private" ...
    #  $ X77516   : int  83311 215646 234721 338409 284582 160187
    #  $ Bachelors: chr  " Bachelors" " HS-grad" " 11th" " Bachelors" ...
    #  [list output truncated]
    
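    ischr

Can you provide the output from dput(head(ds))? If the first few rows of the frame are not very useful, then perhaps dput(ds[c(1,3,5,7),1:3]), where the first vector picks the interesting rows and 1:3 means we only need a few columns.

I agree with @r2evans. I tested your code on a mock data frame and it works fine. We need to see your data to understand what is happening and to reproduce it on our systems. Please add it to your question.

structure(list(X39 = c(50L, 53L, 37L, 52L), State.gov = c(" Self-emp-not-inc", " Private", " Private", " Self-emp-not-inc"), X77516 = c(83311L, 234721L, 284582L, 209642L)), row.names = c(1L, 3L, 5L, 7L), class = "data.frame") is my output when I use dput(ds[c(1,3,5,7),1:3]).

I don't see workclass in the data, but ... all of the strings there have leading spaces. You can fix it globally with something like ischr.

I have already used colnames() to create workclass as its own header.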
    
    structure(list(age = c(50L, 38L, 53L, 28L, 37L, 49L), workclass = c(" Self-emp-not-inc", 
    " Private", " Private", " Private", " Private", " Private"), 
        responsenum = c(83311L, 215646L, 234721L, 338409L, 284582L, 
        160187L), education = c(" Bachelors", " HS-grad", " 11th", 
        " Bachelors", " Masters", " 9th"), education_years = c(13L, 
        9L, 7L, 13L, 14L, 5L), marital_status = c(" Married-civ-spouse", 
        " Divorced", " Married-civ-spouse", " Married-civ-spouse", 
        " Married-civ-spouse", " Married-spouse-absent"), occupation = c(" Exec-managerial", 
        " Handlers-cleaners", " Handlers-cleaners", " Prof-specialty", 
        " Exec-managerial", " Other-service"), familyrole = c(" Husband", 
        " Not-in-family", " Husband", " Wife", " Wife", " Not-in-family"
        ), race = c(" White", " White", " Black", " Black", " White", 
        " Black"), sex = c(" Male", " Male", " Male", " Female", 
        " Female", " Female"), capital_gain = c(0L, 0L, 0L, 0L, 0L, 
        0L), capital_loss = c(0L, 0L, 0L, 0L, 0L, 0L), hours_per_week = c(13L, 
        40L, 40L, 40L, 40L, 16L), country = c(" United-States", " United-States", 
        " United-States", " Cuba", " United-States", " Jamaica"), 
        income = c(" <=50K", " <=50K", " <=50K", " <=50K", " <=50K", 
        " <=50K")), row.names = c(NA, 6L), class = "data.frame")
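One way the trimming idea could be applied is sketched below. It is not the answerer's exact code, which is cut off in this thread. The three-row data frame is an illustrative stand-in in the shape of the dput output (the " State-gov" and " Without-pay" rows are assumed so that every recoding rule fires), and `is_chr` is a hypothetical helper name.

```r
# Illustrative sample; values keep their leading space as in the dput output
ds <- data.frame(
  age       = c(50L, 38L, 53L),
  workclass = c(" Self-emp-not-inc", " State-gov", " Without-pay"),
  income    = c(" <=50K", " <=50K", " <=50K"),
  stringsAsFactors = FALSE
)

# is_chr (hypothetical name): TRUE for every character column
is_chr <- vapply(ds, is.character, logical(1))

# Trim leading/trailing whitespace in all character columns at once
ds[is_chr] <- lapply(ds[is_chr], trimws)

# The recoding from the question now finds matching rows
ds$workclass[ds$workclass %in% c("Without-pay", "Never-worked")] <- "Jobless"
ds$workclass[ds$workclass %in% c("State-gov", "Local-gov")] <- "govt"
ds$workclass[ds$workclass %in% c("Self-emp-inc", "Self-emp-not-inc")] <- "Self-employed"

table(ds$workclass)
```

If the fields in adult.csv are unquoted, passing `strip.white = TRUE` to `read.csv()` may remove the spaces at read time instead; that is an assumption about the file, so check the result with `str(ds)`.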