Can't categorize data using R
I am trying to categorize my data so that I can build a logistic regression model. I am new to R and am learning it for my studies. I have used code that I have seen in multiple examples, but none of it takes effect and the data stays unchanged. No error is given either.
ds <- read.csv("adult.csv")
colnames(ds)<- c("age","workclass","responsenum","education","education_years","marital_status","occupation","familyrole", "race","sex", "capital_gain", "capital_loss", "hours_per_week","country", "income")
ds$workclass <- as.character(ds$workclass)
ds$workclass[ds$workclass == "Without-pay" | ds$workclass == "Never-worked"] <- "Jobless"
ds$workclass[ds$workclass == "State-gov" | ds$workclass == "Local-gov"] <- "govt"
ds$workclass[ds$workclass == "Self-emp-inc" | ds$workclass == "Self-emp-not-inc"] <- "Self-employed"
ds
Your data has leading spaces, so " Self-emp-not-inc" will never match "Self-emp-not-inc".
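A minimal sketch of why the recoding has no effect and raises no error, using a few values in the form they take in the posted data. The vector name x is only for illustration.

```r
# Values as they appear in the data, each with a leading space
x <- c(" Self-emp-not-inc", " Private", " Self-emp-inc")

# The comparison is exact, so the leading space prevents any match
x == "Self-emp-not-inc"

# No element is selected, so the assignment silently changes nothing
x[x == "Self-emp-not-inc" | x == "Self-emp-inc"] <- "Self-employed"
x

# After trimming, the same comparison finds the first element
trimws(x) == "Self-emp-not-inc"
```

Because an all-FALSE logical index selects nothing, R treats the assignment as valid and changes nothing, which is why the data stays the same without any warning.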
Ideas:
- You can trim leading/trailing whitespace from all string-like columns
str(ds, list.len=4)
# 'data.frame':	6 obs. of  15 variables:
#  $ X39      : int  50 38 53 28 37 49
#  $ State.gov: chr  " Self-emp-not-inc" " Private" " Private" " Private" ...
#  $ X77516   : int  83311 215646 234721 338409 284582 160187
#  $ Bachelors: chr  " Bachelors" " HS-grad" " 11th" " Bachelors" ...
#   [list output truncated]
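The trimming idea can be sketched as follows. This is one possible way to do it, not necessarily the code the answer originally contained: the small data frame is a stand-in for ds built from values like those in the post, and chr_cols is a hypothetical helper name.

```r
# Small stand-in for ds, with leading spaces as in the posted data
ds <- data.frame(
  age = c(50L, 38L, 53L),
  workclass = c(" Self-emp-not-inc", " Private", " State-gov"),
  stringsAsFactors = FALSE
)

# Find the character columns, then trim whitespace in each of them
chr_cols <- sapply(ds, is.character)
ds[chr_cols] <- lapply(ds[chr_cols], trimws)

# The recoding from the question now finds its matches
ds$workclass[ds$workclass == "State-gov" | ds$workclass == "Local-gov"] <- "govt"
ds$workclass[ds$workclass == "Self-emp-inc" | ds$workclass == "Self-emp-not-inc"] <- "Self-employed"
ds$workclass
```

This assumes the string columns are stored as character, as the str() output shows. If they had been read in as factors, is.character() would not select them and they would need converting first.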
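ischr
Can you provide the output from dput(head(ds))? If the first few rows of the frame are not very useful, then perhaps dput(ds[c(1,3,5,7),1:3]) (where the first vector is the interesting rows, and 1:3 means we only need a few columns).
I agree with @r2evans. I tested your code on a mock data frame and it works fine. We need to see some of your data to understand what is happening and to reproduce it on our systems. Please add it to your question.
structure(list(X39 = c(50L, 53L, 37L, 52L), State.gov = c(" Self-emp-not-inc", " Private", " Private", " Self-emp-not-inc"), X77516 = c(83311L, 234721L, 284582L, 209642L)), row.names = c(1L, 3L, 5L, 7L), class = "data.frame") is the output when I use dput(ds[c(1,3,5,7),1:3])
I don't see workclass in the data, but ... all of the strings there have leading spaces. You can fix it globally with something like ischr
I have used colnames() to create workclass as its own header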
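On the column names: X39, State.gov and X77516 in the dput() output above look like values from a data row rather than headers. That suggests, as an assumption and not something confirmed in this thread, that adult.csv has no header row and read.csv() used its first record as names. A hedged sketch of an alternative that handles both issues at read time, using inline text in place of the real file (the first row is inferred from those column names):

```r
# Three rows in the same layout as the data, first three columns only
raw <- "39, State-gov, 77516
50, Self-emp-not-inc, 83311
38, Private, 215646"

# header = FALSE keeps the first row as data; strip.white = TRUE trims
# leading and trailing whitespace from unquoted character fields
ds <- read.csv(text = raw, header = FALSE, strip.white = TRUE,
               stringsAsFactors = FALSE,
               col.names = c("age", "workclass", "responsenum"))

# The recoding from the question now matches without a separate trimming step
ds$workclass[ds$workclass == "Self-emp-inc" | ds$workclass == "Self-emp-not-inc"] <- "Self-employed"
ds$workclass
```

With the real file, the same arguments would be passed to read.csv("adult.csv", ...) together with the full vector of 15 column names.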
structure(list(age = c(50L, 38L, 53L, 28L, 37L, 49L), workclass = c(" Self-emp-not-inc",
" Private", " Private", " Private", " Private", " Private"),
responsenum = c(83311L, 215646L, 234721L, 338409L, 284582L,
160187L), education = c(" Bachelors", " HS-grad", " 11th",
" Bachelors", " Masters", " 9th"), education_years = c(13L,
9L, 7L, 13L, 14L, 5L), marital_status = c(" Married-civ-spouse",
" Divorced", " Married-civ-spouse", " Married-civ-spouse",
" Married-civ-spouse", " Married-spouse-absent"), occupation = c(" Exec-managerial",
" Handlers-cleaners", " Handlers-cleaners", " Prof-specialty",
" Exec-managerial", " Other-service"), familyrole = c(" Husband",
" Not-in-family", " Husband", " Wife", " Wife", " Not-in-family"
), race = c(" White", " White", " Black", " Black", " White",
" Black"), sex = c(" Male", " Male", " Male", " Female",
" Female", " Female"), capital_gain = c(0L, 0L, 0L, 0L, 0L,
0L), capital_loss = c(0L, 0L, 0L, 0L, 0L, 0L), hours_per_week = c(13L,
40L, 40L, 40L, 40L, 16L), country = c(" United-States", " United-States",
" United-States", " Cuba", " United-States", " Jamaica"),
income = c(" <=50K", " <=50K", " <=50K", " <=50K", " <=50K",
" <=50K")), row.names = c(NA, 6L), class = "data.frame")