How to combine two levels of a categorical variable in R
I am currently learning R and am having trouble finding the right command. I have categorical data:
levels(job)
[1] "admin." "blue-collar" "entrepreneur" "housemaid"
[5] "management" "retired" "self-employed" "services"
[9] "student" "technician" "unemployed" "unknown"
Now I want to simplify these levels, for example:
levels(job)
[1] "class1" "class2" "class3" "unknown"
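where
type 1 includes "admin.", "entrepreneur", and "self-employed";
type 2 includes "blue-collar", "management", and "technician";
type 3 includes "housemaid", "student", "retired", and "services";
unknown includes "unknown" and "unemployed".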
Which command can I use for this?
Thanks,
Yan
Try the recode() function from the car package.
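As a rough sketch of what that could look like, assuming the car package is installed: the sample factor below and the name job_car are made up for illustration, and the recode rules are passed as one string with semicolon-separated "old values = new value" pairs.

```r
# Hypothetical sample data covering all twelve levels from the question
job <- factor(c("admin.", "blue-collar", "entrepreneur", "housemaid",
                "management", "retired", "self-employed", "services",
                "student", "technician", "unemployed", "unknown"))

job_car <- NULL  # stays NULL if the car package is not available

if (requireNamespace("car", quietly = TRUE)) {
  # Each rule maps a set of old values to one new value
  job_car <- car::recode(job, "
    c('admin.', 'entrepreneur', 'self-employed') = 'class1';
    c('blue-collar', 'management', 'technician') = 'class2';
    c('housemaid', 'student', 'retired', 'services') = 'class3';
    c('unemployed', 'unknown') = 'unknown'
  ")
  print(table(job_car))
}
```

Because the input is a factor, recode() returns a factor by default.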
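(Posted as an answer rather than a comment; will be deleted if someone else posts a better answer.) Another base-R solution: create a character vector, change its values, then convert it back with factor():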
job <- as.character(job)
job[job %in% c("admin.","entrepreneur","self-employed")] <- "class1"
... # do the same for the other classes
job <- factor(job)
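Spelled out for all four groups from the question, the same idea might look like the sketch below. The short sample vector and the names job_chr and job_new are assumptions for illustration; passing levels= explicitly to factor() is an optional extra that fixes the level order.

```r
# Hypothetical sample data; any factor with the question's levels would do
job <- factor(c("admin.", "student", "unknown", "technician",
                "unemployed", "retired"))

# Work on a character copy so values can be overwritten freely
job_chr <- as.character(job)
job_chr[job_chr %in% c("admin.", "entrepreneur", "self-employed")] <- "class1"
job_chr[job_chr %in% c("blue-collar", "management", "technician")] <- "class2"
job_chr[job_chr %in% c("housemaid", "student", "retired", "services")] <- "class3"
job_chr[job_chr %in% c("unemployed", "unknown")] <- "unknown"

# Convert back to a factor with an explicit level order
job_new <- factor(job_chr, levels = c("class1", "class2", "class3", "unknown"))
print(job_new)
```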
You can assign to levels() directly (here z is a factor such as job):
levels(z)[levels(z)%in%c("unemployed","unknown","self-employed")] <- "unknown"
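A related base-R idiom, not taken from the answer above: levels<- also accepts a named list, where each name is a new level and each element holds the old levels to merge into it. The sketch below applies the grouping from the question; the sample data and the name job2 are made up for illustration.

```r
# Hypothetical sample data covering all twelve levels from the question
job <- factor(c("admin.", "blue-collar", "entrepreneur", "housemaid",
                "management", "retired", "self-employed", "services",
                "student", "technician", "unemployed", "unknown"))

job2 <- job  # keep the original factor untouched

# Names are the new levels; values are the old levels merged into each
levels(job2) <- list(
  class1  = c("admin.", "entrepreneur", "self-employed"),
  class2  = c("blue-collar", "management", "technician"),
  class3  = c("housemaid", "student", "retired", "services"),
  unknown = c("unemployed", "unknown")
)

print(table(job2))
```

Any old level that is left out of the list becomes NA, so it is worth checking that every level is covered.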
You can also create a "key/value" index vector and use it to replace the elements of job:
indx <- setNames(rep(c(paste0('type',1:3), 'unknown'), c(3,3,4,2)),
c(levels(job)[c(1,3,7)], levels(job)[c(2,5,10)],
levels(job)[c(4,6,8,9)], levels(job)[c(11,12)]))
factor(unname(indx[as.character(job)]))
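The index vector above depends on the alphabetical position of each level. A variant that writes the keys out by name may be easier to audit; this is only a sketch, and the names lookup and job_new are assumptions for illustration. The cross-tabulation at the end is one way to check that every old level lands in exactly one new group.

```r
# Sample data built the same way as in the answer
v1 <- c("admin.", "blue-collar", "entrepreneur", "housemaid",
        "management", "retired", "self-employed", "services", "student",
        "technician", "unemployed", "unknown")
set.seed(24)
job <- factor(sample(v1, 50, replace = TRUE))

# Named vector: names are the old levels, values are the new groups
lookup <- c("admin." = "type1", "entrepreneur" = "type1", "self-employed" = "type1",
            "blue-collar" = "type2", "management" = "type2", "technician" = "type2",
            "housemaid" = "type3", "retired" = "type3", "services" = "type3",
            "student" = "type3",
            "unemployed" = "unknown", "unknown" = "unknown")

# Index by the character values, then drop the names before building the factor
job_new <- factor(unname(lookup[as.character(job)]))

# Each row (old level) should have counts in exactly one column (new level)
print(table(job, job_new))
```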
Thanks, everyone! It works very well.
Sample data for the example above:
v1 <- c('admin.', 'blue-collar', 'entrepreneur', 'housemaid',
'management', 'retired', 'self-employed', 'services', 'student',
'technician', 'unemployed', 'unknown')
set.seed(24)
job <- factor(sample(v1, 50, replace=TRUE))