How to replace all values in a column based on an ordered vector in R
I am trying to replace all numeric values in a data frame column with ordered categories. Here is a dummy data frame:
df <- data.frame(a = c(1:100), b = sample(c(0:20), size = 100, replace = TRUE), c = c(1:100))
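If I use mutate and ifelse for each category, the process is too long and does not preserve the order in educ_fac. I have tried several ways to do it in one step, without success.
One approach is: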
gss_df %>%
mutate(educ = fct_recode(educ,
"No formal schooling" = 0,
"1st grade" = 1,
"2nd grade" = 2,
"3rd grade" = 3,
"4th grade" = 4,
"5th grade" = 5,
"6th grade" = 6,
"7th grade" = 7,
"8th grade" = 8,
"9th grade" = 9,
"10th grade" = 10,
"11th grade" = 11,
"12th grade" = 12,
"1 year of college" = 13,
"2 years of college" = 14,
"3 years of college" = 15,
"4 years of college" = 16,
"5 years of college" = 17,
"6 years of college" = 18,
"7 years of college" = 19,
"8 years of college" = 20))
Error: `f` must be a factor (or character vector or numeric vector).
The other two approaches were similar, but they also failed:
gss_df %>%
mutate(educ = fct_recode(educ, educ_fac))
Error: `f` must be a factor (or character vector or numeric vector).
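Can anyone suggest a solution?
I can't read the .dta file for some reason, so below I simulate data to show you my suggestion. Start with your education vector: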
educ_vec <- c("No formal schooling", "1st grade",
"2nd grade", "3rd grade", "4th grade", "5th grade",
"6th grade", "7th grade", "8th grade", "9th grade",
"10th grade", "11th grade", "12th grade", "1 year of college",
"2 years of college", "3 years of college", "4 years of college",
"5 years of college", "6 years of college", "7 years of college",
"8 years of college")
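If a score is i, its new categorical value is educ_vec[i+1] (R indexing starts at 1, so score 0 maps to the first label). We can use that directly: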
library(dplyr)  # provides %>% and mutate()
set.seed(100)
gss_df <- data.frame(educ=sample(0:20,30,replace=TRUE))
gss_df %>%
mutate(new=factor(educ_vec[educ+1],ordered = TRUE, levels = educ_vec))
educ new
1 9 9th grade
2 5 5th grade
3 15 3 years of college
4 18 6 years of college
5 13 1 year of college
6 11 11th grade
7 5 5th grade
8 3 3rd grade
9 5 5th grade
10 1 1st grade
11 6 6th grade
12 6 6th grade
13 10 10th grade
14 17 5 years of college
15 11 11th grade
16 2 2nd grade
17 18 6 years of college
18 7 7th grade
19 17 5 years of college
20 1 1st grade
21 18 6 years of college
22 3 3rd grade
23 3 3rd grade
24 19 7 years of college
25 15 3 years of college
26 20 8 years of college
27 6 6th grade
28 15 3 years of college
29 10 10th grade
30 19 7 years of college
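For readers not using dplyr, here is a minimal base R sketch of the same index lookup; the %>% / mutate() wrapper is not essential. It assumes the scores are whole numbers from 0 to 20. The educ vector is a small made-up example, and educ_vec is built with paste() as a shortcut, but it holds the same 21 labels in the same order.

```r
# Same 21 labels as educ_vec in the answer, in the same order
educ_vec <- c("No formal schooling",
              paste0(c("1st", "2nd", "3rd", paste0(4:12, "th")), " grade"),
              "1 year of college",
              paste(2:8, "years of college"))

# Hypothetical scores (assumed to be whole numbers in 0-20)
educ <- c(9L, 0L, 15L, 20L)

# Score i is stored at position i + 1 because R vectors start at index 1
new <- factor(educ_vec[educ + 1], levels = educ_vec, ordered = TRUE)
print(new)

# Comparisons on an ordered factor follow the level order, not alphabetical order
print(new < "1 year of college")
```

Because the result is ordered, sorting and comparisons follow the education sequence rather than the alphabetical order of the labels.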
Yes, it still works if some of the categories are not present in the data:
gss_df <- data.frame(educ=0:5)%>%
mutate(new=factor(educ_vec[educ+1],ordered = TRUE, levels = educ_vec))
educ new
1 0 No formal schooling
2 1 1st grade
3 2 2nd grade
4 3 3rd grade
5 4 4th grade
6 5 5th grade
You can see that the new column is an ordered factor with the expected categories:
str(gss_df)
'data.frame': 6 obs. of 2 variables:
$ educ: int 0 1 2 3 4 5
$ new : Ord.factor w/ 21 levels "No formal schooling"<..: 1 2 3 4 5 6
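To check the point about categories that never occur in the data, here is a small base R sketch mirroring the educ = 0:5 example: the ordered factor keeps all 21 levels, table() reports a count of zero for the absent ones, and droplevels() can remove them if you prefer. Building educ_vec with paste() is only a shortcut for the same labels.

```r
educ_vec <- c("No formal schooling",
              paste0(c("1st", "2nd", "3rd", paste0(4:12, "th")), " grade"),
              "1 year of college",
              paste(2:8, "years of college"))

gss_df <- data.frame(educ = 0:5)
gss_df$new <- factor(educ_vec[gss_df$educ + 1], levels = educ_vec, ordered = TRUE)

# All 21 levels are kept even though only six of them occur
print(nlevels(gss_df$new))

# Categories absent from the data appear with a count of 0
counts <- table(gss_df$new)
print(counts[["6th grade"]])

# droplevels() removes unused levels and keeps the factor ordered
trimmed <- droplevels(gss_df$new)
print(levels(trimmed))
```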
Another way to solve this is to use a named vector and then convert the result to an ordered factor. Once you have read the .dta file into your workspace, there are several ways to approach it:
set.seed(777)
library(tidyverse)
df <- data.frame(a = c(1:100), b = sample(c(0:20), size = 100, replace = TRUE), c = c(1:100))
# -------------------------------------------------------------------------
head(df)
# a b c
# 1 1 0 1
# 2 2 18 2
# 3 3 11 3
# 4 4 9 4
# 5 5 11 5
# 6 6 8 6
# -------------------------------------------------------------------------
# these labels will be used as names instead
educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade", "4th grade", "5th grade", "6th grade", "7th grade", "8th grade", "9th grade", "10th grade", "11th grade", "12th grade", "1 year of college", "2 years of college", "3 years of college", "4 years of college", "5 years of college", "6 years of college", "7 years of college", "8 years of college")
# values as character strings from "0" to "20"
value_vec <- as.character(seq(21)-1)
# assign educ_vec as names
names(value_vec) <- educ_vec
# fct_recode b: each level "0" to "20" is renamed to its label (the name in value_vec)
df$educ <- fct_recode(factor(df$b), !!!value_vec)
# set educ as ordered factor using educ_vec as levels
df$educ <- factor(df$educ, ordered = TRUE, levels = educ_vec)
# -------------------------------------------------------------------------
head(df)
# a b c educ
# 1 1 0 1 No formal schooling
# 2 2 18 2 6 years of college
# 3 3 11 3 11th grade
# 4 4 9 4 9th grade
# 5 5 11 5 11th grade
# 6 6 8 6 8th grade
# -------------------------------------------------------------------------
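As a side note, base R's factor() can perform the same mapping in a single call through its levels and labels arguments: each code listed in levels is replaced by the matching entry of labels, and the level order is taken from levels. This is a swapped-in alternative, not the fct_recode() method used above. The sketch assumes codes 0 to 20 and regenerates a df like the one above.

```r
educ_vec <- c("No formal schooling",
              paste0(c("1st", "2nd", "3rd", paste0(4:12, "th")), " grade"),
              "1 year of college",
              paste(2:8, "years of college"))

set.seed(777)
df <- data.frame(a = 1:100, b = sample(0:20, size = 100, replace = TRUE), c = 1:100)

# Code k (from levels = 0:20) is labelled with educ_vec[k + 1]
df$educ <- factor(df$b, levels = 0:20, labels = educ_vec, ordered = TRUE)
head(df)

# Codes outside 0-20 become NA instead of raising an error
out_of_range <- factor(c(-1, 21), levels = 0:20, labels = educ_vec, ordered = TRUE)
print(out_of_range)
```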
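It works, even when I replace new with educ to avoid adding a new column. Could you explain what this part means? Also, will it still work if some of the categories in educ_vec are not found in the column?
Hi @EricAtani, OK, I've added more explanation in the updated answer. Is it clear now?
I see. That's because the values in educ range from 0 to 20, and we need to add 1 so that they match the positions in educ_vec, right? How do the second and third steps work? I thought value_vec was a character vector; how did the numbers change? And what does !!! mean here? (Sorry, I'm new to R.)
You're right, value_vec is a named character vector. Printing the output at each step may help you understand what each step does. educ_vec is used as the names of value_vec; you can see them with names(value_vec). As for the triple bang (!!!), it is worth looking up.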
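To illustrate the !!! question: in tidyverse functions such as forcats::fct_recode(), !!! splices a named vector into separate name = value arguments, so fct_recode(f, !!!value_vec) behaves like writing fct_recode(f, "No formal schooling" = "0", "1st grade" = "1", and so on) by hand. That is also why the numbers "change": fct_recode() renames the existing levels "0", "1", ... to the names in value_vec. The base R sketch below only mimics the splicing with do.call(); show_args() is a hypothetical stand-in function, and the vector is trimmed to three elements for brevity.

```r
# Same shape as value_vec: names are the new labels, values are the old codes
value_vec <- c("No formal schooling" = "0", "1st grade" = "1", "2nd grade" = "2")

# Hypothetical helper that simply returns the arguments it receives
show_args <- function(...) list(...)

# do.call() passes each element as its own named argument, i.e. the same as
# show_args("No formal schooling" = "0", "1st grade" = "1", "2nd grade" = "2")
spliced <- do.call(show_args, as.list(value_vec))
str(spliced)
```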
# name each label by its code ("0" to "20") so codes can be looked up with match()
names(educ_vec) = 0:20
gss_df <- data.frame(educ=c(-1,0,20,21))
# you can also use mutate
gss_df$new <- educ_vec[match(gss_df$educ,names(educ_vec))]
gss_df
educ new
1 -1 <NA>
2 0 No formal schooling
3 20 8 years of college
4 21 <NA>
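The match() version is safer than indexing with educ + 1 when codes can fall outside 0 to 20: a code of -1 becomes index 0, which R silently drops, so the result comes back shorter than the input and then fails when assigned to a four-row data frame column. A small base R sketch of the difference, reusing the codes from the example above:

```r
educ_vec <- c("No formal schooling",
              paste0(c("1st", "2nd", "3rd", paste0(4:12, "th")), " grade"),
              "1 year of college",
              paste(2:8, "years of college"))
names(educ_vec) <- 0:20

educ <- c(-1, 0, 20, 21)

# Index 0 is dropped and index 22 gives NA, so only 3 values come back for 4 inputs
by_index <- educ_vec[educ + 1]
print(length(by_index))

# match() returns NA for unknown codes and keeps one result per input
by_match <- unname(educ_vec[match(educ, names(educ_vec))])
print(by_match)
```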