How to replace all values in a column based on an ordered vector in R


I am trying to replace all numeric values in a data frame column with ordered categories. Here is a dummy data frame:

df <- data.frame(a = 1:100, b = sample(0:20, size = 100, replace = TRUE), c = 1:100)
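If I use mutate and ifelse for each category, the process is too long and it does not keep the order in educ_fac. I tried several ways to do it in one step, without success. One approach is: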

gss_df %>% 
  mutate(educ = fct_recode(educ, 
                           "No formal schooling" = 0, 
                           "1st grade" = 1, 
                           "2nd grade" = 2, 
                           "3rd grade" = 3, 
                           "4th grade" = 4, 
                           "5th grade" = 5, 
                           "6th grade" = 6, 
                           "7th grade" = 7, 
                           "8th grade" = 8, 
                           "9th grade" = 9, 
                           "10th grade" = 10, 
                           "11th grade" = 11, 
                           "12th grade" = 12, 
                           "1 year of college" = 13, 
                           "2 years of college" = 14, 
                           "3 years of college" = 15, 
                           "4 years of college" = 16, 
                           "5 years of college" = 17, 
                           "6 years of college" = 18, 
                           "7 years of college" = 19, 
                           "8 years of college" = 20))

Error: `f` must be a factor (or character vector or numeric vector).
The other two approaches were similar and also failed:

gss_df %>% 
  mutate(educ = fct_recode(educ, educ_fac))

Error: `f` must be a factor (or character vector or numeric vector).

Can anyone suggest a solution?

For some reason I can't read the .dta file, so below I simulate data to show my suggestion. You start with your education vector:

educ_vec <- c("No formal schooling", "1st grade", 
"2nd grade", "3rd grade", "4th grade", "5th grade", 
"6th grade", "7th grade", "8th grade", "9th grade", 
"10th grade", "11th grade", "12th grade", "1 year of college", 
"2 years of college", "3 years of college", "4 years of college", 
"5 years of college", "6 years of college", "7 years of college", 
"8 years of college")
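If your score is i, the new categorical value is educ_vec[i+1] (R indexes vectors from 1, so score 0 maps to the first element). We can use this directly: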

library(dplyr)  # provides %>% and mutate()

set.seed(100)
gss_df <- data.frame(educ = sample(0:20, 30, replace = TRUE))
gss_df %>%
  mutate(new = factor(educ_vec[educ + 1], ordered = TRUE, levels = educ_vec))

   educ                new
1     9          9th grade
2     5          5th grade
3    15 3 years of college
4    18 6 years of college
5    13  1 year of college
6    11         11th grade
7     5          5th grade
8     3          3rd grade
9     5          5th grade
10    1          1st grade
11    6          6th grade
12    6          6th grade
13   10         10th grade
14   17 5 years of college
15   11         11th grade
16    2          2nd grade
17   18 6 years of college
18    7          7th grade
19   17 5 years of college
20    1          1st grade
21   18 6 years of college
22    3          3rd grade
23    3          3rd grade
24   19 7 years of college
25   15 3 years of college
26   20 8 years of college
27    6          6th grade
28   15 3 years of college
29   10         10th grade
30   19 7 years of college
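A minimal base-R sketch of the same lookup, assuming the scores are plain integers from 0 to 20 (the scores vector below is made up for illustration). It isolates why the + 1 is needed: R vectors are indexed from 1, so score i sits at position i + 1 of educ_vec.

```r
# Labels for scores 0..20, in order (same vector as in the answer)
educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade",
              "4th grade", "5th grade", "6th grade", "7th grade", "8th grade",
              "9th grade", "10th grade", "11th grade", "12th grade",
              "1 year of college", "2 years of college", "3 years of college",
              "4 years of college", "5 years of college", "6 years of college",
              "7 years of college", "8 years of college")

# Hypothetical scores; score i is stored at position i + 1
scores <- c(0, 12, 13, 20)
labels <- educ_vec[scores + 1]

# Wrap in an ordered factor so all 21 levels keep the educ_vec order
educ_ord <- factor(labels, levels = educ_vec, ordered = TRUE)
print(educ_ord)
```

Without the offset, a score of 0 would index position 0, which R silently drops (educ_vec[0] is an empty vector), so the result would be shorter than the column it is meant to fill.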
Yes, it still works if some of the levels are not present in the data:

gss_df <- data.frame(educ=0:5)%>%
mutate(new=factor(educ_vec[educ+1],ordered = TRUE, levels = educ_vec))

  educ                 new
1    0 No formal schooling
2    1           1st grade
3    2           2nd grade
4    3           3rd grade
5    4           4th grade
6    5           5th grade
You can see that the new column is an ordered factor with the expected categories:

str(gss_df)
'data.frame':   6 obs. of  2 variables:
 $ educ: int  0 1 2 3 4 5
 $ new : Ord.factor w/ 21 levels "No formal schooling"<..: 1 2 3 4 5 6
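To check this on the same toy data, here is a small base-R sketch (dplyr is not needed for the check). It shows that all 21 levels survive even though only 6 occur, and that, because the factor is ordered, comparisons against a label follow the order of educ_vec.

```r
educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade",
              "4th grade", "5th grade", "6th grade", "7th grade", "8th grade",
              "9th grade", "10th grade", "11th grade", "12th grade",
              "1 year of college", "2 years of college", "3 years of college",
              "4 years of college", "5 years of college", "6 years of college",
              "7 years of college", "8 years of college")

gss_df <- data.frame(educ = 0:5)
gss_df$new <- factor(educ_vec[gss_df$educ + 1], ordered = TRUE, levels = educ_vec)

# table() on a factor reports every level, including the unused ones (count 0)
counts <- table(gss_df$new)

# Ordered comparison: the label is matched against the levels, then positions compared
below_3rd <- gss_df$new < "3rd grade"
print(below_3rd)
```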

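Another way to solve this is to use a named vector and then turn the result into an ordered factor. Once you have read the .dta file into your workspace, there are several ways to approach it; here is one: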

set.seed(777)
library(tidyverse)
df <- data.frame(a = c(1:100), b = sample(c(0:20), size = 100, replace = TRUE), c = c(1:100))

# -------------------------------------------------------------------------
head(df)
#   a  b c
# 1 1  0 1
# 2 2 18 2
# 3 3 11 3
# 4 4  9 4
# 5 5 11 5
# 6 6  8 6

# -------------------------------------------------------------------------

# these labels will be used as the names
educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade", "4th grade", "5th grade", "6th grade", "7th grade", "8th grade", "9th grade", "10th grade", "11th grade", "12th grade", "1 year of college", "2 years of college", "3 years of college", "4 years of college", "5 years of college", "6 years of college", "7 years of college", "8 years of college")

# the codes as character strings, "0" to "20"
value_vec <- as.character(seq(21)-1)

# assign educ_vec as names 
names(value_vec) <- educ_vec

# fct_recode b
df$educ <- fct_recode(factor(df$b), !!!value_vec)

# set educ as ordered factor using educ_vec as levels
df$educ <- factor(df$educ, ordered = TRUE, levels = educ_vec)

# -------------------------------------------------------------------------
head(df)
#   a  b c                educ
# 1 1  0 1 No formal schooling
# 2 2 18 2  6 years of college
# 3 3 11 3          11th grade
# 4 4  9 4           9th grade
# 5 5 11 5          11th grade
# 6 6  8 6           8th grade

# -------------------------------------------------------------------------
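The !!! in fct_recode(factor(df$b), !!!value_vec) splices the named vector into separate new_name = "old_level" arguments. A minimal sketch with a hypothetical three-entry lookup (assuming forcats is installed) shows the spliced call gives the same result as writing each pair by hand:

```r
library(forcats)

# Hypothetical lookup: names are the new labels, values are the existing levels
lookup <- c("No formal schooling" = "0", "1st grade" = "1", "2nd grade" = "2")

f <- factor(c("0", "2", "1", "2"))

# Splicing the named vector with !!! ...
spliced <- fct_recode(f, !!!lookup)

# ... is equivalent to typing every pair out explicitly
by_hand <- fct_recode(f, "No formal schooling" = "0", "1st grade" = "1", "2nd grade" = "2")

print(identical(spliced, by_hand))
```

This is also why value_vec holds the numbers as strings: fct_recode matches them against the factor's levels, which are character.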


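It works, even when I replace new with educ to avoid adding a new column. Could you explain what this part means? Also, will it still work if some of the levels in educ_vec are not found in the column?

Hi @EricAtani, OK, I added more explanation in the updated answer. Is it clear now?

I see. This is because the values in educ range from 0 to 20, and we need to add 1 so that they line up with the elements of educ_vec, right?

How do the second and third steps work? I thought value_vec was a character vector; how did the numbers change? And what does that syntax mean here? (Sorry, I'm new to R.)

You're right, value_vec is a named character vector. Printing the output at each step may help you see what each step does. educ_vec is used as the names of value_vec, which you can check with names(value_vec). As for the triple bang, it's worth looking up.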
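On overwriting educ instead of adding a new column: a base-R alternative (not used in either answer) is factor() with both levels and labels, which maps code 0 to the first label, code 1 to the second, and so on, with no + 1 offset. A sketch assuming educ is a plain numeric column; the data below are made up:

```r
educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade",
              "4th grade", "5th grade", "6th grade", "7th grade", "8th grade",
              "9th grade", "10th grade", "11th grade", "12th grade",
              "1 year of college", "2 years of college", "3 years of college",
              "4 years of college", "5 years of college", "6 years of college",
              "7 years of college", "8 years of college")

gss_df <- data.frame(educ = c(9, 0, 20, 13))

# Overwrite in place: levels are the codes 0..20, labels are the matching text
gss_df$educ <- factor(gss_df$educ, levels = 0:20, labels = educ_vec, ordered = TRUE)
str(gss_df)
```

Codes outside 0 to 20 would become NA. If the column came from a .dta file and still carries value-label attributes, it may need converting to a plain number first; that depends on how the file was read.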
names(educ_vec) = 0:20
gss_df <- data.frame(educ=c(-1,0,20,21))
# you can also use mutate
gss_df$new <- educ_vec[match(gss_df$educ,names(educ_vec))]
gss_df

  educ                 new
1   -1                <NA>
2    0 No formal schooling
3   20  8 years of college
4   21                <NA>
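The match() lookup above returns a plain character vector (carrying names from educ_vec). If you also want the ordering, a sketch that wraps the result in an ordered factor, using the same toy codes including the out-of-range -1 and 21:

```r
educ_vec <- c("No formal schooling", "1st grade", "2nd grade", "3rd grade",
              "4th grade", "5th grade", "6th grade", "7th grade", "8th grade",
              "9th grade", "10th grade", "11th grade", "12th grade",
              "1 year of college", "2 years of college", "3 years of college",
              "4 years of college", "5 years of college", "6 years of college",
              "7 years of college", "8 years of college")
names(educ_vec) <- 0:20

gss_df <- data.frame(educ = c(-1, 0, 20, 21))

# match() gives NA for codes with no matching name, so the lookup yields NA too
lab <- educ_vec[match(gss_df$educ, names(educ_vec))]

# unname() drops the names carried over by indexing; factor() restores the order
gss_df$new <- factor(unname(lab), levels = unname(educ_vec), ordered = TRUE)
print(gss_df)
```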