R: Recode the levels of multiple factors to a specified range
I have the following data frame:
library(tidyverse)

df <- tibble(a = c(1, 2, 3, 4, 5),
             b = c("Y", "N", "N", "Y", "N"),
             c = c("A", "B", "C", "A", "B"))

# Convert every character column to a factor
df <- df %>%
  mutate_if(is.character, as.factor)
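I want to recode the levels of all factors (the b and c variables) to integers: if a factor has only two levels, it should be recoded to {0, 1}; otherwise it should be coded as {1, 2, 3, ...}. The output should therefore be: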
a b c
<dbl> <fct> <fct>
1 1 1 1
2 2 0 2
3 3 0 3
4 4 1 1
5 5 0 2
I can recode the variables individually (one by one), but I would like to know whether there is a more convenient way.
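# Run on the original tibble, while b and c are still character columns
df <- df %>%
  mutate_if(
    is.character,
    function(x) {
      # as.factor() sorts the levels; as.integer() returns 1-based codes
      out <- as.integer(as.factor(x))
      # Shift two-level variables from {1, 2} to {0, 1}
      if (n_distinct(out) == 2) out <- out - 1L
      out
    }
  )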
df
# a b c
# <dbl> <int> <int>
# 1 1 1 1
# 2 2 0 2
# 3 3 0 3
# 4 4 1 1
# 5 5 0 2
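The same rule can also be written without dplyr. The following is only a minimal base R sketch, not part of the original answer: the helper name recode_levels is hypothetical, and it assumes the columns of interest are factors (or characters that can be turned into factors).

```r
# Hypothetical helper: recode a character or factor vector to integer codes.
# Two levels -> {0, 1}; any other number of levels -> {1, 2, 3, ...}.
recode_levels <- function(x) {
  f <- droplevels(as.factor(x))   # sorted levels, unused levels dropped
  codes <- as.integer(f)          # 1-based codes in level order
  if (nlevels(f) == 2L) codes <- codes - 1L
  codes
}

df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c("Y", "N", "N", "Y", "N"),
                 c = c("A", "B", "C", "A", "B"),
                 stringsAsFactors = TRUE)

# Apply the helper to every factor column and leave the others untouched.
is_fct <- vapply(df, is.factor, logical(1))
df[is_fct] <- lapply(df[is_fct], recode_levels)

df$b  # 1 0 0 1 0
df$c  # 1 2 3 1 2
```

Counting levels with nlevels() instead of distinct values means an NA in the column does not change which branch is taken.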
Does this work:
> library(dplyr)
> df %>%
+   mutate(b_fac = match(b, unique(b)) - 1, c_fac = match(c, unique(c))) %>%
+   mutate(b_fac = ifelse(b_fac == 1, 0, 1)) %>%
+   mutate(b_fac = as.factor(b_fac), c_fac = as.factor(c_fac)) %>%
+   select(-2, -3) %>%
+   rename(b = b_fac, c = c_fac)
# A tibble: 5 x 3
a b c
<dbl> <fct> <fct>
1 1 1 1
2 2 0 2
3 3 0 3
4 4 1 1
5 5 0 2
>
One dplyr option could be:
df %>%
  mutate(across(where(is.factor),
                ~ if (n_distinct(.) == 2) factor(., labels = 0:1) else factor(., labels = 1:n_distinct(.))))
a b c
<dbl> <fct> <fct>
1 1 1 1
2 2 0 2
3 3 0 3
4 4 1 1
5 5 0 2
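Note that the result above still holds factors whose labels merely look like numbers. If plain integers are needed afterwards, the conversion has to go through the labels, because as.integer() on a factor returns the internal level codes. A small base R sketch of the difference, using a stand-alone vector b rather than the data frame:

```r
# Same relabelling as in the across() call: N -> "0", Y -> "1"
b <- factor(c("Y", "N", "N", "Y", "N"), labels = 0:1)

levels(b)                              # "0" "1"
codes  <- as.integer(b)                # internal codes: 2 1 1 2 1
values <- as.integer(as.character(b))  # label values:   1 0 0 1 0
```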