R: Converting dummy columns back into a single variable

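For example:

library(dplyr)    # %>% and glimpse()
library(ggplot2)  # diamonds dataset
my_diamonds <- diamonds %>% fastDummies::dummy_cols(select_columns = "color", remove_selected_columns = TRUE)
my_diamonds %>% glimpse()

which looks like this: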

Observations: 53,940
Variables: 16
$ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0.30, 0.23, 0.22, 0.31, 0.20, 0.32, 0.30, 0.30, 0.30, 0.30, 0.30, 0.23, 0.2…
$ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Very Good, Fair, Very Good, Good, Ideal, Premium, Ideal, Premium, Premium, I…
$ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1, SI1, SI2, SI2, I1, SI2, SI1, SI1, SI1, SI2, VS2, VS1, SI1, SI1, VVS2, VS1…
$ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 64.0, 62.8, 60.4, 62.2, 60.2, 60.9, 62.0, 63.4, 63.8, 62.7, 63.3, 63.8, 61.…
$ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 58, 54, 54, 56, 59, 56, 55, 57, 62, 62, 58, 57, 57, 61, 57, 57, 57, 59, 58,…
$ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 342, 344, 345, 345, 348, 351, 351, 351, 351, 352, 353, 353, 353, 354, 355, …
$ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4.25, 3.93, 3.88, 4.35, 3.79, 4.38, 4.31, 4.23, 4.23, 4.21, 4.26, 3.85, 3.9…
$ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4.28, 3.90, 3.84, 4.37, 3.75, 4.42, 4.34, 4.29, 4.26, 4.27, 4.30, 3.92, 3.9…
$ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2.73, 2.46, 2.33, 2.71, 2.27, 2.68, 2.68, 2.70, 2.71, 2.66, 2.71, 2.48, 2.4…
$ color_D <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, …
$ color_E <int> 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
$ color_F <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ color_G <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ color_H <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, …
$ color_I <int> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, …
$ color_J <int> 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, …

Is there an out-of-the-box way, without writing a custom function, to restore my_diamonds to its original form with a single "color" column?

Base R option:

# Strip the "color_" prefix to recover the level names
cols <- sub("color_", "", grep("^color_", names(my_diamonds), value = TRUE)); cols
[1] "D" "E" "F" "G" "H" "I" "J"

# For each row, pick the level whose dummy column holds the 1
my_diamonds$color <- cols[
     apply(my_diamonds[, grep("^color_", names(my_diamonds))], 1, which.max)]

all(my_diamonds$color == diamonds$color)
#[1] TRUE

Another option using max.col:

my_diamonds$color <- cols[max.col(my_diamonds[, grep("^color_", names(my_diamonds))])]

all(my_diamonds$color == diamonds$color)
#[1] TRUE

The same idea in a dplyr pipeline:

col <- "color"
# Match on "color_" so that a "color" column restored earlier is not selected
my_diamonds$color <- my_diamonds %>%
    select(starts_with(paste0(col, "_"))) %>%
    {gsub(paste0(col, "_"), "", names(.))[max.col(.)]}

You can also use pivot_longer from tidyr. Normally, when you reshape the data to long format, the new color column holds the full column names (color_D, color_E, and so on). Since only D, E, and so on are wanted here, use names_pattern with '.*_(.*)', which captures everything after the underscore.
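
The pivot_longer route can be sketched as follows. This is only a minimal sketch on a small made-up tibble (toy is a hypothetical stand-in for my_diamonds), and it assumes every row has exactly one dummy column equal to 1. The filter and select steps that finish the reshape are an assumption about how to complete it, not something taken from the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for my_diamonds: one-hot columns sharing a "color_" prefix
toy <- tibble(
  carat   = c(0.23, 0.21, 0.29),
  color_D = c(0L, 0L, 1L),
  color_E = c(1L, 0L, 0L),
  color_I = c(0L, 1L, 0L)
)

restored <- toy %>%
  pivot_longer(
    cols          = starts_with("color_"),
    names_to      = "color",
    names_pattern = ".*_(.*)"  # keep only the part after the underscore
  ) %>%
  filter(value == 1) %>%       # keep the one row per observation whose dummy is set
  select(-value)               # drop the now-constant 0/1 column

restored
```

The result has one row per original observation and a character color column. If the ordered factor from diamonds is needed, the column can be converted afterwards with factor(), passing the levels in the desired order and ordered = TRUE.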