How to create dummy variables for each group of another variable in the tidyverse


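I want to create (dummy) variables that show whether an observation belongs to a group of observations (identified by a common Group_ID) in which a specific combination of characteristics occurs. The code example below shows more clearly what I mean.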

I tried combinations of group_by and caret::dummyVars, without success. I am running out of ideas, and any help would be much appreciated.

library(tidyverse)

# Input data
# please note: in my case each value of the column Role will appear only once per Group_ID.

input_data <- tribble( ~Group_ID, ~Role, ~Income,
                        #--|--|----
                        1, "a", 3.6,
                        1, "b", 8.5,

                        2, "a", 7.6,
                        2, "c", 9.5,
                        2, "d", 9.7,

                        3, "a", 1.6,
                        3, "b", 4.5,
                        3, "c", 2.7,
                        3, "e", 7.7,

                        4, "b", 3.3,
                        4, "c", 6.2,
)

# desired output
output_data <- tribble( ~Group_ID, ~Role, ~Income, ~Role_A,  ~Role_B, ~Role_C, ~Role_D, ~Role_E, ~All_roles,
                        #--|--|----
                        1, "a", 3.6, 1, 1, 0, 0, 0, "ab",
                        1, "b", 8.5, 1, 1, 0, 0, 0, "ab",

                        2, "a", 7.6, 1, 0, 1, 1, 0, "acd",
                        2, "c", 9.5, 1, 0, 1, 1, 0, "acd",
                        2, "d", 9.7, 1, 0, 1, 1, 0, "acd",

                        3, "a", 1.6, 1, 1, 1, 0, 1, "abce",
                        3, "b", 4.5, 1, 1, 1, 0, 1, "abce",
                        3, "c", 2.7, 1, 1, 1, 0, 1, "abce",
                        3, "e", 7.7, 1, 1, 1, 0, 1, "abce",

                        4, "b", 3.3, 0, 1, 1, 0, 0, "bc",
                        4, "c", 6.2, 0, 1, 1, 0, 0, "bc"
)

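One option is to use dplyr together with cSplit_e from splitstackshape on the input data. For each Group_ID we paste the Role values together, then use cSplit_e to split that string into new binary columns that mark each role as present (1) or absent (0).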

library(splitstackshape)
library(dplyr)

input_data %>%
    group_by(Group_ID) %>%
    mutate(new_role = paste(Role, collapse = "")) %>%
    ungroup() %>%
    cSplit_e("new_role", sep = "", type = "character", fill = 0)

#   Group_ID Role Income new_role new_role_a new_role_b new_role_c new_role_d new_role_e
#1         1    a    3.6       ab          1          1          0          0          0
#2         1    b    8.5       ab          1          1          0          0          0
#3         2    a    7.6      acd          1          0          1          1          0
#4         2    c    9.5      acd          1          0          1          1          0
#5         2    d    9.7      acd          1          0          1          1          0
#6         3    a    1.6     abce          1          1          1          0          1
#7         3    b    4.5     abce          1          1          1          0          1
#8         3    c    2.7     abce          1          1          1          0          1
#9         3    e    7.7     abce          1          1          1          0          1
#10        4    b    3.3       bc          0          1          1          0          0
#11        4    c    6.2       bc          0          1          1          0          0
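
Since the question asks for a tidyverse solution, the same result can also be sketched without splitstackshape. This is only an illustrative alternative, not part of the answer above: it assumes tidyr 1.1.0 or later (for the names_sort argument of pivot_wider), and the names role_flags, present and result are made up for the example.

```r
library(dplyr)
library(tidyr)
library(tibble)

input_data <- tribble(
  ~Group_ID, ~Role, ~Income,
  1, "a", 3.6,
  1, "b", 8.5,
  2, "a", 7.6,
  2, "c", 9.5,
  2, "d", 9.7,
  3, "a", 1.6,
  3, "b", 4.5,
  3, "c", 2.7,
  3, "e", 7.7,
  4, "b", 3.3,
  4, "c", 6.2
)

# One row per Group_ID with a 0/1 flag for every role seen anywhere in the data
role_flags <- input_data %>%
  distinct(Group_ID, Role) %>%
  mutate(present = 1L) %>%
  pivot_wider(
    names_from = Role, values_from = present,
    names_prefix = "Role_", names_sort = TRUE,
    values_fill = 0L
  )

# Attach the group-level flags and the pasted role string to every observation
result <- input_data %>%
  group_by(Group_ID) %>%
  mutate(All_roles = paste(Role, collapse = "")) %>%
  ungroup() %>%
  left_join(role_flags, by = "Group_ID")

result
```

Building the flags on a one-row-per-group table and joining them back keeps the group-level logic separate from the observation-level data. All_roles ends up before the Role_ columns here; dplyr::relocate could reorder it if the exact column order of the desired output matters.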

The following uses base R's modelling functions to create the dummies.

First, create a model matrix without an intercept:

fit <- lm(Group_ID ~ 0 + Role, input_data)
m <- model.matrix(fit)

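(Comment on the splitstackshape answer: Thank you very much for the quick answer and for pointing me to this useful package, which I had never heard of before!)

Then bind the dummy columns to the input data and sum them within each Group_ID: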
input_data %>%
  bind_cols(m %>% as.data.frame()) %>%
  group_by(Group_ID) %>%
  mutate_at(vars(matches("Role[[:alpha:]]")), sum) %>%
  mutate(all_roles = paste(Role, collapse = ""))
## A tibble: 11 x 9
## Groups:   Group_ID [4]
#   Group_ID Role  Income Rolea Roleb Rolec Roled Rolee all_roles
#      <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>    
# 1        1 a        3.6     1     1     0     0     0 ab       
# 2        1 b        8.5     1     1     0     0     0 ab       
# 3        2 a        7.6     1     0     1     1     0 acd      
# 4        2 c        9.5     1     0     1     1     0 acd      
# 5        2 d        9.7     1     0     1     1     0 acd      
# 6        3 a        1.6     1     1     1     0     1 abce     
# 7        3 b        4.5     1     1     1     0     1 abce     
# 8        3 c        2.7     1     1     1     0     1 abce     
# 9        3 e        7.7     1     1     1     0     1 abce     
#10        4 b        3.3     0     1     1     0     0 bc       
#11        4 c        6.2     0     1     1     0     0 bc
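
The lm() fit above is only used to obtain the model matrix, so the regression itself is not needed. As a hedged sketch of the same idea in base R only: model.matrix() accepts a formula and data directly, and rowsum() adds up the dummy columns within each group. The names group_flags, flags and result are hypothetical, and the approach relies on the question's stated assumption that each Role appears at most once per Group_ID (otherwise the sums could exceed 1).

```r
input_data <- data.frame(
  Group_ID = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4),
  Role     = c("a", "b", "a", "c", "d", "a", "b", "c", "e", "b", "c"),
  Income   = c(3.6, 8.5, 7.6, 9.5, 9.7, 1.6, 4.5, 2.7, 7.7, 3.3, 6.2)
)

# Dummy matrix without intercept: one column per role (Rolea, Roleb, ...)
m <- model.matrix(~ 0 + Role, data = input_data)

# Sum the dummies within each group: one row per Group_ID
group_flags <- rowsum(m, input_data$Group_ID)

# Expand the group-level rows back to one row per observation
flags <- group_flags[as.character(input_data$Group_ID), , drop = FALSE]
rownames(flags) <- NULL

# Paste the roles of each group together and repeat the string for each member
all_roles <- ave(input_data$Role, input_data$Group_ID,
                 FUN = function(x) paste(x, collapse = ""))

result <- cbind(input_data, flags, all_roles = all_roles)
result
```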