Long-format binary encoding in dplyr
My data contains respondents (10 in this example) choosing between n options (3 in this example).
Is there a better way?
Use tidyr::complete to create every combination of the unique values in the given columns (here you need RID and choice):
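library(dplyr)
library(tidyr)

df %>%
  mutate(selection = 1) %>%                          # create a selection column of 1
  complete(RID, choice, fill = list(selection = 0))  # fill selection with 0 for missing combinations
# A tibble: 30 x 3
#     RID choice selection
#   <int>  <int>     <dbl>
# 1     1      1        1.
# 2     1      2        0.
# 3     1      3        0.
# 4     2      1        0.
# 5     2      2        0.
# 6     2      3        1.
# 7     3      1        0.
# 8     3      2        0.
# 9     3      3        1.
#10     4      1        1.
# ... with 20 more rows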
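To make the snippet above reproducible, here is a minimal sketch. The data frame `df` is hypothetical sample data (the thread does not show the original data); only its first four rows are chosen to agree with the printed output, the rest are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data: 10 respondents (RID), each picking one of 3 options.
df <- tibble(
  RID    = 1:10,
  choice = c(1L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 2L, 1L)
)

long <- df %>%
  mutate(selection = 1) %>%                          # mark the observed choices
  complete(RID, choice, fill = list(selection = 0))  # add unobserved pairs as 0

nrow(long)           # one row per RID x choice pair
sum(long$selection)  # one selected option per respondent
```

Note that `complete()` only expands over values that actually occur in the data. If an option might never be chosen, it should be possible to pass the full set explicitly, for example `complete(RID, choice = 1:3, fill = list(selection = 0))`.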
Another option is to use expand.grid:
# Create all possible combinations of RID and the unique choices
result <- expand.grid(RID = df$RID, choice = unique(df$choice))
# The new 'selection' column is 1 for combinations present in the original df, else 0.
# Note: df$RID and df$choice are recycled along the grid; this lines up because
# expand.grid varies its first argument (RID) fastest.
result$selection <- ifelse(result$RID == df$RID & result$choice == df$choice, 1, 0)
result
#   RID choice selection
#1    1      2         1
#2    2      2         0
#3    3      2         0
#4    4      2         0
#5    5      2         0
#6    6      2         0
#7    7      2         0
#8    8      2         0
#9    9      2         1
#........
#........
#30 rows
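The elementwise comparison above depends on R recycling `df$RID` and `df$choice` in step with the rows of the grid. A sketch of a variant that does not rely on row alignment, using only base R, matches on a combined key instead. The sample data here is hypothetical and deliberately unsorted.

```r
# Hypothetical sample data, deliberately out of order
df <- data.frame(
  RID    = c(3L, 1L, 2L, 4L),
  choice = c(3L, 1L, 3L, 2L)
)

# All RID x choice combinations, built from sorted unique values
result <- expand.grid(RID = sort(unique(df$RID)),
                      choice = sort(unique(df$choice)))

# 1 when the (RID, choice) pair occurs in df, else 0
observed <- paste(df$RID, df$choice)
result$selection <- as.integer(paste(result$RID, result$choice) %in% observed)

# Order by respondent, then option, to match the long format shown earlier
result <- result[order(result$RID, result$choice), ]
result
```

Because the lookup is by key, the grid can be built from sorted or de-duplicated values without changing which rows are flagged.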
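For reference, the spread/gather approach that produces the desired output, which the answers above aim to simplify:
# desired output
df %>%
  mutate(value = 1) %>%
  spread(choice, value, fill = 0) %>%       # one column per option, 0 where not chosen
  gather("choice", "selection", 2:4) %>%    # back to long format
  arrange(RID, choice)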
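Since `spread()` and `gather()` are superseded in current tidyr, the same round trip can be sketched with `pivot_wider()` and `pivot_longer()`. This is an illustrative equivalent, not code from the thread, and `df` is hypothetical sample data.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data: 4 respondents, 3 options
df <- tibble(RID = 1:4, choice = c(1L, 3L, 3L, 2L))

long <- df %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = choice, values_from = value,
              values_fill = list(value = 0)) %>%   # one column per option
  pivot_longer(-RID, names_to = "choice",
               values_to = "selection") %>%        # back to long format
  mutate(choice = as.integer(choice)) %>%          # column names come back as text
  arrange(RID, choice)

long
```

The `complete()` answer remains shorter, since it avoids the wide intermediate table altogether.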