tidyverse: pivoting with two different reshaping strategies (creating categorical and binary columns)
Using the following data:
df <- data.frame(id = c("A", "B", "C", "A", "B", "A"),
                 value = c(1, 2, 3, 4, 5, 6))
Expected output:
# A tibble: 3 x 10
  id    cat_1 cat_2 cat_3 bin_1 bin_2 bin_3 bin_4 bin_5 bin_6
  <chr> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int>
1 A         1     4     6     1     0     0     1     0     1
2 B         2     5    NA     0     1     0     0     1     0
3 C         3    NA    NA     0     0     1     0     0     0
In base R, you can try:
tt <- unstack(df[2:1])
x <- cbind(t(sapply(tt, "[", seq_len(max(lengths(tt))))),
           t(+sapply(names(tt), "%in%", x=df$id)))
colnames(x) <- c(paste0("cat_", seq_len(max(lengths(tt)))),
                 paste0("bin_", seq_len(nrow(df))))
x
#  cat_1 cat_2 cat_3 bin_1 bin_2 bin_3 bin_4 bin_5 bin_6
#A     1     4     6     1     0     0     1     0     1
#B     2     5    NA     0     1     0     0     1     0
#C     3    NA    NA     0     0     1     0     0     0
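To see how the base R answer fits together, here is a small sketch that builds its two halves separately. The names cats and bins are introduced only for illustration and are not part of the original answer. One caveat: the bin_ columns are numbered by row position (seq_len(nrow(df))), which matches value here only because value runs from 1 to 6 in row order in the example data.

```r
df <- data.frame(id = c("A", "B", "C", "A", "B", "A"),
                 value = c(1, 2, 3, 4, 5, 6))

# unstack() on the (value, id) columns splits value by id; because the
# groups have different lengths, the result is a list, not a data frame
tt <- unstack(df[2:1])

# Categorical half: indexing past the end of a vector yields NA,
# so shorter groups are padded to the longest one
n <- max(lengths(tt))
cats <- t(sapply(tt, "[", seq_len(n)))

# Binary half: for each id, flag the rows of df that belong to it
# (column k refers to row k of df, not to the value k)
bins <- t(+sapply(names(tt), "%in%", x = df$id))

cats
bins
```

If value did not coincide with the row index, the binary half would probably need to be built from the values themselves, for example with table(df$id, df$value).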
By adding purrr, you could do:
library(tidyverse)
map(.x = reduce(range(df$value), `:`),
    ~ df %>%
      group_by(id) %>%
      mutate(!!paste0("bin_", .x) := as.numeric(.x %in% value))) %>%
  reduce(full_join) %>%
  mutate(cats = paste0("cat_", row_number())) %>%
  pivot_wider(names_from = "cats",
              values_from = "value")
  id    bin_1 bin_2 bin_3 bin_4 bin_5 bin_6 cat_1 cat_2 cat_3
  <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A         1     0     0     1     0     1     1     4     6
2 B         0     1     0     0     1     0     2     5    NA
3 C         0     0     1     0     0     0     3    NA    NA
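A slight modification of your approach: reduce the df2 code and put everything in one pipe by using list and the . trick, which lets you work on two versions of df in the same call.
This isn't much of an improvement over what you did, but it is now a single call. I can't think of a way to do it without a merge/join.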
library(tidyverse)
df %>%
  list(
    pivot_wider(., id_cols = id,
                names_from = value,
                names_prefix = "bin_") %>%
      mutate_if(is.numeric, ~ +(!is.na(.))), #convert to binary
    group_by(., id) %>%
      mutate(group_id = 1:n()) %>%
      ungroup() %>%
      pivot_wider(names_from = group_id,
                  names_prefix = "cat_",
                  values_from = value)
  ) %>%
  .[c(2:3)] %>%
  reduce(left_join)
#   id    bin_1 bin_2 bin_3 bin_4 bin_5 bin_6 cat_1 cat_2 cat_3
#   <chr> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
# 1 A         1     0     0     1     0     1     1     4     6
# 2 B         0     1     0     0     1     0     2     5    NA
# 3 C         0     0     1     0     0     0     3    NA    NA
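The .[c(2:3)] step in the pipe above may look odd, so here is a minimal sketch of why it is there. When . appears only inside nested calls on the right-hand side, the magrittr pipe still passes the left-hand side as the first argument, so list() receives three elements rather than two. The object name parts is made up for this illustration.

```r
library(magrittr)

# `.` is used only inside nested calls, so 1:3 is ALSO inserted
# as the first argument of list()
parts <- 1:3 %>% list(sum(.), length(.))

length(parts)   # 3: the original input plus the two derived values
parts[[1]]      # the untouched input, 1:3
parts[2:3]      # the two derived elements, analogous to .[c(2:3)] above
```

Dropping the first element is therefore what leaves only the two reshaped versions of df for reduce(left_join).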
You can even combine both syntaxes into one, without creating any intermediate object:
df %>%
  group_by(id) %>%
  mutate(group_id = row_number()) %>%
  pivot_wider(names_from = group_id,
              names_prefix = "cat_",
              values_from = value) %>%
  left_join(df %>%
              mutate(dummy = 1) %>%
              arrange(value) %>%
              pivot_wider(names_from = value,
                          names_prefix = "bin_",
                          values_from = dummy,
                          values_fill = list(dummy = 0),
                          values_fn = list(dummy = length)),
            by = "id")
# A tibble: 3 x 10
# Groups:   id [3]
  id    cat_1 cat_2 cat_3 bin_1 bin_2 bin_3 bin_4 bin_5 bin_6
  <chr> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int>
1 A         1     4     6     1     0     0     1     0     1
2 B         2     5    NA     0     1     0     0     1     0
3 C         3    NA    NA     0     0     1     0     0     0
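Not quite. The linked question is really only about creating the binary version (which I know how to do). My question is how to get both the categorical and the binary version in one step.
This is an interesting approach, but looking at it, it also boils down to some form of join sneaking in. In other words, what I want may not be achievable with pivots alone.
But it is definitely a change from my current approach. A join is involved, but it is used quite differently from yours. It may be solvable with pivots alone, but I am not aware of such a solution.
Thanks. Yes, strictly speaking, this would be one pipe. I may not have been precise enough: what I wanted to know is whether I can simply do two pivots, one after the other. But I realize that this does not seem to be possible.
Maybe it is possible after all. Let me think of a way.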
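On the question of whether a join can be avoided, one possible direction (a sketch, not something proposed in the thread) is to put both sets of target column names into a single long table and pivot once. This stacks rows with bind_rows() instead of joining. The names cat_long, bin_long and res are hypothetical. It assumes each id/value pair occurs only once, and it returns the bin_ columns as doubles rather than integers.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c("A", "B", "C", "A", "B", "A"),
                 value = c(1, 2, 3, 4, 5, 6))

# Long table for the categorical columns: name = cat_<position within id>
cat_long <- df %>%
  group_by(id) %>%
  mutate(name = paste0("cat_", row_number())) %>%
  ungroup()

# Long table for the binary columns: name = bin_<value>, cell content 1
bin_long <- df %>%
  transmute(id, name = paste0("bin_", value), value = 1)

# Stack both long tables, pivot once, then turn the missing bin_ cells
# into 0 while leaving the NA padding in the cat_ columns alone
res <- bind_rows(cat_long, bin_long) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  mutate(across(starts_with("bin_"), ~ coalesce(.x, 0)))

res
```

values_fill is not used inside pivot_wider() here because it would also replace the NA padding in the cat_ columns, which is why the zeros are filled in afterwards for the bin_ columns only.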