Defining the columns in spread()

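Basically, for each id I have a set of product ids, and I am trying to spread them across a fixed set of defined columns. Each id can have at most 5 product ids.

For example, I spread them as a binary outcome, like this: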

id 305 402 200 
1   2   0   0
2   0   0   1
3   0   2   0

But I want:

id  product1  product2 product3 product4... until 5 
1      305      305       0
2      200      0         0
3      402      402       0

If anyone has a clean way to do this (I have about 10K rows), that would be great! Thanks

# this gives me the binary outcome
for (i in names(test2[2:18])) {
  test2$product1[test2[i] == 1] <- i
}

# this is an attempt to iterate through each row, but it's pretty bad
for (i in 1:nrow(test2)) {
  if (test2[i, 1] == 1) {
    test2$product1[i] <- colnames(test2[1])
  } else if (test2[i, 1] == 2) {
    test2$product1[i] <- colnames(test2[1])
    test2$product2[i] <- colnames(test2[1])
  } else if (test2[i, 1] == 3) {
    test2$product1[i] <- colnames(test2[1])
    test2$product2[i] <- colnames(test2[1])
    test2$product3[i] <- colnames(test2[1])
  } else if (test2[i, 1] == 4) {
    # and so on...
  }
}

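We can create a sequence column by 'id' and then spread. Note that spreading alone will not produce all of the 'product' columns up to 5, because the later ones never occur in the data. To get them, create the sequence as a factor with levels specified from 'product1' to 'product5', and in spread, specify drop = FALSE so that the unused levels are not dropped.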

library(tidyverse)
df1 %>% 
   group_by(id) %>%
   mutate(product = factor(paste0('product', row_number()), 
             levels = paste0('product', 1:5))) %>% 
   spread(product, product_id, drop = FALSE, fill = 0)
# A tibble: 3 x 6
# Groups:   id [3]
#     id product1 product2 product3 product4 product5    
#  <int>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#1     1      305      402      305        0        0
#2     2      200        0        0        0        0
#3     3      402      402        0        0        0
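
In current tidyr releases, spread() is superseded by pivot_wider(). As a hedged sketch of the same idea, assuming tidyr 1.2.0 or later (where the names_expand argument is available), the factor levels can be expanded into columns like this. The object name wide is only illustrative, and df1 is rebuilt here so that the block runs on its own:

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer, rebuilt so the block is self-contained
df1 <- data.frame(
  id = c(1L, 1L, 2L, 1L, 3L, 3L),
  product_id = c(305L, 402L, 200L, 305L, 402L, 402L)
)

wide <- df1 %>%
  group_by(id) %>%
  # Number the products within each id and fix the full set of levels
  mutate(product = factor(paste0("product", row_number()),
                          levels = paste0("product", 1:5))) %>%
  ungroup() %>%
  # names_expand = TRUE creates a column for every factor level, used or not;
  # values_fill = 0L puts 0 where an id has no product in that position
  pivot_wider(names_from = product, values_from = product_id,
              names_expand = TRUE, values_fill = 0L)

print(wide)
```

The role of names_expand = TRUE here is the same as drop = FALSE in spread(): without it, product4 and product5 would not appear, because no id in the sample data has that many products.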

Data

df1 <- structure(list(id = c(1L, 1L, 2L, 1L, 3L, 3L), product_id = c(305L, 
 402L, 200L, 305L, 402L, 402L)), class = "data.frame", row.names = c(NA, 
 -6L))
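
For comparison, the same reshaping can be sketched without tidyverse. This is only an illustration under the assumption that no id has more than 5 products; the names pos, ids, out and result are hypothetical and not part of the original answer:

```r
# Same sample data as above, rebuilt so the block is self-contained
df1 <- data.frame(
  id = c(1L, 1L, 2L, 1L, 3L, 3L),
  product_id = c(305L, 402L, 200L, 305L, 402L, 402L)
)

# Position of each row within its id (1, 2, 3, ...)
pos <- ave(seq_along(df1$id), df1$id, FUN = seq_along)
stopifnot(max(pos) <= 5)  # assumption: at most 5 products per id

ids <- unique(df1$id)

# Start from an all-zero matrix with one row per id and 5 product columns
out <- matrix(0L, nrow = length(ids), ncol = 5,
              dimnames = list(NULL, paste0("product", 1:5)))

# Fill cell (row of the id, position within the id) with the product id
out[cbind(match(df1$id, ids), pos)] <- df1$product_id

result <- data.frame(id = ids, out)
print(result)
```

Because the matrix is preallocated with zeros, the unused product columns exist from the start, which plays the role of drop = FALSE and fill = 0 in the spread() call.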