R: a better way to split one column into multiple columns and then gather the results?

I have a data frame that looks like this:

message.id,sender,recipients
1,A,B|C
2,A,B
3,B,C|D|Q

I want to split the recipients column on | and then gather the results to produce the following:

message.id,sender,recipient
1,A,B
1,A,C
2,A,B
3,B,C
3,B,D
3,B,Q

Is there a cleaner way to do this? Here is my current code:

library(dplyr)
library(stringr)
library(tidyr)

df <- data.frame(message.id = c(1,2,3),
                 sender = c("A","A","B"),
                 recipients = c("B|C","B","C|D|Q"))

# Largest number of recipients in any row: count of "|" separators plus one
max.splits <- max(str_count(df$recipients, "\\|")) + 1

df %>%
  # `into` takes column names as a character vector; short rows are padded with NA
  separate(recipients, paste0("r", 1:max.splits), sep = "\\|", fill = "right") %>%
  gather(trash, recipient, -message.id, -sender) %>%
  select(message.id, sender, recipient) %>%
  filter(!is.na(recipient)) %>%
  arrange(message.id)

We can use data.table
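
A minimal sketch of what a data.table approach could look like. This is an illustration and not necessarily the answerer's original code; the names `dt` and `res` are chosen here, and the data frame is rebuilt with `stringsAsFactors = FALSE` so that `strsplit` receives character input.

```r
library(data.table)

df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

dt <- as.data.table(df)

# For each (message.id, sender) group, split the recipients string on a
# literal "|" and return one row per piece.
res <- dt[, .(recipient = unlist(strsplit(recipients, "|", fixed = TRUE))),
          by = .(message.id, sender)]
print(res)
```

Grouping by `message.id` and `sender` keeps those columns in the result, and `fixed = TRUE` avoids having to escape `|` as a regular expression.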

How about this, using plyr
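
One way a plyr version might be written, offered as a sketch under assumptions and not as the answerer's original code: `ddply` splits the data frame by `message.id` and `sender`, and the anonymous function (written for this illustration) returns one row per recipient.

```r
library(plyr)

df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# Split by the identifying columns, then expand each group's recipients
# string into one row per recipient.
res <- ddply(df, c("message.id", "sender"), function(d) {
  data.frame(
    recipient = unlist(strsplit(as.character(d$recipients), "|", fixed = TRUE)),
    stringsAsFactors = FALSE
  )
})
print(res)
```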


I'm biased, but I'd recommend cSplit from my splitstackshape package.

Usage is simply:

library(splitstackshape)
cSplit(df, "recipients", "|", "long")
#    message.id sender recipients
# 1:          1      A          B
# 2:          1      A          C
# 3:          2      A          B
# 4:          3      B          C
# 5:          3      B          D
# 6:          3      B          Q

Alternatively, using dplyr for the pipe and tidyr for unnest, you could try:

library(dplyr)
library(tidyr)
df %>%
  mutate(recipients = as.character(recipients)) %>%         ## need character for strsplit
  mutate(recipients = strsplit(recipients, "|", TRUE)) %>%  ## Use `fixed = TRUE`
  unnest(recipients)                                        ## `unnest` goes to long form
# Source: local data frame [6 x 3]
# 
#   message.id sender recipients
#        (dbl) (fctr)      (chr)
# 1          1      A          B
# 2          1      A          C
# 3          2      A          B
# 4          3      B          C
# 5          3      B          D
# 6          3      B          Q
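
A shorter tidyr-only variant may also be possible, assuming a tidyr version that provides `separate_rows()` (it is not used anywhere in the thread above, so treat this as a sketch). It splits a delimited column and lengthens the data frame in one step; the name `long` is chosen here for illustration.

```r
library(tidyr)

df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# `sep` is a regular expression, so the pipe character must be escaped.
long <- separate_rows(df, recipients, sep = "\\|")
print(long)
```

Note that the output column keeps the name `recipients`; it could be renamed afterwards with `dplyr::rename()` if the singular name is wanted.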


Here is a solution using dplyr and tidyr.

Code

df %>% separate(recipients, into = c("r1","r2","r3"), sep = "\\|", fill = "right") %>%
  gather("sen", "recipient", r1:r3) %>% select(-sen) %>%
  filter(!is.na(recipient))

Result

  message.id sender recipient
1          1      A         B
2          2      A         B
3          3      B         C
4          1      A         C
5          3      B         D
6          3      B         Q

library(splitstackshape); cSplit(df, "recipients", "|", "long"), but I'm biased. But you might be looking for something like df %>% mutate(recipients = strsplit(as.character(recipients), "\\|")) %>% unnest(recipients)…