R: convert a list-type column to long format by separating its items


I have a table with two columns of interest, as shown below:

status_id | hashtags
947306525726527488 | NEWYEARSEVEPARTY919
947306316959281153 | MakeItALifestyle
947306315952611330 | c("Ejuice", "vape", "vaping")
947306265520328704 | c("vapefam", "vapenation", "vapefamily")
947305941522771968 | nowplaying

Data

structure(list(status_id = c("947306525726527488", "947306316959281153", 
"947306315952611330", "947306265520328704", "947305941522771968"
), hashtags = list("NEWYEARSEVEPARTY919", "MakeItALifestyle", 
    c("Ejuice", "vape", "vaping", "eliquid", "ecigjuice", "ecig", 
    "vapejuice"), c("vapefam", "vapenation", "vapefamily", "vapelife", 
    "vapelyfe", "vapeon", "positivity"), "nowplaying")), .Names = c("status_id", 
"hashtags"), row.names = c(NA, -5L), class = c("tbl_df", "tbl", 
"data.frame"))
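
Expected result

I would like the following two tables (of course, I removed many more columns from the actual original df, since they are irrelevant to the question):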

df1
status_id
947306525726527488
947306316959281153
947306315952611330
947306265520328704
947305941522771968

df2
status_id | hashtags
947306525726527488 | NEWYEARSEVEPARTY919
947306316959281153 | MakeItALifestyle
947306315952611330 | Ejuice
947306315952611330 | vape
947306315952611330 | vaping
947306265520328704 | vapefam
947306265520328704 | vapenation
947306265520328704 | vapefamily
947305941522771968 | nowplaying

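The original data has one row per status_id, and every entry with more than one hashtag is a c(...), so the column is classified as type "list". df2 separates the individual hashtags into their own rows.

Although I had never come across a list-type column before, googling it taught me a lot about converting lists into columns, but nothing about a column of type "list" (data.table).

library(dplyr)
rm(list=ls())
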
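Here is one possible solution. I called your data mydf. You have lists in hashtags. You can use unlist() and paste() to turn each row of hashtags into a single comma-separated string. If you prefer, you can use toString() instead of paste(). Once hashtags holds one string per row, you want to split it. Specifically, rows 3 and 4 contain multiple hashtags, and you want to separate them. You can do that with cSplit() from the splitstackshape package. The result is the df2 you want. Once you have it, create df1: select status_id and keep its unique values.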

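library(dplyr)
library(splitstackshape)

df2 <- mydf %>%
       rowwise() %>%
       # collapse each row's hashtags into one comma-separated string
       mutate(hashtags = paste(unlist(hashtags), collapse = ",")) %>%
       # split on the comma and stack the pieces into long format
       cSplit(splitCols = "hashtags", sep = ",", direction = "long")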

             status_id            hashtags
 1: 947306525726527488 NEWYEARSEVEPARTY919
 2: 947306316959281153    MakeItALifestyle
 3: 947306315952611330              Ejuice
 4: 947306315952611330                vape
 5: 947306315952611330              vaping
 6: 947306315952611330             eliquid
 7: 947306315952611330           ecigjuice
 8: 947306315952611330                ecig
 9: 947306315952611330           vapejuice
10: 947306265520328704             vapefam
11: 947306265520328704          vapenation
12: 947306265520328704          vapefamily
13: 947306265520328704            vapelife
14: 947306265520328704            vapelyfe
15: 947306265520328704              vapeon
16: 947306265520328704          positivity
17: 947305941522771968          nowplaying

df1 <- unique(df2[, 1, with = FALSE])

            status_id
1: 947306525726527488
2: 947306316959281153
3: 947306315952611330
4: 947306265520328704
5: 947305941522771968

For the sake of completeness, here is also a data.table solution:

library(data.table)
df2 <- setDT(juice)[, .(hashtag = unlist(hashtags)), by = status_id]
df1 <- unique(juice[, .(status_id)])

df2
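
What the grouped unlist() does can also be written without any package, which may help to see the mechanics. The following is a sketch under assumptions: the two-row juice data frame is a shortened stand-in for the question's data, and the rep()/lengths() formulation is a base R equivalent offered for illustration, not part of the answer above.

```r
# Small stand-in for the question's data (two of the five rows, shortened).
juice <- data.frame(status_id = c("947306316959281153", "947306315952611330"),
                    stringsAsFactors = FALSE)
juice$hashtags <- list("MakeItALifestyle", c("Ejuice", "vape", "vaping"))

# Repeat each status_id once per hashtag, then flatten the list column.
df2 <- data.frame(
  status_id = rep(juice$status_id, lengths(juice$hashtags)),
  hashtag   = unlist(juice$hashtags),
  stringsAsFactors = FALSE
)

# One row per distinct status_id.
df1 <- unique(juice["status_id"])

print(df2)
print(df1)
```

lengths() returns the number of items in each list element, so rep() lines the ids up with the flattened hashtags.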

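Maybe something like this

df1-original is the table at the very top, above the str() code. df1 and df2 are the desired results.
What is the difference between the original data and df2?
The original data has one row per status id, with all hashtags as c(...), classified as type "list". df2 separates the individual hashtags into separate rows.
and overwrite it. So, for df1:
df1 works perfectly. Please take this as the best answer, the simplest.
@SaleemKhan Glad to be of help. :)
@SaleemKhan, since you are already using the "tidyverse", couldn't you just do unnest(juice)? Or, using "splitstackshape", listCol_l(juice, "hashtags")[] :-)
@A5C1D2H2I1M1N2O1R2T1 I see. The hats are tempting, aren't they? :) If it helps, I would be happy to update the answer for you. A nice way to arrive at a solution.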
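
The unnest() suggestion from the comments could look roughly like this. It is a sketch under assumptions: the two-row tibble is a shortened stand-in for juice, and the explicit cols argument assumes tidyr 1.0.0 or later (the comment's bare unnest(juice) is the older calling style).

```r
library(tibble)
library(tidyr)

# Shortened stand-in for the question's data, with a list column.
juice <- tibble(
  status_id = c("947306316959281153", "947306315952611330"),
  hashtags  = list("MakeItALifestyle", c("Ejuice", "vape", "vaping"))
)

# Expand the list column so every hashtag gets its own row.
df2 <- unnest(juice, cols = hashtags)

# One row per distinct status_id.
df1 <- unique(juice["status_id"])

print(df2)
print(df1)
```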

# splitstackshape alternative mentioned in the comments
df2 <- listCol_l(mydf, "hashtags")

# data.table solution, with printed output
library(data.table)
df2 <- setDT(juice)[, .(hashtag = unlist(hashtags)), by = status_id]
df1 <- unique(juice[, .(status_id)])

df2
             status_id             hashtag
 1: 947306525726527488 NEWYEARSEVEPARTY919
 2: 947306316959281153    MakeItALifestyle
 3: 947306315952611330              Ejuice
 4: 947306315952611330                vape
 5: 947306315952611330              vaping
 6: 947306315952611330             eliquid
 7: 947306315952611330           ecigjuice
 8: 947306315952611330                ecig
 9: 947306315952611330           vapejuice
10: 947306265520328704             vapefam
11: 947306265520328704          vapenation
12: 947306265520328704          vapefamily
13: 947306265520328704            vapelife
14: 947306265520328704            vapelyfe
15: 947306265520328704              vapeon
16: 947306265520328704          positivity
17: 947305941522771968          nowplaying
df1
            status_id
1: 947306525726527488
2: 947306316959281153
3: 947306315952611330
4: 947306265520328704
5: 947305941522771968