R: Converting a column of type 'list' into multiple columns in a data frame
I have a data frame in which one column is a list, as shown below:
> head(movies$genre_list)
[[1]]
[1] "drama" "action" "romance"
[[2]]
[1] "crime" "drama"
[[3]]
[1] "crime" "drama" "mystery"
[[4]]
[1] "thriller" "indie"
[[5]]
[1] "thriller"
[[6]]
[1] "drama" "family"
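I would like to convert this column into multiple columns, one for each unique element found in the lists (here, the genres), with each new column being binary. I am looking for an elegant solution that does not require first working out how many genres there are, then creating a column for each genre, then checking every list element and filling in the genre columns. I tried unlist, but it does not handle a vector of lists the way I want.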
Thanks.
Here are a few approaches, starting from this sample data:
movies <- data.frame(genre_list = I(list(
c("drama", "action", "romance"),
c("crime", "drama"),
c("crime", "drama", "mystery"),
c("thriller", "indie"),
c("thriller"),
c("drama", "family"))))
Update: two more direct approaches
Improved option 1: use table directly:
table(rep(1:nrow(movies), sapply(movies$genre_list, length)),
unlist(movies$genre_list, use.names=FALSE))
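The table call above returns an object of class table rather than columns of a data frame. The following is a minimal sketch of one way to take it the rest of the way, assuming the same sample data; the names row_id, genres, genre_df and movies_wide are illustrative and not part of the original answer.

```r
# Sample data, same shape as in the answer above
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))

# Row index repeated once for every genre listed in that row
row_id <- rep(seq_len(nrow(movies)), lengths(movies$genre_list))
genres <- unlist(movies$genre_list, use.names = FALSE)

# Cross-tabulate rows against genres, then convert the table to a data frame
genre_df <- as.data.frame.matrix(table(row_id, genres))

# Attach the indicator columns to the original data
movies_wide <- cbind(movies, genre_df)
```

lengths(x) is a shorthand for sapply(x, length), and as.data.frame.matrix keeps the wide layout instead of the long Var1/Var2/Freq form that as.data.frame would produce for a table.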
Improved option 2: use a for loop:
x <- unique(unlist(movies$genre_list, use.names=FALSE))
m <- matrix(0, ncol = length(x), nrow = nrow(movies), dimnames = list(NULL, x))
for (i in 1:nrow(m)) {
m[i, movies$genre_list[[i]]] <- 1
}
m
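If you would rather avoid the explicit loop, the same matrix can probably be built with vapply and %in%. This is only a sketch of an alternative, not the method from the answer, and genre_list here is a plain list standing in for movies$genre_list.

```r
genre_list <- list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))

# All distinct genres, in order of first appearance
x <- unique(unlist(genre_list, use.names = FALSE))

# For each record, test every genre for membership; vapply returns a
# genres-by-records logical matrix, so transpose it to records-by-genres
m <- t(vapply(genre_list, function(g) x %in% g, logical(length(x))))
colnames(m) <- x

# Store 0/1 instead of FALSE/TRUE
storage.mode(m) <- "integer"
m
```

Because membership is tested with %in%, a genre that appears twice in one record still yields 1 rather than 2.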
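Use Reduce to merge the resulting list of tables. If I understand your end goal correctly, this gives the transposed form of the result you are interested in.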
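The merge step refers to a list called tables, whose construction does not appear in this post. Judging from the column names in the printed output (Genre, Record_1, ...), it is presumably a list with one small frequency table per record. A hedged sketch of how such a list could be built, with the loop variable i and the helper name temp being assumptions:

```r
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))

# Assumed construction of `tables`: one two-column data frame per record,
# holding the genre and how often it occurs in that record
tables <- lapply(seq_along(movies$genre_list), function(i) {
  temp <- as.data.frame(table(movies$genre_list[[i]]))
  names(temp) <- c("Genre", paste("Record", i, sep = "_"))
  temp
})
```

Each element shares the Genre column, which is what lets merge line the records up; all = TRUE keeps genres that are missing from one side and fills them with NA.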
merged_tables <- Reduce(function(x, y) merge(x, y, all = TRUE), tables)
merged_tables
# Genre Record_1 Record_2 Record_3 Record_4 Record_5 Record_6
# 1 action 1 NA NA NA NA NA
# 2 drama 1 1 1 NA NA 1
# 3 romance 1 NA NA NA NA NA
# 4 crime NA 1 1 NA NA NA
# 5 mystery NA NA 1 NA NA NA
# 6 indie NA NA NA 1 NA NA
# 7 thriller NA NA NA 1 1 NA
# 8 family NA NA NA NA NA 1
Using the same input as in the other answer, here are some alternatives:
1) factor/table/rbind
> levs <- levels(factor(unlist(movies[[1]])))
> as.data.frame(do.call(rbind, lapply(lapply(movies[[1]], factor, levs), table)))
action crime drama family indie mystery romance thriller
1 1 0 1 0 0 0 1 0
2 0 1 1 0 0 0 0 0
3 0 1 1 0 0 1 0 0
4 0 0 0 0 1 0 0 1
5 0 0 0 0 0 0 0 1
6 0 0 1 1 0 0 0 0
Update: added alternative 2.
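Update 2: added alternative 2a.
Does each list item always have unique genres? In other words, could one record be "drama, action, romance, action"?
Thanks! I like solution 1 best! I am just not familiar enough with lattice to work through solution 2.
@New Split #2 into two lines so that the melted data frame m can be inspected. That may make it easier to follow.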
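On the question about repeated genres: the table-based approaches count occurrences, so a record such as "drama, action, romance, action" would get a 2 under action rather than a 1. A small sketch, using made-up two-record data, of how the counts could be capped to keep the columns binary:

```r
# Hypothetical data in which the first record lists "action" twice
genre_list <- list(
  c("drama", "action", "romance", "action"),
  c("crime", "drama"))

row_id <- rep(seq_along(genre_list), lengths(genre_list))
counts <- table(row_id, unlist(genre_list, use.names = FALSE))

# Cap every count at 1 so the result is a 0/1 indicator
indicator <- counts
indicator[indicator > 1] <- 1L

as.data.frame.matrix(indicator)
```

The for loop approach is not affected, since assigning 1 to the same cell twice still leaves 1.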
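# Transpose merged_tables so that each record is a row and each genre a column,
# then replace the NAs left by merge() with 0
movie_genres <- setNames(data.frame(t(merged_tables[-1])), merged_tables[[1]])
movie_genres[is.na(movie_genres)] <- 0
movie_genres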
2) lattice make.groups/xtabs
> library(lattice)
> m <- do.call(make.groups, movies[[1]])
> as.data.frame.matrix(xtabs(~ which + data, m))
action crime drama family indie mystery romance thriller
c("drama", "action", "romance") 1 0 1 0 0 0 1 0
c("crime", "drama") 0 1 1 0 0 0 0 0
c("crime", "drama", "mystery") 0 1 1 0 0 1 0 0
c("thriller", "indie") 0 0 0 0 1 0 0 1
thriller 0 0 0 0 0 0 0 1
c("drama", "family") 0 0 1 1 0 0 0 0
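The row names in the output above are deparsed calls because the list passed to make.groups has no names. If lattice is unfamiliar, roughly the same long-then-wide idea can be sketched in base R with stack and xtabs. This is an assumed equivalent rather than part of the answer, and the Record_ prefix is an arbitrary label.

```r
genre_list <- list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))

# Name the records so that the long format carries a readable identifier
names(genre_list) <- paste0("Record_", seq_along(genre_list))

# Long format: one row per (record, genre) pair; columns are `values` and `ind`
long <- stack(genre_list)

# Wide format: cross-tabulate record against genre
wide <- as.data.frame.matrix(xtabs(~ ind + values, data = long))
wide
```

Inspecting long before the xtabs step shows the intermediate "melted" shape that the comment about splitting solution 2 into two lines refers to.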
2a) reshape2 dcast, reusing m from 2)
library(reshape2)
dcast(m, which ~ data, fun.aggregate = length, value.var = "which")