R: Reshaping multiple categorical variables into binary response variables
I am trying to convert the following format:
mydata <- data.frame(movie = c("Titanic", "Departed"),
actor1 = c("Leo", "Jack"),
actor2 = c("Kate", "Leo"))
movie actor1 actor2
1 Titanic Leo Kate
2 Departed Jack Leo
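I tried the solution described in another question, but I could only get it to work with two variables, not three.
I'd appreciate a clean way to do this.
One way to reshape a data.frame is with the reshape2 package, using melt and dcast. For example: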
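library(reshape2)
# long form: one row per (movie, actor column) pair, actor name in `value`
long.mydata <- melt(mydata, id.vars = "movie")
# wide form: one column per actor, 1 where the pair exists, 0 (fill) otherwise
wide.mydata <- dcast(long.mydata, movie ~ value, fun.aggregate = function(x) 1, fill = 0)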
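To see what melt and dcast are doing under the hood, here is a rough base R sketch of the same two steps. It is not reshape2's own code; it assumes exactly one id column (movie) with every other column holding actor names, and the names long and wide are just illustrative.

```r
# Sketch: base R analogue of melt() followed by dcast(..., function(x) 1, fill = 0).
# Assumption: one id column ("movie"); all remaining columns hold actor names.
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

# Long form, like melt(mydata, id.vars = "movie"): one row per (movie, column) pair
long <- data.frame(
  movie    = rep(mydata$movie, times = ncol(mydata) - 1),
  variable = rep(names(mydata)[-1], each = nrow(mydata)),
  value    = unlist(mydata[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Wide form: start from all zeros (the fill = 0 part) ...
movies <- sort(unique(long$movie))
actors <- sort(unique(long$value))
wide <- matrix(0, nrow = length(movies), ncol = length(actors),
               dimnames = list(movie = movies, actor = actors))
# ... then set every observed (movie, actor) cell to 1 via character-matrix indexing
wide[cbind(long$movie, long$value)] <- 1
wide
```

Starting from a zero matrix and only writing the observed cells is what makes the result binary: a pair that appears twice is still just set to 1.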
Since they say variety is the spice of life, here's a base R approach using table:
table(cbind(mydata[1],
actor = unlist(mydata[-1], use.names=FALSE)))
# actor
# movie Jack Leo Kate
# Departed 1 1 0
# Titanic 0 1 1
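Note that table() counts occurrences rather than flagging them, so if the same actor were listed in both columns for one movie, that cell would hold 2. If a strict 0/1 indicator is needed, one option is to compare the counts with zero. The sketch below adds a hypothetical row purely to show the difference; it is not part of the original data.

```r
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

# Hypothetical extra row in which one actor fills both columns
dup <- rbind(mydata,
             data.frame(movie = "Hypothetical", actor1 = "Leo", actor2 = "Leo",
                        stringsAsFactors = FALSE))

# Same table() idea as above: counts per movie/actor pair
counts <- table(cbind(dup[1], actor = unlist(dup[-1], use.names = FALSE)))

# Collapse any count above zero to 1, giving a 0/1 indicator table
binary <- (counts > 0) * 1L
binary
```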
The output above is a matrix of class table. To get a data.frame, use as.data.frame.matrix:
as.data.frame.matrix(table(
cbind(mydata[1], actor = unlist(mydata[-1], use.names=FALSE))))
# Jack Leo Kate
# Departed 1 1 0
# Titanic 0 1 1
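as.data.frame.matrix keeps the movies as row names. If you would rather have movie as an ordinary column (as in the reshape2 and tidyr answers), a small extra step should do it; wide and wide_df are illustrative names.

```r
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

tab  <- table(cbind(mydata[1], actor = unlist(mydata[-1], use.names = FALSE)))
wide <- as.data.frame.matrix(tab)

# Move the row names into a "movie" column; row.names = NULL resets them to 1, 2, ...
wide_df <- data.frame(movie = rownames(wide), wide,
                      row.names = NULL, stringsAsFactors = FALSE)
wide_df
```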
How much spice is too much? Here's a solution via tidyr:
library(dplyr)
library(tidyr)
mydata %>%
gather(actor,name,starts_with("actor")) %>%
mutate(present = 1) %>%
select(-actor) %>%
spread(name,present,fill = 0)
movie Jack Kate Leo
1 Departed 1 0 1
2 Titanic 0 1 1
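gather() and spread() still work, but tidyr now marks them as superseded in favour of pivot_longer() and pivot_wider(). For comparison, the same gather / mutate(present = 1) / spread pipeline can be sketched in base R with stack() and xtabs(). This is our own translation, not part of the answer, and it assumes the actor columns are character (the R >= 4.0 default), because stack() skips factor columns.

```r
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

# Like gather(): stack() returns `values` (actor name) and `ind` (source column)
long <- stack(mydata[-1])
long$movie <- rep(mydata$movie, times = ncol(mydata) - 1)

# Like mutate(present = 1)
long$present <- 1

# Like spread(name, present, fill = 0): sum `present` per cell, empty cells are 0
wide <- xtabs(present ~ movie + values, data = long)
wide
```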
The reshape2 package also has a recast function.
Code:
library(reshape2)
recast(mydata, id.var = 'movie', movie ~ value, fun.aggregate = length)
The result is:
movie Jack Kate Leo
1 Departed 1 0 1
2 Titanic 0 1 1
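With fun.aggregate = length, each cell is simply the number of long-format rows for that movie/actor pair, and pairs that never occur come out as 0. The same counting idea can be expressed in base R with tapply(); the default argument used here to fill empty cells requires R 3.4.0 or later. This is an illustrative equivalent, not what recast does internally.

```r
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

# One entry per (movie, actor) pair, mirroring the melted data
movie_col <- rep(mydata$movie, times = ncol(mydata) - 1)
actor_col <- unlist(mydata[-1], use.names = FALSE)

# Count rows per cell with length(); cells with no rows get default = 0L
counts <- tapply(actor_col, list(movie = movie_col, actor = actor_col),
                 length, default = 0L)
counts
```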
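An updated tidyr-based option is to pivot to long form, use complete to fill in the missing movie/actor combinations, and then convert the logical !is.na test to numeric. Finally, pivot back to wide: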
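library(tidyr)
mydata %>%
  # long form: one row per movie/actor column; actor name goes to `value`
  pivot_longer(starts_with("actor"), names_to = "acted") %>%
  # add the missing movie/actor combinations (acted is NA for those rows)
  complete(movie, value) %>%
  # 1 if the movie/actor pair was observed, 0 otherwise
  dplyr::mutate(acted = as.numeric(!is.na(acted))) %>%
  pivot_wider(names_from = value, values_from = acted)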
movie Jack Kate Leo
1 Departed 1 0 1
2 Titanic 0 1 1
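The complete-then-is.na idea does not depend on tidyr. Here is a base R sketch of the same three steps, with expand.grid() standing in for complete(), merge(all.x = TRUE) leaving NA for unobserved pairs, and xtabs() doing the final widening. The intermediate names (observed, all_pairs, completed) are our own, and this is a translation of the approach rather than the answer's code.

```r
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

# Like pivot_longer(starts_with("actor"), names_to = "acted")
observed <- data.frame(movie = rep(mydata$movie, times = ncol(mydata) - 1),
                       acted = rep(names(mydata)[-1], each = nrow(mydata)),
                       value = unlist(mydata[-1], use.names = FALSE),
                       stringsAsFactors = FALSE)

# Like complete(movie, value): every movie/actor combination ...
all_pairs <- expand.grid(movie = sort(unique(observed$movie)),
                         value = sort(unique(observed$value)),
                         stringsAsFactors = FALSE)
# ... joined to the observed rows; unobserved pairs get acted = NA
completed <- merge(all_pairs, observed, all.x = TRUE)

# Like mutate(acted = as.numeric(!is.na(acted)))
completed$acted <- as.numeric(!is.na(completed$acted))

# Like pivot_wider(names_from = value, values_from = acted)
wide <- xtabs(acted ~ movie + value, data = completed)
wide
```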