R: Convert factor columns into columns of binary indicators
I have data that looks like this:
> View(mydata)
   Gender  Race   Agegroup  Date        .....   #m columns
#1 Male    Asian  1         2015/04/20  .....
#2 Female  White  2         2015/04/15  .....
.
.
#n rows
I would like to convert mydata into the following format:
Gender=Male  Gender=Female  Race=Asian  Race=White  Agegroup=1  Agegroup=2  ......
1            0              1           0           1           0
0            1              0           1           0           1
.            .              .           .           .           .
.            .              .           .           .           .
I am new to R. I know a for loop would work, but is there a cleaner way?

You can use model.matrix to expand multiple variables in a single call:
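# stringsAsFactors = TRUE is needed on R >= 4.0, where strings are no longer
# converted to factors by default and contrasts() would fail on character columns
(d <- data.frame(Gender = c("Male", "Male", "Female", "Male"),
                 Race = c("White", "Asian", "White", "Black"),
                 AgeGroup = factor(c(1, 2, 2, 1)),
                 stringsAsFactors = TRUE))
#   Gender  Race AgeGroup
# 1   Male White        1
# 2   Male Asian        2
# 3 Female White        2
# 4   Male Black        1
# lapply (rather than sapply) always returns a list with one contrast matrix per factor
model.matrix(~ . + 0, data = d, contrasts.arg = lapply(d, contrasts, contrasts = FALSE))
#   GenderFemale GenderMale RaceAsian RaceBlack RaceWhite AgeGroup1 AgeGroup2
# 1            0          1         0         0         1         1         0
# 2            0          1         1         0         0         0         1
# 3            1          0         0         0         1         0         1
# 4            0          1         0         1         0         1         0
# ...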
The contrasts.arg argument in the model.matrix call is borrowed from another source; it ensures that every level of every factor shows up in the output.
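To see what the contrasts.arg argument changes, it may help to compare the default call with the full-indicator call. The following is only a sketch: the data frame is rebuilt so the snippet stands on its own, the names `m_default` and `m_full` are made up for illustration, and `stringsAsFactors = TRUE` is an assumption for R 4.0 or later.

```r
# Rebuild the example data; stringsAsFactors = TRUE keeps Gender and Race as factors
d <- data.frame(Gender = c("Male", "Male", "Female", "Male"),
                Race = c("White", "Asian", "White", "Black"),
                AgeGroup = factor(c(1, 2, 2, 1)),
                stringsAsFactors = TRUE)

# Default treatment contrasts: with the intercept removed, only the first
# factor keeps all of its levels; each later factor drops its first level
m_default <- model.matrix(~ . + 0, data = d)

# Full indicator coding: one column per level of every factor
m_full <- model.matrix(~ . + 0, data = d,
                       contrasts.arg = lapply(d, contrasts, contrasts = FALSE))

# Compare which indicator columns each call produced
colnames(m_default)
colnames(m_full)
```

With full indicator coding each row has exactly one 1 per original variable, which is the layout requested in the question. Note that such a matrix is rank deficient if it is later used as a regression design matrix.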
Alternatively, you can use the reshape2 package:
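DF <- data.frame(gender = c("m", "f", "m"),
                 agegroup = factor(c(1, 2, 2)))
library(reshape2)
# Build one block of indicator columns per variable
dum <- lapply(names(DF), function(x, df) {
  d <- df[, x, drop = FALSE]
  d$id <- seq_along(d[, 1])  # row id, used later as the merge key
  # One column per level; length() counts occurrences, giving 0 or 1 per row
  res <- dcast(d, id ~ ..., fun.aggregate = length)
  names(res)[-1] <- paste(names(d)[1], names(res)[-1], sep = "=")
  res
}, df = DF)
# Join the per-variable blocks on the shared id column
Reduce(merge, dum)
#   id gender=f gender=m agegroup=1 agegroup=2
# 1  1        0        1          1          0
# 2  2        1        0          0          1
# 3  3        0        1          0          1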
Please post a few lines of your dataset and the expected result based on them.

This may be what you are looking for...

I noticed that the string columns are expanded but the integer ones are not. Is there a function that handles both?

@GuanhuaLee I have updated the code to include a variable that takes integer values. Note that you need to convert it to a factor for model.matrix to split it into separate columns. For example, you could use d$AgeGroup
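The conversion mentioned in the last comment can be sketched as follows. The data frame here is hypothetical, and converting every non-factor column is an assumption that only makes sense when all columns are categorical. A column such as Date would need to be excluded or handled separately.

```r
# Hypothetical data: AgeGroup is stored as integers rather than as a factor
d <- data.frame(Gender = factor(c("Male", "Female", "Male")),
                AgeGroup = c(1L, 2L, 2L))

# Left numeric, AgeGroup stays a single numeric column
m_numeric <- model.matrix(~ . + 0, data = d)

# Convert every non-factor column to a factor
# (assumption: all columns are categorical)
d[] <- lapply(d, function(col) if (is.factor(col)) col else factor(col))

# Now AgeGroup is split into one indicator column per level
m_factor <- model.matrix(~ . + 0, data = d,
                         contrasts.arg = lapply(d, contrasts, contrasts = FALSE))
colnames(m_factor)
```

Assigning to `d[]` keeps the data frame structure and column names while replacing the column contents.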