R model matrix using the same set of factors across all columns
I have a set of basketball lineup data with five columns, each sharing the same set of factors, as shown below:
head(dat)
  V1              V2              V3             V4             V5
1 MILES,KEATON    KINGSLEY,MOSES  BELL,ANTHLON   HANNAHS,DUSTY  DURHAM,JABRIL
2 MILES,KEATON    KINGSLEY,MOSES  BELL,ANTHLON   HANNAHS,DUSTY  DURHAM,JABRIL
3 KINGSLEY,MOSES  BELL,ANTHLON    HANNAHS,DUSTY  DURHAM,JABRIL  THOMPSON,TREY
4 KINGSLEY,MOSES  BELL,ANTHLON    HANNAHS,DUSTY  THOMPSON,TREY  BEARD,ANTON
5 THOMPSON,TREY   BEARD,ANTON     KOUASSI,WILLY  WHITT,JIMMY    WATKINS,MANUALE
6 THOMPSON,TREY   BEARD,ANTON     KOUASSI,WILLY  WHITT,JIMMY    WATKINS,MANUALE
What I would like is for each row to be a dummy coding of the factors that appear in that row, like this:
MILES,KEATON KINGSLEY,MOSES BELL,ANTHLON HANNAHS,DUSTY DURHAM,JABRIL THOMPSON,TREY BEARD,ANTON KOUASSI,WILLY WHITT,JIMMY WATKINS,MANUALE
1 1 1 1 1 0 0 0 0 0
1 1 1 1 1 0 0 0 0 0
0 1 1 1 1 1 0 0 0 0
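However, model.matrix seems to work on only one column at a time; it does not let me share the whole factor set across multiple columns. Following some suggestions in [this thread][1], I tried: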
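library(Matrix)  # provides sparse.model.matrix
df <- as.data.frame(lapply(dat,as.factor))
fList <- lapply(names(df),reformulate,intercept=FALSE)
mList <- lapply(fList,sparse.model.matrix,data=df)
# originally do.call(cBind,mList); cBind is deprecated in newer Matrix versions, where cbind works
br <- do.call(cbind,mList)
head(br)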
6 x 31 sparse Matrix of class "dgCMatrix"
[[ suppressing 31 column names ‘V1BEARD,ANTON’, ‘V1BELL,ANTHLON’, ‘V1KINGSLEY,MOSES’ ... ]]
1 . . . 1 . . . . 1 . . 1 . . . . . . 1 . . . . . . 1 . . . . .
2 . . . 1 . . . . 1 . . 1 . . . . . . 1 . . . . . . 1 . . . . .
3 . . 1 . . . 1 . . . . . . 1 . . . 1 . . . . . . . . . . . 1 .
4 . . 1 . . . 1 . . . . . . 1 . . . . . . . 1 . . 1 . . . . . .
5 . . . . 1 1 . . . . . . . . 1 . . . . . . . . 1 . . . . . . 1
6 . . . . 1 1 . . . . . . . . 1 . . . . . . . . 1 . . . . . . 1
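The 31 columns appear because each of V1 to V5 is coded against its own levels, so the same player shows up under several prefixed names. As a minimal sketch of one way around this (not taken from the thread), every column can be given one shared level set and the per-column indicator matrices added together. The sample data below is hypothetical, mirroring rows 1, 3 and 5 of head(dat), and the names players, per_column and lineup are illustrative only.

```r
# Hypothetical sample data mirroring rows 1, 3 and 5 of head(dat) above.
dat <- data.frame(
  V1 = c("MILES,KEATON", "KINGSLEY,MOSES", "THOMPSON,TREY"),
  V2 = c("KINGSLEY,MOSES", "BELL,ANTHLON", "BEARD,ANTON"),
  V3 = c("BELL,ANTHLON", "HANNAHS,DUSTY", "KOUASSI,WILLY"),
  V4 = c("HANNAHS,DUSTY", "DURHAM,JABRIL", "WHITT,JIMMY"),
  V5 = c("DURHAM,JABRIL", "THOMPSON,TREY", "WATKINS,MANUALE"),
  stringsAsFactors = FALSE
)

# One level set shared by every column.
players <- sort(unique(unlist(dat)))

# Build one indicator matrix per column over the shared levels.
per_column <- lapply(dat, function(col) {
  f <- factor(col, levels = players)
  m <- model.matrix(~ f - 1)   # no intercept: one column per level
  colnames(m) <- players       # drop the "f" prefix added by model.matrix
  m
})

# Adding the matrices gives one row per lineup and one column per player.
lineup <- Reduce(`+`, per_column)
lineup
```

Because all matrices share the same column order, the element-wise sum is 1 wherever a player is on the floor in any of the five slots.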
We can use mtabulate from qdapTools (here df1 is the lineup data frame):
library(qdapTools)
mtabulate(as.data.frame(t(df1)))
# BELL,ANTHLON DURHAM,JABRIL HANNAHS,DUSTY KINGSLEY,MOSES MILES,KEATON THOMPSON,TREY BEARD,ANTON KOUASSI,WILLY
#1 1 1 1 1 1 0 0 0
#2 1 1 1 1 1 0 0 0
#3 1 1 1 1 0 1 0 0
#4 1 0 1 1 0 1 1 0
#5 0 0 0 0 0 1 1 1
#6 0 0 0 0 0 1 1 1
# WATKINS,MANUALE WHITT,JIMMY
#1 0 0
#2 0 0
#3 0 0
#4 0 0
#5 1 1
#6 1 1
Or using base R:
table(rep(1:nrow(df1), ncol(df1)), unlist(df1))
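The base R call works because unlist(df1) stacks the columns one after another, so repeating the row indices once per column lines each player up with the row it came from. A small self-contained sketch of this follows; the data is hypothetical (rows 1, 3 and 5 of the example), and the names tab and ind are illustrative.

```r
# Hypothetical sample data mirroring rows 1, 3 and 5 of the example.
df1 <- data.frame(
  V1 = c("MILES,KEATON", "KINGSLEY,MOSES", "THOMPSON,TREY"),
  V2 = c("KINGSLEY,MOSES", "BELL,ANTHLON", "BEARD,ANTON"),
  V3 = c("BELL,ANTHLON", "HANNAHS,DUSTY", "KOUASSI,WILLY"),
  V4 = c("HANNAHS,DUSTY", "DURHAM,JABRIL", "WHITT,JIMMY"),
  V5 = c("DURHAM,JABRIL", "THOMPSON,TREY", "WATKINS,MANUALE"),
  stringsAsFactors = FALSE
)

# Row index repeated once per column, paired with the stacked player names.
tab <- table(rep(seq_len(nrow(df1)), ncol(df1)), unlist(df1))

# Convert the contingency table to a plain data frame of counts.
ind <- as.data.frame.matrix(tab)
ind
```

The cells are counts, so a player listed twice in the same row would give 2 rather than 1; with lineups of five distinct players every cell is 0 or 1.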