R: Convert a long class dataset into a wide dataset where the variables are dummy codes for each class


Suppose I have a dataset in which each row is a class that a person took:

attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))  

I.e.,

     id  class  
   1 1   "Math" 
   2 1   "English"
   3 1   "Math"
   4 2   "Reading"
   5 2   "Math"

I'm familiar with dplyr, so a solution that uses dplyr would be easier for me, but it's not required. Thanks for your help!

Using reshape2:

library(reshape2)
attendance$val <- 'yes'
dcast(unique(attendance), id ~ class, value.var = 'val', fill = 'no')

which gives:

  id English Math Reading
1  1     yes  yes      no
2  2      no  yes     yes

A similar approach with data.table:

library(data.table)
dcast(unique(setDT(attendance))[, val := 'yes'], id ~ class, value.var = 'val', fill = 'no')

Or with dplyr/tidyr:

library(dplyr)
library(tidyr)
attendance %>%
  distinct() %>%
  mutate(var = 'yes') %>%
  spread(class, var, fill = 'no')

Another, slightly more involved option is to reshape first and then replace the counts with 'yes' and 'no'. See the default aggregation behavior of dcast: when a cell has more than one value and no aggregation function is given, it falls back to length, so the result holds counts:

att2 <- dcast(attendance, id ~ class, value.var = 'class')

which gives:

  id English Math Reading
1  1       1    2       0
2  2       0    1       1

Now you can replace the counts:

# index the counts that are above zero
idx <- att2[,-1] > 0
# replace the non-zero values with 'yes'
att2[,-1][idx] <- 'yes'
# replace the zero values with 'no'
att2[,-1][!idx] <- 'no'

which gives:

> att2
  id English Math Reading
1  1     yes  yes      no
2  2      no  yes     yes

We can also do this with base R:

attendance$val <- "yes"
d1 <- reshape(attendance, idvar = 'id', direction = 'wide', timevar = 'class')
d1[is.na(d1)] <- "no"
names(d1) <- sub("val\\.", '', names(d1))
d1
#  id Math English Reading
#1  1  yes     yes      no
#4  2  yes      no     yes

Or with xtabs, which returns a binary table:

xtabs(val ~ id + class, transform(unique(attendance), val = 1))
#    class
# id  English Math Reading
#  1       1    1       0
#  2       0    1       1

Note: the binary values can easily be converted to 'yes'/'no', but it is better to keep them as 1/0 or TRUE/FALSE.

Comment: the xtabs call is essentially just table(unique(attendance)).
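
To make the table(unique(attendance)) remark and the note about yes/no labels concrete, here is a minimal base R sketch. It assumes the original two-column attendance data from the question (without the extra val column added in some of the answers); the names tab, took, and wide are only illustrative.

```r
# Example data from the question (two columns only: id and class)
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

# Drop repeated (id, class) rows, then cross-tabulate:
# every cell of the resulting table is either 0 or 1
tab <- table(unique(attendance))

# Strip the "table" class and compare with zero to get a plain logical matrix
# (TRUE where the person took the class)
took <- unclass(tab) > 0

# Optional: build a wide data frame with "yes"/"no" labels instead of TRUE/FALSE
wide <- data.frame(id = as.numeric(rownames(took)),
                   ifelse(took, "yes", "no"),
                   row.names = NULL)
wide
```

Keeping took as the working object and only creating wide for display follows the advice in the note: logical or 1/0 indicators are easier to sum, filter, and model with than 'yes'/'no' strings.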