R: Spread data.frame/tibble with shared keys and missing data
I have a two-column table that I want to spread. I know this is a very popular and thoroughly discussed topic; however, I have tried several approaches and none of them gave me what I want. Any suggestions and complaints are welcome.
My table contains data for three women. There are 5 categories in total, and normally each category has a value. For some women, though, data is missing, and then the whole row is absent. Note that Jane is missing the information about weight.
a = data.frame(categories = c("name", "sex", "age", "weight", "high",
                              "name", "sex", "age", "high",
                              "name", "sex", "age", "weight", "high"),
               values = c("Emma", "female", "32", "72", "175",
                          "Jane", "female", "28", "165",
                          "Emma", "female", "42", "63", "170"))
categories values
1 name Emma
2 sex female
3 age 32
4 weight 72
5 high 175
6 name Jane
7 sex female
8 age 28
9 high 165
10 name Emma
11 sex female
12 age 42
13 weight 63
14 high 170
I want to get the columns from `categories` and the rows from `values`. But there are two main problems:
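1) The keys are shared: there are two Emmas (so I cannot use `spread` or `reshape`)
2) Some categories may be missing, such as Jane's weight (so I cannot use `pivot` or `split`)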
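To make both problems concrete, here is a minimal sketch in base R that tallies how often each key occurs in the example data. The variable name `counts` is only illustrative.

```r
# Example data from the question
a <- data.frame(
  categories = c("name", "sex", "age", "weight", "high",
                 "name", "sex", "age", "high",
                 "name", "sex", "age", "weight", "high"),
  values = c("Emma", "female", "32", "72", "175",
             "Jane", "female", "28", "165",
             "Emma", "female", "42", "63", "170"),
  stringsAsFactors = FALSE
)

# Count the rows per key: every key repeats (one row per person),
# and "weight" occurs one time fewer because Jane has no weight row.
counts <- table(a$categories)
print(counts)
```

Since no key identifies a single row, some extra per-person identifier is needed before the table can be widened.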
In the end, I want to reshape the data into a table like this:
name sex age weight high
Emma female 32 72 175
Jane female 28 NA 165
Emma female 42 63 170
Assuming 'name' is always present for every entry, we can create an identifier column and reshape with `pivot_wider`:
library(dplyr)
a %>%
group_by(grp = cumsum(categories == 'name')) %>%
tidyr::pivot_wider(names_from = categories, values_from = values) %>%
ungroup %>%
select(-grp)
# name sex age weight high
# <chr> <chr> <chr> <chr> <chr>
#1 Emma female 32 72 175
#2 Jane female 28 NA 165
#3 Emma female 42 63 170
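The grouping step is what makes this work, so here is a small base R sketch of what `cumsum(categories == 'name')` produces on the example data. The variable `grp` mirrors the column name used above; the standalone vector `categories` is set up only for this illustration.

```r
# The key column from the example data
categories <- c("name", "sex", "age", "weight", "high",
                "name", "sex", "age", "high",
                "name", "sex", "age", "weight", "high")

# TRUE at each "name" row; the running sum then increases by one
# at the start of every record and stays constant inside it.
grp <- cumsum(categories == "name")
print(grp)

# Each (grp, category) pair is now unique, which widening requires.
print(anyDuplicated(data.frame(grp, categories)))  # 0 means no duplicates
```

This relies on the stated assumption that every record starts with a 'name' row; a record without one would be merged into the previous record.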
Assuming all entries start with `name`, in base R, with magrittr used only to keep the code tidy:
library(magrittr)
split(a, cumsum(a$categories == "name")) %>%
lapply(function(x) setNames(x[[2L]], x[[1L]])[unique(a$categories)]) %>%
do.call(rbind, .) %>%
data.frame()
name sex age weight high
1 Emma female 32 72 175
2 Jane female 28 <NA> 165
3 Emma female 42 63 170
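The missing weight is filled in by the indexing step `[unique(a$categories)]`. A minimal sketch of that mechanism on Jane's record alone (the names `record`, `wanted` and `aligned` are illustrative, not part of the answer above):

```r
# Jane's record as a named character vector, with no "weight" entry
record <- c(name = "Jane", sex = "female", age = "28", high = "165")

# The full set of categories, in the desired column order
wanted <- c("name", "sex", "age", "weight", "high")

# Indexing a named vector by a name it does not contain yields NA,
# so every record ends up with the same length and order.
aligned <- record[wanted]
print(unname(aligned))
```

The element created for the missing name carries an NA name as well, which does not matter in the answer above as long as the first record, which supplies the column names, is complete.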
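Alternatively with data.table, transposing each piece and stacking the results (magrittr is loaded for the pipe):
library(data.table)
library(magrittr)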
split(a, cumsum(a$categories == "name")) %>%
lapply(transpose, make.names = "categories") %>%
rbindlist(fill = TRUE)
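For comparison, here is a base R sketch with `reshape()`, which becomes usable once the record id from `cumsum()` is supplied as `idvar`. The names `id` and `wide` are illustrative, and the `values.` prefix that `reshape()` puts on the new columns is stripped by hand. It makes the same assumption as the answers above, namely that every record starts with a `name` row.

```r
# Example data from the question
a <- data.frame(
  categories = c("name", "sex", "age", "weight", "high",
                 "name", "sex", "age", "high",
                 "name", "sex", "age", "weight", "high"),
  values = c("Emma", "female", "32", "72", "175",
             "Jane", "female", "28", "165",
             "Emma", "female", "42", "63", "170"),
  stringsAsFactors = FALSE
)

# Record id: increases by one at every "name" row
a$id <- cumsum(a$categories == "name")

# Long to wide: one row per id, one column per category
wide <- reshape(a, idvar = "id", timevar = "categories", direction = "wide")

# Drop the "values." prefix and the helper id column
names(wide) <- sub("^values\\.", "", names(wide))
wide <- wide[, names(wide) != "id", drop = FALSE]
rownames(wide) <- NULL
print(wide)
```

All columns stay character here, as in the other answers, because `values` is a character column.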