R: Unfolding an interleaved data frame

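I have a data.frame in an interleaved format: there are two groups (A and B), and each row of group B relates to the group A row immediately preceding it. For example: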

set.seed(1)
df <- data.frame(group = c("A","B","A","B","A","B","B","A","B"),
                 id = c("A.1","B.1","A.2","B.2","A.3","B.3.1","B.3.2","A.4","B.4"),
                 score = runif(9,0,1))
I imagine this can be done easily with tidyr.

Any ideas?

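You can create a sub_id column that marks which group A and group B rows belong on the same output row, split the data frame into an A df and a B df, and then join the two sub data frames on the sub_id column: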

library(dplyr)

df %>% 
    # every A row starts a new sub_id; the B rows that follow share it
    mutate(sub_id = cumsum(group == 'A')) %>% 
    {full_join(
        filter(., group == 'A') %>% select(-group), 
        filter(., group == 'B') %>% select(-group), 
        by = 'sub_id', 
        suffix = c('A', 'B')
    )} %>% select(-sub_id)

#  idA    scoreA   idB    scoreB
#1 A.1 0.2655087   B.1 0.3721239
#2 A.2 0.5728534   B.2 0.9082078
#3 A.3 0.2016819 B.3.1 0.8983897
#4 A.3 0.2016819 B.3.2 0.9446753
#5 A.4 0.6607978   B.4 0.6291140
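
To see why the join key works, it can help to look at sub_id on its own. The following is a minimal base R sketch that uses only the group vector from the question; the data frame printed at the end is purely illustrative.

```r
# Same grouping vector as in the question's example
group <- c("A", "B", "A", "B", "A", "B", "B", "A", "B")

# group == "A" is TRUE on every A row, so the running sum increases by one
# at each A row and stays constant over the B rows that follow it
sub_id <- cumsum(group == "A")

print(data.frame(group, sub_id))
# sub_id is 1 1 2 2 3 3 3 4 4
```

Rows sharing a sub_id form one block: the A row, then every B row up to the next A row. That is why the two B.3.x rows both pair with A.3 after the join.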
Alternatively, use data.table::dcast, which supports casting multiple value columns:

library(data.table); library(zoo)
dcast(
    setDT(df)[, 
        # row number of the output row that each input row should go to;
        # shift() is data.table's lag, so this does not rely on dplyr::lag being attached
        rn := cumsum(!(group == 'B' & shift(group) == 'A'))
    ][], 
    rn ~ group, value.var = c('id', 'score')
)[, `:=` (
    # carry the last A values forward into B rows that have no A row of their own
    id_A = na.locf(id_A), 
    score_A = na.locf(score_A), 
    rn = NULL
)][]

#   id_A  id_B   score_A   score_B
#1:  A.1   B.1 0.2655087 0.3721239
#2:  A.2   B.2 0.5728534 0.9082078
#3:  A.3 B.3.1 0.2016819 0.8983897
#4:  A.3 B.3.2 0.2016819 0.9446753
#5:  A.4   B.4 0.6607978 0.6291140
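
Since the question mentions tidyr, here is a hedged sketch of a third route based on tidyr::fill. It is not taken from the answer above. It assumes every A row is followed by at least one B row (an A row with no B rows would be dropped, unlike with full_join), and the helper column names idA and scoreA are chosen only to match the first answer's output.

```r
library(dplyr)
library(tidyr)

set.seed(1)
df <- data.frame(group = c("A","B","A","B","A","B","B","A","B"),
                 id = c("A.1","B.1","A.2","B.2","A.3","B.3.1","B.3.2","A.4","B.4"),
                 score = runif(9, 0, 1),
                 stringsAsFactors = FALSE)

res <- df %>%
    # copy the A values into helper columns, leaving NA on the B rows
    mutate(idA = if_else(group == "A", id, NA_character_),
           scoreA = if_else(group == "A", score, NA_real_)) %>%
    # carry each A row's values down onto the B rows that follow it
    fill(idA, scoreA) %>%
    # keep only the B rows, which now carry their matching A values
    filter(group == "B") %>%
    select(idA, scoreA, idB = id, scoreB = score)

print(res)
```

This avoids the explicit split and join, at the cost of silently discarding any A row that has no B rows.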
