Reshape data: extracting "some" rows and turning them into new columns in R

(Edited) I have a long dataset with several columns in long format. Here is a sample of the data:

Groups  duration  response  value    trial
------  --------  --------  -------  -----
C       525       ID        5578     ID
C       525       1-1       676|342  C3
C       525       1-2       676|342  C3
C       525       1-3       676|342  C3
C       525       1-4       676|342  C3
C       525       1-5       676|342  C3
C       521       ID        6331     ID
C       521       1-1       643|461  C3
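The sample above can be entered as a data frame to try the answers. The column types are an assumption, chosen to match the tibble printouts in the answers (duration as <int>, everything else as <chr>).

```r
# Sample data typed in from the table above.
# Assumed types: duration is integer, all other columns are character.
df <- data.frame(
  Groups   = rep("C", 8),
  duration = c(rep(525L, 6), rep(521L, 2)),
  response = c("ID", "1-1", "1-2", "1-3", "1-4", "1-5", "ID", "1-1"),
  value    = c("5578", rep("676|342", 5), "6331", "643|461"),
  trial    = c("ID", rep("C3", 5), "ID", "C3"),
  stringsAsFactors = FALSE
)

str(df)
```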

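In this data frame, each participant's ID sits in the same columns as the responses and values. What I need is to move the rows corresponding to "ID" into a separate column, repeated for each of the participant's measures, so that it looks like this: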

Groups  duration  ID    response  value    trial
------  --------  ----  --------  -------  -----
C       525       5578  1-1       676|342  C3
C       525       5578  1-2       676|342  C3
C       525       5578  1-3       676|342  C3
C       525       5578  1-4       676|342  C3
C       525       5578  1-5       676|342  C3
C       525       5578  1-6       676|342  C3
C       521       6331  1-1       643|461  C3
C       521       6331  1-2       643|461  C3
C       521       6331  1-3       643|461  C3
C       521       6331  1-4       643|461  C3
C       521       6331  1-5       643|461  C3
C       521       6331  1-6       643|461  C3

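My initial attempt was to convert the data frame to wide format, so that the ID and each of the other responses get their own column, and then make it long again, but only for the response columns (1-1 to 1-6 in the example), with the following code: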

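library(tidyr)

# wide format: one column per response (ID, 1-1, 1-2, ...)
df <- spread(df, response, value)

# fill in the whole column with corresponding values
df <- fill(df, ID, .direction = "down")

# back to long format; 9:1417 are the response columns in the full data set
df <- gather(df, name, coordinates, 9:1417, factor_key = TRUE)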

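One option is to create a grouping variable based on the occurrences of 'ID' by taking the cumulative sum of the logical vector (response == 'ID'), then create the 'ID' column as the first element of 'value' in each group, then use slice to remove the first row of each group, and finally drop the 'grp' column: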
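library(dplyr)
df %>%
   # a new group starts at every 'ID' row
   group_by(grp = cumsum(response == 'ID'), Groups) %>%
   # the first value in each group is the participant ID
   mutate(ID = first(value)) %>%
   # drop the 'ID' row itself
   slice(-1) %>%
   ungroup %>%
   select(-grp)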

-output
# A tibble: 6 x 6
#  Groups duration response value   trial ID   
#  <chr>     <int> <chr>    <chr>   <chr> <chr>
#1 C           525 1-1      676|342 C3    5578 
#2 C           525 1-2      676|342 C3    5578 
#3 C           525 1-3      676|342 C3    5578 
#4 C           525 1-4      676|342 C3    5578 
#5 C           525 1-5      676|342 C3    5578 
#6 C           521 1-1      643|461 C3    6331 

# A tibble: 12 x 6
#   Groups duration value   trial ID    response
#   <chr>     <int> <chr>   <chr> <chr> <chr>   
# 1 C           525 676|342 C3    5578  1-1     
# 2 C           525 676|342 C3    5578  1-2     
# 3 C           525 676|342 C3    5578  1-3     
# 4 C           525 676|342 C3    5578  1-4     
# 5 C           525 676|342 C3    5578  1-5     
# 6 C           525 676|342 C3    5578  1-6     
# 7 C           521 643|461 C3    6331  1-1     
# 8 C           521 643|461 C3    6331  1-2     
# 9 C           521 643|461 C3    6331  1-3     
#10 C           521 643|461 C3    6331  1-4     
#11 C           521 643|461 C3    6331  1-5     
#12 C           521 643|461 C3    6331  1-6     

The advantage is that we don't need to do any reshaping; we just create the column on the same data and remove some rows at the end.
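To see what the grouping does, here is a base R sketch of the same three steps on plain vectors. It leaves out the extra grouping by Groups, and it assumes, as the answer does, that each participant's block starts with its 'ID' row.

```r
response <- c("ID", "1-1", "1-2", "1-3", "1-4", "1-5", "ID", "1-1")
value    <- c("5578", rep("676|342", 5), "6331", "643|461")

# cumsum() over a logical vector counts the TRUEs seen so far,
# so the group number goes up by one at every "ID" row
grp <- cumsum(response == "ID")
print(grp)
#> [1] 1 1 1 1 1 1 2 2

# like mutate(ID = first(value)): look up the value on each group's first row
first_in_group <- value[!duplicated(grp)]
ID <- first_in_group[grp]

# like slice(-1): keep every row except the first one of each group
keep <- duplicated(grp)
out <- data.frame(response = response[keep], value = value[keep], ID = ID[keep])
print(out)
```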

Create a new column ('ID') from 'value', replacing it with NA where response != 'ID', fill in the NA values, and remove the rows where response == 'ID':
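library(dplyr)
library(tidyr)

df %>%
  # keep 'value' only on the 'ID' rows, NA everywhere else
  mutate(ID = replace(value, response != 'ID', NA)) %>%
  # carry each ID down to the rows below it
  fill(ID) %>%
  # drop the 'ID' rows
  filter(response != 'ID')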

#  Groups duration response   value trial   ID
#1      C      525      1-1 676|342    C3 5578
#2      C      525      1-2 676|342    C3 5578
#3      C      525      1-3 676|342    C3 5578
#4      C      525      1-4 676|342    C3 5578
#5      C      525      1-5 676|342    C3 5578
#6      C      521      1-1 643|461    C3 6331
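For comparison, the replace/fill/filter steps can be written in base R without the NA placeholder: indexing the 'ID' values by the running count of 'ID' rows gives every row the most recent ID. This is only a sketch, and unlike fill() it assumes the data starts with an 'ID' row (a count of 0 would drop an index and the column assignment would fail).

```r
df <- data.frame(
  Groups   = rep("C", 8),
  duration = c(rep(525L, 6), rep(521L, 2)),
  response = c("ID", "1-1", "1-2", "1-3", "1-4", "1-5", "ID", "1-1"),
  value    = c("5578", rep("676|342", 5), "6331", "643|461"),
  trial    = c("ID", rep("C3", 5), "ID", "C3"),
  stringsAsFactors = FALSE
)

is_id <- df$response == "ID"

# like replace() + fill(): the k-th ID value goes to every row
# from the k-th "ID" row up to (not including) the next one
df$ID <- df$value[is_id][cumsum(is_id)]

# like filter(response != 'ID')
out <- df[!is_id, ]
rownames(out) <- NULL
print(out)
```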

A base R approach could be to split on the cumsum and then recombine (and reorder the columns to get the expected output).
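A possible sketch of that idea (not the answer's original code): split the rows on cumsum(response == "ID"), turn the first row of each chunk into an ID column, bind the chunks back together, and put the columns in the order of the expected output. It assumes every chunk starts with its 'ID' row.

```r
df <- data.frame(
  Groups   = rep("C", 8),
  duration = c(rep(525L, 6), rep(521L, 2)),
  response = c("ID", "1-1", "1-2", "1-3", "1-4", "1-5", "ID", "1-1"),
  value    = c("5578", rep("676|342", 5), "6331", "643|461"),
  trial    = c("ID", rep("C3", 5), "ID", "C3"),
  stringsAsFactors = FALSE
)

# one chunk per participant, keyed by the running count of "ID" rows
chunks <- split(df, cumsum(df$response == "ID"))

# in each chunk, move the first row's value into an ID column and drop that row
pieces <- lapply(chunks, function(d) {
  res <- d[-1, ]
  res$ID <- rep(d$value[1], nrow(res))
  res
})

# recombine and reset the row names created by rbind
out <- do.call(rbind, pieces)
rownames(out) <- NULL

# column order of the expected output in the question
out <- out[, c("Groups", "duration", "ID", "response", "value", "trial")]
print(out)
```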
I like this approach!