R: Reshape data to a longer format using multiple columns that share a pattern in their names
I have been struggling with a dataset for a while, trying to convert it from a fully wide format to a fully long format. I managed to get it into a form somewhere in between. As the toy example below shows, the data is already in long format with respect to the Cond column. The problem is that the "_Pre" and "_Post" suffixes in the measure column names need to become another factor, like Cond, named PrePost. The code I tried produces the wrong result, with too many rows:
vars_PrePost <- grep("Pre|Post", colnames(df))
df2 <-
df %>%
gather(variable, value, vars_PrePost, -c(ID)) %>%
tidyr::separate(variable, c("variable", "PrePost"), "_(?=[^_]+$)") %>%
spread(variable, value)
Here is the toy dataset:
df <- data.frame(stringsAsFactors=FALSE,
ID = c("10", "10", "11", "11", "12", "12"),
Age = c("23", "23", "31", "31", "24", "24"),
Gender = c("m", "m", "m", "m", "f", "f"),
Cond = c("Cond2", "Cond1", "Cond2", "Cond1", "Cond2", "Cond1"),
Measure1_Post = c(NA, "7", NA, "3", NA, "2"),
Measure1_Pre = c(NA, "3", NA, "2", NA, "2"),
Measure2_Post = c("1.3968694273826", "0.799543118218161",
"1.44098109351048", "0.836960160696351",
"1.99568500539374", "1.75138016371597"),
Measure2_Pre = c("1.19248628113128", "0.726244170934944",
"1.01175268267757", "1.26415857605178",
"2.35250186706497", "1.27070245573958"),
Measure3_Post = c("73", "84", "50", "40", "97", "89"),
Measure3_Pre = c("70", "63", "50", "46", "88", "71")
)
In tidyr v1.0.0, pivot_longer can do this with the special .value sentinel in names_to, together with names_pattern:
library(tidyr) #v1.0.0
#select columns with _
pivot_longer(df, cols = matches('_'),
names_to = c(".value","PrePost"),
names_pattern = "(.*)_(.*)")
# A tibble: 12 x 8
ID Age Gender Cond PrePost Measure1 Measure2 Measure3
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 10 23 m Cond2 Post NA 1.3968694273826 73
2 10 23 m Cond2 Pre NA 1.19248628113128 70
3 10 23 m Cond1 Post 7 0.799543118218161 84
4 10 23 m Cond1 Pre 3 0.726244170934944 63
...
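To see why this works, it may help to look at what the regular expression does to each column name on its own. The sketch below uses only base R (sub) rather than tidyr, so it illustrates the capture groups and is not how pivot_longer is implemented internally. The variable names (cols, value_part, prepost_part) and the extra column name "Total_Score_Pre" are made up for illustration.

```r
# Column names from the toy dataset that contain an underscore
cols <- c("Measure1_Post", "Measure1_Pre", "Measure2_Post",
          "Measure2_Pre", "Measure3_Post", "Measure3_Pre")

# Same pattern as passed to names_pattern above
pattern <- "(.*)_(.*)"

# First capture group: the part that ".value" turns into output column names
value_part <- sub(pattern, "\\1", cols)

# Second capture group: the part that ends up in the PrePost column
prepost_part <- sub(pattern, "\\2", cols)

print(data.frame(column = cols, value_name = value_part,
                 PrePost = prepost_part, stringsAsFactors = FALSE))

# Hypothetical name with two underscores: the first group is greedy,
# so the split happens at the last underscore
extra_value <- sub(pattern, "\\1", "Total_Score_Pre")
extra_prepost <- sub(pattern, "\\2", "Total_Score_Pre")
```

Each distinct value of the first group (Measure1, Measure2, Measure3) becomes its own column in the result, while the second group (Pre or Post) is stored in PrePost. That is why the output has one row per combination of ID, Cond and PrePost.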