R dplyr: pivot to long form while removing duplicate elements
I have a data frame that looks like this:
| id | nct_id |minimum_age| maximum_age |criteria_rank | criteria
1 |6516355| NCT04293180| 2 Years | 50 Years| Inclusion Criteria_1| criteria 1 description
2 |6516355| NCT04293180| 2 Years | 50 Years| Inclusion Criteria_2| criteria 2 description
3 |6516355| NCT04293180| 2 Years | 50 Years| Exclusion Criteria_1| criteria 3 description
4 |6531830| NCT04091700| 18 Years| 45 Years | Inclusion Criteria_1| criteria 1 description
5 |6531830| NCT04091700| 18 Years| 45 Years | Inclusion Criteria_2| criteria 2 description
6 |6531830| NCT04091700| 18 Years| 45 Years | Exclusion Criteria_1| criteria 3 description
7 |6531830| NCT04091700| 18 Years| 45 Years | Exclusion Criteria_2| criteria 4 description
I want to convert it into a data frame like the following:
|V1 | V2 | V3 |
1 |id | 6516355 | 6531830 |
2 |nct_id | NCT04293185| NCT04091737 |
3 |minimum_age | 2 Years | 18 Years |
4 |maximum_age | 50 Years | 45 Years |
5 |Inclusion Criteria_1| criteria 1 description. | criteria 1 description. |
6 |Inclusion Criteria_2| criteria 2 description. | criteria 2 description. |
7 |Exclusion Criteria_1| criteria 3 description. | criteria 3 description. |
8 |Exclusion Criteria_2| NA | criteria 4 description. |
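Basically, I want to convert the data frame to a long form and drop the values that are repeated for each trial, while keeping the inclusion and exclusion criteria. This is only a sample of the data: I have several nct_id values, each with a different number of inclusion and exclusion criteria, so the nct_id with the most criteria determines how many inclusion/exclusion criteria rows are created. An nct_id with fewer criteria is filled with NA (as in the last row of the example). In this example, the first column holds the row names.
I can't figure it out. Thanks in advance for any pointers.
Sample data: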
structure(list(id = c(6516355, 6516355, 6516355, 6531830, 6531830,
6531830, 6531830), nct_id = c("NCT04293180", "NCT04293180", "NCT04293180",
"NCT04091700", "NCT04091700", "NCT04091700", "NCT04091700"),
minimum_age = c("2 Years", "2 Years", "2 Years", "18 Years",
"18 Years", "18 Years", "18 Years"), maximum_age = c("50 Years",
"50 Years", "50 Years", "45 Years", "45 Years", "45 Years",
"45 Years"), criteria_rank = c("Inclusion Criteria_1", "Inclusion Criteria_2",
"Exclusion Criteria_1", "Inclusion Criteria_1", "Inclusion Criteria_2",
"Exclusion Criteria_1", "Exclusion Criteria_2"), criteria = c("criteria 1 description",
"criteria 2 description", "criteria 3 description", "criteria 1 description",
"criteria 2 description", "criteria 3 description", "criteria 4 description"
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), spec = structure(list(cols = list(
id = structure(list(), class = c("collector_double", "collector"
)), nct_id = structure(list(), class = c("collector_character",
"collector")), minimum_age = structure(list(), class = c("collector_character",
"collector")), maximum_age = structure(list(), class = c("collector_character",
"collector")), criteria_rank = structure(list(), class = c("collector_character",
"collector")), criteria = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
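This does not give exactly the column names V1, V2, V3: here I use id as the column name for each record, and the attribute names are stored in the variables column.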
library(dplyr)
library(tidyr)

# `data` is the sample data frame from the question
data %>%
  # Spread criteria_rank into columns so that each trial becomes one row,
  # with the criteria descriptions as the values of those columns.
  pivot_wider(id_cols = c(id, nct_id, minimum_age, maximum_age),
              names_from = "criteria_rank", values_from = "criteria") %>%
  # Back to long format: attribute names go into the variables column
  # and their values into the values column.
  pivot_longer(names_to = "variables", values_to = "values",
               cols = c("nct_id", "maximum_age", "minimum_age") |
                 contains("Criteria")) %>%
  # Widen again so that each id becomes a column name.
  pivot_wider(id_cols = variables, names_from = "id", values_from = "values")
Output
# A tibble: 7 x 3
variables `6516355` `6531830`
<chr> <chr> <chr>
1 nct_id NCT04293180 NCT04091700
2 maximum_age 50 Years 45 Years
3 minimum_age 2 Years 18 Years
4 Inclusion Criteria_1 criteria 1 description criteria 1 description
5 Inclusion Criteria_2 criteria 2 description criteria 2 description
6 Exclusion Criteria_1 criteria 3 description criteria 3 description
7 Exclusion Criteria_2 NA criteria 4 description
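If the exact layout from the question is needed (an id row plus columns named V1, V2, V3), the same pivot_wider / pivot_longer idea can be adapted. The following is only a sketch under a few assumptions: the sample data is rebuilt inline as data, id is converted to character so that it can share a column with the text values, and the helper column record is a name introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Compact rebuild of the sample data from the question (assumption: same values,
# without the readr column specification attached by dput()).
data <- tibble(
  id            = rep(c(6516355, 6531830), times = c(3, 4)),
  nct_id        = rep(c("NCT04293180", "NCT04091700"), times = c(3, 4)),
  minimum_age   = rep(c("2 Years", "18 Years"), times = c(3, 4)),
  maximum_age   = rep(c("50 Years", "45 Years"), times = c(3, 4)),
  criteria_rank = c("Inclusion Criteria_1", "Inclusion Criteria_2",
                    "Exclusion Criteria_1", "Inclusion Criteria_1",
                    "Inclusion Criteria_2", "Exclusion Criteria_1",
                    "Exclusion Criteria_2"),
  criteria      = paste("criteria", c(1:3, 1:4), "description")
)

result <- data %>%
  # One row per trial; a trial without a given criterion gets NA there.
  pivot_wider(names_from = criteria_rank, values_from = criteria) %>%
  # id must be character to sit in the same column as the other values.
  mutate(id = as.character(id),
         # Hypothetical helper: label each trial V2, V3, ... in row order.
         record = paste0("V", row_number() + 1)) %>%
  # Stack every attribute, including id, into name/value pairs.
  pivot_longer(cols = -record, names_to = "V1", values_to = "value") %>%
  # One column per trial, one row per attribute.
  pivot_wider(names_from = record, values_from = value)

result
```

Because the trials are labelled by position rather than by id, the id values end up as the first row of the result instead of as column names.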