R: Combine the corresponding values for two sample IDs into a single new column
I have a data frame, sampleManifest, that looks like this:
SampleName Status Role Sex
AU056001_00HI1299A unaffected sibling female
AU056002_00HI1301A unaffected proband male
AU0780201_00HI1775A unaffected father male
AU0780202_00HI1777A unaffected mother female
AU0780301_00HI1778A affected proband male
.
.
.
and a separate data frame of pairwise sample comparisons, kinshipEstimates:
FID ID1 ID2 Kinship Relationship
AU0560 AU056001_00HI1299A AU056002_00HI1301A 0.0283 full-sibling
AU0780 AU0780201_00HI1775A AU0780202_00HI1777A -0.00160 unrelated
AU0780 AU0780201_00HI1775A AU0780301_00HI1778A 0.284 parent-child
AU0780 AU0780202_00HI1777A AU0780301_00HI1778A 0.246 parent-child
.
.
.
I want to build a new data frame in which sampleManifest$Role is looked up for both samples in each row of kinshipEstimates, so that it looks like this:
FID ID1 ID2 Roles Kinship Relationship
AU0560 AU056001_00HI1299A AU056002_00HI1301A sibling-proband 0.0283 full-sibling
AU0780 AU0780201_00HI1775A AU0780202_00HI1777A father-mother -0.00160 unrelated
AU0780 AU0780201_00HI1775A AU0780301_00HI1778A father-proband 0.284 parent-child
AU0780 AU0780202_00HI1777A AU0780301_00HI1778A mother-proband 0.246 parent-child
.
.
.
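I have been trying to use left_join, but I can't figure out how to combine the roles of the two samples in each pair into a single value.

One solution is a double left_join using the tidyverse packages. First, join kinshipEstimates with sampleManifest on ID1 and SampleName. Then join the result with sampleManifest again, this time on ID2 and SampleName. Because both joins bring in a Role column, the second join names them Role.x (for ID1) and Role.y (for ID2). Finally, use tidyr::unite to merge Role.x and Role.y into one column: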
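library(tidyverse)

# Look up the role of ID1, then drop the manifest columns that are not needed
left_join(kinshipEstimates, sampleManifest, by = c("ID1" = "SampleName")) %>%
  select(-Status, -Sex) %>%
  # Look up the role of ID2; the two Role columns become Role.x and Role.y
  left_join(sampleManifest, by = c("ID2" = "SampleName")) %>%
  # Paste Role.x and Role.y into a single Roles column
  unite(Roles, Role.x, Role.y, sep = "-") %>%
  select(-Sex, -Status)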
# FID ID1 ID2 Kinship Relationship Roles
# 1 AU0560 AU056001_00HI1299A AU056002_00HI1301A 0.0283 full-sibling sibling-proband
# 2 AU0780 AU0780201_00HI1775A AU0780202_00HI1777A -0.0016 unrelated father-mother
# 3 AU0780 AU0780201_00HI1775A AU0780301_00HI1778A 0.2840 parent-child father-proband
# 4 AU0780 AU0780202_00HI1777A AU0780301_00HI1778A 0.2460 parent-child mother-proband
Data:
sampleManifest <- read.table(text =
"SampleName Status Role Sex
AU056001_00HI1299A unaffected sibling female
AU056002_00HI1301A unaffected proband male
AU0780201_00HI1775A unaffected father male
AU0780202_00HI1777A unaffected mother female
AU0780301_00HI1778A affected proband male",
stringsAsFactors = FALSE, header = TRUE)
kinshipEstimates <- read.table(text =
"FID ID1 ID2 Kinship Relationship
AU0560 AU056001_00HI1299A AU056002_00HI1301A 0.0283 full-sibling
AU0780 AU0780201_00HI1775A AU0780202_00HI1777A -0.00160 unrelated
AU0780 AU0780201_00HI1775A AU0780301_00HI1778A 0.284 parent-child
AU0780 AU0780202_00HI1777A AU0780301_00HI1778A 0.246 parent-child",
stringsAsFactors = FALSE, header = TRUE)
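For comparison, the same lookup can be sketched in base R without any joins, using match() to find each ID's position in the manifest. This is only an illustrative alternative, not part of the answer above; the names role1, role2 and result are made up for the example, and the data frames are trimmed copies of the ones in the question.

```r
# Trimmed copies of the question's data (only the columns needed here)
sampleManifest <- data.frame(
  SampleName = c("AU056001_00HI1299A", "AU056002_00HI1301A",
                 "AU0780201_00HI1775A", "AU0780202_00HI1777A",
                 "AU0780301_00HI1778A"),
  Role = c("sibling", "proband", "father", "mother", "proband"),
  stringsAsFactors = FALSE
)
kinshipEstimates <- data.frame(
  FID = c("AU0560", "AU0780", "AU0780", "AU0780"),
  ID1 = c("AU056001_00HI1299A", "AU0780201_00HI1775A",
          "AU0780201_00HI1775A", "AU0780202_00HI1777A"),
  ID2 = c("AU056002_00HI1301A", "AU0780202_00HI1777A",
          "AU0780301_00HI1778A", "AU0780301_00HI1778A"),
  Kinship = c(0.0283, -0.00160, 0.284, 0.246),
  Relationship = c("full-sibling", "unrelated", "parent-child", "parent-child"),
  stringsAsFactors = FALSE
)

# match() returns, for each ID, its row position in the manifest (NA if absent)
role1 <- sampleManifest$Role[match(kinshipEstimates$ID1, sampleManifest$SampleName)]
role2 <- sampleManifest$Role[match(kinshipEstimates$ID2, sampleManifest$SampleName)]

# Paste the two roles together, ID1's role first
result <- kinshipEstimates
result$Roles <- paste(role1, role2, sep = "-")
print(result)
```

One caveat with this sketch: an ID that is missing from the manifest yields NA, which paste() turns into the literal text "NA" inside Roles, so it may be worth checking anyNA(role1) and anyNA(role2) first.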
Here is an approach that uses gather, an inner_join, and group_by. Adding a row number lets us keep track of the ID1/ID2 pairs when grouping:
kinshipEstimates %>%
mutate(row_num = row_number()) %>%
gather(which_id, id, -row_num, -FID, -Kinship, -Relationship) %>%
inner_join(sampleManifest, by=c("id" = "SampleName")) %>%
group_by(FID, row_num) %>%
summarise(Roles = paste(Role, collapse="-"),
Kinship = first(Kinship),
Relationship = first(Relationship))
FID row_num Roles Kinship Relationship
<chr> <int> <chr> <dbl> <chr>
1 AU0560 1 sibling-proband 0.0283 full-sibling
2 AU0780 2 father-mother -0.00160 unrelated
3 AU0780 3 father-proband 0.284 parent-child
4 AU0780 4 mother-proband 0.246 parent-child
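In newer tidyr releases gather() is superseded by pivot_longer(), so the reshaping approach above could be written roughly as follows. This is a sketch rather than the answerer's code: it assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument), the name roles_long is made up, and it additionally carries ID1 and ID2 through to the result, which the version above drops.

```r
library(dplyr)
library(tidyr)

sampleManifest <- read.table(text =
"SampleName Status Role Sex
AU056001_00HI1299A unaffected sibling female
AU056002_00HI1301A unaffected proband male
AU0780201_00HI1775A unaffected father male
AU0780202_00HI1777A unaffected mother female
AU0780301_00HI1778A affected proband male",
stringsAsFactors = FALSE, header = TRUE)

kinshipEstimates <- read.table(text =
"FID ID1 ID2 Kinship Relationship
AU0560 AU056001_00HI1299A AU056002_00HI1301A 0.0283 full-sibling
AU0780 AU0780201_00HI1775A AU0780202_00HI1777A -0.00160 unrelated
AU0780 AU0780201_00HI1775A AU0780301_00HI1778A 0.284 parent-child
AU0780 AU0780202_00HI1777A AU0780301_00HI1778A 0.246 parent-child",
stringsAsFactors = FALSE, header = TRUE)

roles_long <- kinshipEstimates %>%
  mutate(row_num = row_number()) %>%
  # One row per (pair, sample): ID1 comes first, then ID2, within each pair
  pivot_longer(c(ID1, ID2), names_to = "which_id", values_to = "id") %>%
  inner_join(sampleManifest, by = c("id" = "SampleName")) %>%
  group_by(FID, row_num) %>%
  summarise(ID1 = first(id),
            ID2 = last(id),
            Roles = paste(Role, collapse = "-"),
            Kinship = first(Kinship),
            Relationship = first(Relationship),
            .groups = "drop")

print(roles_long)
```

Note that the inner join silently drops any sample that is absent from the manifest, in which case a pair would end up with a single role instead of two.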
Please use the dput function to post your data.
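As a brief illustration of that suggestion, dput() prints R code that recreates an object, so readers can paste it straight into their session. The two-row data frame below is just a cut-down stand-in for sampleManifest, and dput_text and restored are made-up names for this sketch.

```r
# A two-row stand-in for the real sampleManifest
sampleManifest <- data.frame(
  SampleName = c("AU056001_00HI1299A", "AU056002_00HI1301A"),
  Status = c("unaffected", "unaffected"),
  Role = c("sibling", "proband"),
  Sex = c("female", "male"),
  stringsAsFactors = FALSE
)

# Prints a structure(...) call; this printed text is what goes into the post
dput(sampleManifest)

# Round trip: evaluating the printed text rebuilds the data frame
dput_text <- capture.output(dput(sampleManifest))
restored <- eval(parse(text = dput_text))
```

For a large data frame, posting dput(head(sampleManifest)) is usually enough to make the question reproducible.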