R: Merge the corresponding values for two sample IDs into a new single column


I have a dataframe, sampleManifest, that looks like this:

SampleName          Status          Role          Sex
AU056001_00HI1299A  unaffected      sibling       female
AU056002_00HI1301A  unaffected      proband       male  
AU0780201_00HI1775A unaffected      father        male  
AU0780202_00HI1777A unaffected      mother        female
AU0780301_00HI1778A affected        proband       male  
.
.
.

and a separate dataframe of pairwise sample comparisons, kinshipEstimates:

FID    ID1                      ID2             Kinship Relationship    
AU0560 AU056001_00HI1299A  AU056002_00HI1301A   0.0283  full-sibling   
AU0780 AU0780201_00HI1775A AU0780202_00HI1777A -0.00160 unrelated   
AU0780 AU0780201_00HI1775A AU0780301_00HI1778A  0.284   parent-child
AU0780 AU0780202_00HI1777A AU0780301_00HI1778A  0.246   parent-child
.
.
.

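I would like to build a new dataframe in which sampleManifest$Role for the two samples in each row of kinshipEstimates is combined into a single Roles column, so that it looks like this: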

FID    ID1                      ID2             Roles           Kinship Relationship    
AU0560 AU056001_00HI1299A  AU056002_00HI1301A   sibling-proband 0.0283  full-sibling   
AU0780 AU0780201_00HI1775A AU0780202_00HI1777A  father-mother  -0.00160 unrelated   
AU0780 AU0780201_00HI1775A AU0780301_00HI1778A  father-proband  0.284   parent-child
AU0780 AU0780202_00HI1777A AU0780301_00HI1778A  mother-proband  0.246   parent-child
.
.
.

I have been trying to use left_join, but I cannot work out how to merge the Role of each sample in a pair into a single value.

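One solution is a double left_join using the tidyverse packages. First, join kinshipEstimates with sampleManifest on ID1 and SampleName. Then join the result with sampleManifest again, this time on ID2 and SampleName. Because both sides of the second join contain a Role column, dplyr renames them Role.x (the role of ID1) and Role.y (the role of ID2). Finally, use tidyr::unite to combine Role.x and Role.y into a single Roles column.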

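library(tidyverse)

# First join: look up the role of ID1
left_join(kinshipEstimates, sampleManifest, by = c("ID1" = "SampleName")) %>%
  select(-Status, -Sex) %>%
  # Second join: look up the role of ID2; the clashing Role columns
  # are renamed Role.x and Role.y
  left_join(sampleManifest, by = c("ID2" = "SampleName")) %>%
  # Paste the two roles together into a single Roles column
  unite(Roles, Role.x, Role.y, sep = "-") %>%
  select(-Sex, -Status)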


#      FID                 ID1                 ID2 Kinship Relationship           Roles
# 1 AU0560  AU056001_00HI1299A  AU056002_00HI1301A  0.0283 full-sibling sibling-proband
# 2 AU0780 AU0780201_00HI1775A AU0780202_00HI1777A -0.0016    unrelated   father-mother
# 3 AU0780 AU0780201_00HI1775A AU0780301_00HI1778A  0.2840 parent-child  father-proband
# 4 AU0780 AU0780202_00HI1777A AU0780301_00HI1778A  0.2460 parent-child  mother-proband
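
For comparison, the same lookup can be sketched in base R without any joins. This is only an illustrative alternative, not part of the answer above: it assumes every SampleName is unique in sampleManifest, and the helper name role_of is made up for this example. The data are a trimmed copy of the question's tables.

```r
# Trimmed copies of the question's data (only the columns needed here)
sampleManifest <- data.frame(
  SampleName = c("AU056001_00HI1299A", "AU056002_00HI1301A",
                 "AU0780201_00HI1775A", "AU0780202_00HI1777A",
                 "AU0780301_00HI1778A"),
  Role = c("sibling", "proband", "father", "mother", "proband"),
  stringsAsFactors = FALSE
)
kinshipEstimates <- data.frame(
  FID = c("AU0560", "AU0780", "AU0780", "AU0780"),
  ID1 = c("AU056001_00HI1299A", "AU0780201_00HI1775A",
          "AU0780201_00HI1775A", "AU0780202_00HI1777A"),
  ID2 = c("AU056002_00HI1301A", "AU0780202_00HI1777A",
          "AU0780301_00HI1778A", "AU0780301_00HI1778A"),
  stringsAsFactors = FALSE
)

# Named vector mapping sample name -> role
# (hypothetical helper; assumes SampleName values are unique)
role_of <- setNames(sampleManifest$Role, sampleManifest$SampleName)

# Look up the role of each ID and paste the pair together
kinshipEstimates$Roles <- paste(role_of[kinshipEstimates$ID1],
                                role_of[kinshipEstimates$ID2],
                                sep = "-")

kinshipEstimates$Roles
# "sibling-proband" "father-mother" "father-proband" "mother-proband"
```

One difference worth knowing: an ID that is missing from sampleManifest ends up as the literal text "NA" inside the pasted string here, so the join-based version may be easier to audit on real data.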

Data:

sampleManifest <- read.table(text = 
"SampleName          Status          Role          Sex
AU056001_00HI1299A  unaffected      sibling       female
AU056002_00HI1301A  unaffected      proband       male  
AU0780201_00HI1775A unaffected      father        male  
AU0780202_00HI1777A unaffected      mother        female
AU0780301_00HI1778A affected        proband       male",
stringsAsFactors = FALSE, header = TRUE)

kinshipEstimates <- read.table(text = 
"FID    ID1                      ID2             Kinship Relationship    
AU0560 AU056001_00HI1299A  AU056002_00HI1301A   0.0283  full-sibling   
AU0780 AU0780201_00HI1775A AU0780202_00HI1777A -0.00160 unrelated   
AU0780 AU0780201_00HI1775A AU0780301_00HI1778A  0.284   parent-child
AU0780 AU0780202_00HI1777A AU0780301_00HI1778A  0.246   parent-child",
stringsAsFactors = FALSE, header = TRUE)

Here is another approach, which uses gather, an inner_join, and group_by. Adding a row number lets us keep track of each ID1/ID2 pair when grouping:

kinshipEstimates %>%
  mutate(row_num = row_number()) %>%
  gather(which_id, id, -row_num, -FID, -Kinship, -Relationship) %>%
  inner_join(sampleManifest, by=c("id" = "SampleName")) %>%
  group_by(FID, row_num) %>%
  summarise(Roles = paste(Role, collapse="-"),
            Kinship = first(Kinship),
            Relationship = first(Relationship))

  FID    row_num Roles            Kinship Relationship
  <chr>    <int> <chr>              <dbl> <chr>       
1 AU0560       1 sibling-proband  0.0283  full-sibling
2 AU0780       2 father-mother   -0.00160 unrelated   
3 AU0780       3 father-proband   0.284   parent-child
4 AU0780       4 mother-proband   0.246   parent-child
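
In current tidyr releases gather is superseded by pivot_longer, so the same idea might be written as follows. This is a sketch rather than the original answer's code: it assumes dplyr 1.0.0 or later (for the .groups argument) and tidyr 1.0.0 or later (for pivot_longer), and it uses a trimmed copy of the question's data.

```r
library(dplyr)
library(tidyr)

# Trimmed copies of the question's data
sampleManifest <- data.frame(
  SampleName = c("AU056001_00HI1299A", "AU056002_00HI1301A",
                 "AU0780201_00HI1775A", "AU0780202_00HI1777A",
                 "AU0780301_00HI1778A"),
  Role = c("sibling", "proband", "father", "mother", "proband"),
  stringsAsFactors = FALSE
)
kinshipEstimates <- data.frame(
  FID = c("AU0560", "AU0780", "AU0780", "AU0780"),
  ID1 = c("AU056001_00HI1299A", "AU0780201_00HI1775A",
          "AU0780201_00HI1775A", "AU0780202_00HI1777A"),
  ID2 = c("AU056002_00HI1301A", "AU0780202_00HI1777A",
          "AU0780301_00HI1778A", "AU0780301_00HI1778A"),
  Kinship = c(0.0283, -0.0016, 0.284, 0.246),
  Relationship = c("full-sibling", "unrelated",
                   "parent-child", "parent-child"),
  stringsAsFactors = FALSE
)

result <- kinshipEstimates %>%
  # Tag each pair so its two long-format rows can be regrouped later
  mutate(row_num = row_number()) %>%
  # One row per sample: ID1 first, then ID2, within each pair
  pivot_longer(c(ID1, ID2), names_to = "which_id", values_to = "id") %>%
  inner_join(sampleManifest, by = c("id" = "SampleName")) %>%
  # Carry the per-pair columns through by grouping on them
  group_by(FID, row_num, Kinship, Relationship) %>%
  summarise(Roles = paste(Role, collapse = "-"), .groups = "drop")

result$Roles
```

Grouping on Kinship and Relationship as well replaces the first() calls, since those values are constant within a pair. Note that inner_join drops any sample that is absent from sampleManifest, so a pair with one unknown ID would yield a single role rather than two.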

Please post your data using the dput function.
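
As a rough illustration of that suggestion, dput prints an R expression that recreates an object, which others can paste straight into their session. The two-row data frame below is just a cut-down stand-in for the question's sampleManifest.

```r
# Cut-down stand-in for the question's sampleManifest
sampleManifest <- data.frame(
  SampleName = c("AU056001_00HI1299A", "AU056002_00HI1301A"),
  Status = c("unaffected", "unaffected"),
  Role = c("sibling", "proband"),
  Sex = c("female", "male"),
  stringsAsFactors = FALSE
)

# Print a copy-pasteable expression for the first rows of the data
dput(head(sampleManifest, 2))

# Round trip: evaluating the dput text rebuilds the same object
rebuilt <- eval(parse(text = capture.output(dput(sampleManifest))))
identical(rebuilt, sampleManifest)
```

For large tables, passing head(...) to dput, as above, keeps the posted snippet short while still preserving column types.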