R: when column A values match, replace the NAs in column B with column B values


I am trying to merge two data frames on two variables (Entrez.ID and Gene.ID). One data frame has only those variables, e.g.

Entrez.ID  Gene.ID
10007      GNPDA1
10016      ALG2
10044      SH2D3C 
and the other data frame has three variables, e.g.

Entrez.ID    Gene.ID   Ensembl.ID
10007        GPI       ENSG00000113552
10016        PDCD6     ENSG00000249915
10044        CHAT      ENSG00000095370
Currently, I merge the files with the following command:

df<-merge(df1,df2,by=c("Entrez.ID","Gene.ID"),all=TRUE)
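To make the problem concrete, here is a minimal sketch that rebuilds the two example frames and runs the same merge. The values are copied from the tables above; the data.frame() construction and the name merged are assumptions for illustration. Because no (Entrez.ID, Gene.ID) pair occurs in both frames, every row that comes from df1 ends up with NA in Ensembl.ID.

```r
# Example frames rebuilt from the tables in the question (construction is assumed)
df1 <- data.frame(
  Entrez.ID = c(10007L, 10016L, 10044L),
  Gene.ID   = c("GNPDA1", "ALG2", "SH2D3C"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Entrez.ID  = c(10007L, 10016L, 10044L),
  Gene.ID    = c("GPI", "PDCD6", "CHAT"),
  Ensembl.ID = c("ENSG00000113552", "ENSG00000249915", "ENSG00000095370"),
  stringsAsFactors = FALSE
)

# Full outer merge on both keys: six rows, none of them matched across frames
merged <- merge(df1, df2, by = c("Entrez.ID", "Gene.ID"), all = TRUE)

# Count the rows that have no Ensembl.ID (the ones contributed by df1)
sum(is.na(merged$Ensembl.ID))
```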

How can I tell R that wherever the Entrez.ID values match, the Ensembl.ID should match too (that is, replace the NA with the Ensembl.ID where one is available)?

We can use na.locf from the zoo package:

library(zoo)
df$Ensembl.ID <- with(df, ave(Ensembl.ID, Entrez.ID, FUN = function(x)
        na.locf(na.locf(x, na.rm = FALSE), fromLast = TRUE)))
df$Ensembl.ID
#[1] "ENSG00000113552" "ENSG00000113552" "ENSG00000249915" 
#[4] "ENSG00000249915" "ENSG00000095370"
#[6] "ENSG00000095370"
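If adding a dependency on zoo is not desirable, the same per-group fill can be sketched in base R. This is only an illustration, not the answer's method: the helper name fill_group is hypothetical, and it assumes each Entrez.ID carries at most one distinct non-missing Ensembl.ID, so copying the first known value is enough.

```r
# Merged frame as in the example data (NA where the row came from df1)
df <- data.frame(
  Entrez.ID  = c(10007L, 10007L, 10016L, 10016L, 10044L, 10044L),
  Gene.ID    = c("GNPDA1", "GPI", "ALG2", "PDCD6", "SH2D3C", "CHAT"),
  Ensembl.ID = c(NA, "ENSG00000113552", NA, "ENSG00000249915",
                 NA, "ENSG00000095370"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace the NAs in x with the first non-NA value of x.
# A group that is entirely NA is returned unchanged.
fill_group <- function(x) {
  known <- x[!is.na(x)]
  if (length(known) > 0) x[is.na(x)] <- known[1]
  x
}

# ave() applies fill_group within each Entrez.ID and keeps the row order
df$Ensembl.ID <- ave(df$Ensembl.ID, df$Entrez.ID, FUN = fill_group)
```

Unlike the two-pass na.locf call, this does not depend on where the non-missing value sits within the group, but it would silently pick the first value if a group held several different IDs.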
If the Gene.ID values in df1 and df2 are distinct (no ID in df1 also appears in df2), you can simply merge on Entrez.ID alone:

df <- merge(df1,df2,by=c("Entrez.ID"),all.x=TRUE)
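One detail worth checking when merging on Entrez.ID alone: both frames have a Gene.ID column, so merge() keeps both and disambiguates them with its default suffixes. The sketch below rebuilds the example frames (the construction and the names by_entrez and annotated are assumptions) and shows one possible way to keep a single Gene.ID column by taking only the key and Ensembl.ID from df2.

```r
# Example frames rebuilt from the tables in the question (construction is assumed)
df1 <- data.frame(
  Entrez.ID = c(10007L, 10016L, 10044L),
  Gene.ID   = c("GNPDA1", "ALG2", "SH2D3C"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Entrez.ID  = c(10007L, 10016L, 10044L),
  Gene.ID    = c("GPI", "PDCD6", "CHAT"),
  Ensembl.ID = c("ENSG00000113552", "ENSG00000249915", "ENSG00000095370"),
  stringsAsFactors = FALSE
)

# Merging the full frames on Entrez.ID keeps both Gene.ID columns,
# renamed with the default suffixes .x (from df1) and .y (from df2)
by_entrez <- merge(df1, df2, by = "Entrez.ID", all.x = TRUE)
names(by_entrez)

# Taking only the key and Ensembl.ID from df2 leaves a single Gene.ID column
annotated <- merge(df1, df2[, c("Entrez.ID", "Ensembl.ID")],
                   by = "Entrez.ID", all.x = TRUE)
names(annotated)
```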

I tried the full join from the dplyr solution and it worked. Thanks!
With the tidyverse, a full join followed by a fill within each Entrez.ID group does the same:
library(tidyverse)
full_join(df1, df2, by = c("Entrez.ID","Gene.ID")) %>%
    group_by(Entrez.ID) %>%
    fill(Ensembl.ID, .direction = 'up') %>%
    fill(Ensembl.ID, .direction = 'down')
# A tibble: 6 x 3
# Groups:   Entrez.ID [3]
#  Entrez.ID Gene.ID Ensembl.ID     
#      <int> <chr>   <chr>          
#1     10007 GNPDA1  ENSG00000113552
#2     10007 GPI     ENSG00000113552
#3     10016 ALG2    ENSG00000249915
#4     10016 PDCD6   ENSG00000249915
#5     10044 SH2D3C  ENSG00000095370
#6     10044 CHAT    ENSG00000095370
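The two consecutive fill() calls can probably be collapsed into one. The sketch below assumes a tidyr version in which fill() accepts .direction = "updown" (to my knowledge this was added in tidyr 1.0.0) and rebuilds the example frames itself; the name filled is an assumption for illustration.

```r
library(dplyr)
library(tidyr)

# Example frames rebuilt from the tables in the question (construction is assumed)
df1 <- data.frame(
  Entrez.ID = c(10007L, 10016L, 10044L),
  Gene.ID   = c("GNPDA1", "ALG2", "SH2D3C"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Entrez.ID  = c(10007L, 10016L, 10044L),
  Gene.ID    = c("GPI", "PDCD6", "CHAT"),
  Ensembl.ID = c("ENSG00000113552", "ENSG00000249915", "ENSG00000095370"),
  stringsAsFactors = FALSE
)

filled <- full_join(df1, df2, by = c("Entrez.ID", "Gene.ID")) %>%
  group_by(Entrez.ID) %>%
  # "updown" fills upward first, then downward, within each group
  fill(Ensembl.ID, .direction = "updown") %>%
  ungroup()
```

The ungroup() at the end is optional; it only prevents later operations on filled from being applied per Entrez.ID by accident.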
# Data: the merged data frame (df) used in the zoo example
df <- structure(list(Entrez.ID = c(10007L, 10007L, 10016L, 10016L, 
10044L, 10044L), Gene.ID = c("GNPDA1", "GPI", "ALG2", "PDCD6", 
"SH2D3C", "CHAT"), Ensembl.ID = c(NA, "ENSG00000113552", NA, 
"ENSG00000249915", NA, "ENSG00000095370")), class = "data.frame", 
 row.names = c(NA, -6L))
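# Merge-then-append approach in full. Only Entrez.ID and Ensembl.ID are taken from df2, since merging
# all of df2 would yield Gene.ID.x/Gene.ID.y and rbind() would fail on mismatched columns.
df <- merge(df1, df2[, c("Entrez.ID", "Ensembl.ID")], by = "Entrez.ID", all.x = TRUE)
df <- rbind(df, df2)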