R: when column A values match, replace the NAs in column B with the available column B values
I am trying to merge two data frames based on two variables (Entrez.ID and Gene.ID). One data frame has only those variables, for example
Entrez.ID Gene.ID
10007 GNPDA1
10016 ALG2
10044 SH2D3C
and one data frame with three variables, for example
Entrez.ID Gene.ID Ensembl.ID
10007 GPI ENSG00000113552
10016 PDCD6 ENSG00000249915
10044 CHAT ENSG00000095370
Currently, I merge the files with the following command:
df <- merge(df1, df2, by = c("Entrez.ID", "Gene.ID"), all = TRUE)
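To make the problem concrete, here is a minimal sketch that rebuilds the two example data frames from the tables above and runs the same merge. The column types (integer IDs, character gene names) and the name `merged` are assumptions. Because no (Entrez.ID, Gene.ID) pair occurs in both data frames, every row coming from df1 ends up with NA in Ensembl.ID.

```r
# Example data rebuilt from the tables in the question (types are assumed)
df1 <- data.frame(
  Entrez.ID = c(10007L, 10016L, 10044L),
  Gene.ID   = c("GNPDA1", "ALG2", "SH2D3C"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Entrez.ID  = c(10007L, 10016L, 10044L),
  Gene.ID    = c("GPI", "PDCD6", "CHAT"),
  Ensembl.ID = c("ENSG00000113552", "ENSG00000249915", "ENSG00000095370"),
  stringsAsFactors = FALSE
)

# Full outer merge on both keys: unmatched rows are kept and padded with NA
merged <- merge(df1, df2, by = c("Entrez.ID", "Gene.ID"), all = TRUE)

nrow(merged)                   # all six rows are kept
sum(is.na(merged$Ensembl.ID))  # the three df1 rows have no Ensembl.ID
```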
How can I tell R that, where the Entrez.ID values match, the Ensembl.ID should match as well (i.e. replace NA with the Ensembl.ID where one is available)?
We can use na.locf from zoo:
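library(zoo)
# Within each Entrez.ID group, carry the last observation forward, then backward,
# so that every row in the group receives the group's Ensembl.ID
df$Ensembl.ID <- with(df, ave(Ensembl.ID, Entrez.ID, FUN = function(x)
  na.locf(na.locf(x, na.rm = FALSE), fromLast = TRUE)))
df$Ensembl.ID
#[1] "ENSG00000113552" "ENSG00000113552" "ENSG00000249915"
#[4] "ENSG00000249915" "ENSG00000095370" "ENSG00000095370"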
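If adding zoo as a dependency is not desirable, the same group-wise fill can be sketched in base R. This is an alternative to na.locf, not the method from the answer above, and it assumes each Entrez.ID group contains at least one non-NA Ensembl.ID. The helper name `first_non_na` is hypothetical.

```r
# Merged data frame from the question (same values as the structure() data)
df <- data.frame(
  Entrez.ID  = c(10007L, 10007L, 10016L, 10016L, 10044L, 10044L),
  Gene.ID    = c("GNPDA1", "GPI", "ALG2", "PDCD6", "SH2D3C", "CHAT"),
  Ensembl.ID = c(NA, "ENSG00000113552", NA, "ENSG00000249915",
                 NA, "ENSG00000095370"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeat the first non-NA value across the whole group
first_non_na <- function(x) rep(x[!is.na(x)][1], length(x))

# ave() applies the helper separately within each Entrez.ID group
df$Ensembl.ID <- ave(df$Ensembl.ID, df$Entrez.ID, FUN = first_non_na)
```

Unlike the na.locf version, this overwrites every value in a group with the first non-NA one, so it is only appropriate when each Entrez.ID maps to a single Ensembl.ID.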
If the Gene.ID values in df1 and df2 are distinct (no Gene.ID in df1 also appears in df2), you can simply merge on Entrez.ID:
df <- merge(df1,df2,by=c("Entrez.ID"),all.x=TRUE)
I tried the full join from the dplyr solution and it worked. Thanks!
The tidyverse solution does a full_join and then fills Ensembl.ID within each Entrez.ID group:
library(tidyverse)
full_join(df1, df2, by = c("Entrez.ID", "Gene.ID")) %>%
  group_by(Entrez.ID) %>%
  fill(Ensembl.ID, .direction = 'up') %>%
  fill(Ensembl.ID, .direction = 'down')
# A tibble: 6 x 3
# Groups:   Entrez.ID [3]
#  Entrez.ID Gene.ID Ensembl.ID
#      <int> <chr>   <chr>
#1     10007 GNPDA1  ENSG00000113552
#2     10007 GPI     ENSG00000113552
#3     10016 ALG2    ENSG00000249915
#4     10016 PDCD6   ENSG00000249915
#5     10044 SH2D3C  ENSG00000095370
#6     10044 CHAT    ENSG00000095370
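The two fill() calls can also be collapsed into one. The sketch below assumes a tidyr version that supports `.direction = "updown"` (available since tidyr 1.0.0) and rebuilds df1 and df2 from the question's tables so that it runs on its own. The name `filled` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Example data rebuilt from the tables in the question (types are assumed)
df1 <- data.frame(
  Entrez.ID = c(10007L, 10016L, 10044L),
  Gene.ID   = c("GNPDA1", "ALG2", "SH2D3C"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Entrez.ID  = c(10007L, 10016L, 10044L),
  Gene.ID    = c("GPI", "PDCD6", "CHAT"),
  Ensembl.ID = c("ENSG00000113552", "ENSG00000249915", "ENSG00000095370"),
  stringsAsFactors = FALSE
)

filled <- full_join(df1, df2, by = c("Entrez.ID", "Gene.ID")) %>%
  group_by(Entrez.ID) %>%
  # "updown" fills upward first, then downward, within each group
  fill(Ensembl.ID, .direction = "updown") %>%
  ungroup()
```

Calling ungroup() at the end returns a plain tibble, which avoids surprises if later steps are not meant to run per Entrez.ID.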
# Data: the merged data frame used in the zoo example
df <- structure(list(Entrez.ID = c(10007L, 10007L, 10016L, 10016L,
  10044L, 10044L), Gene.ID = c("GNPDA1", "GPI", "ALG2", "PDCD6",
  "SH2D3C", "CHAT"), Ensembl.ID = c(NA, "ENSG00000113552", NA,
  "ENSG00000249915", NA, "ENSG00000095370")), class = "data.frame",
  row.names = c(NA, -6L))
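# Merge on Entrez.ID only, then append the rows of df2.
# Only Entrez.ID and Ensembl.ID are taken from df2 here, so the merged result
# has the same three columns as df2 and rbind() can stack them.
df <- merge(df1, df2[, c("Entrez.ID", "Ensembl.ID")], by = "Entrez.ID", all.x = TRUE)
df <- rbind(df, df2)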
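Another way to express the same idea is a direct lookup with match(), which keeps the original two-key merge and only patches the missing values. This is a sketch under the assumption that each Entrez.ID appears at most once in df2. The names `merged` and `missing_rows` are hypothetical.

```r
# Example data rebuilt from the tables in the question (types are assumed)
df1 <- data.frame(
  Entrez.ID = c(10007L, 10016L, 10044L),
  Gene.ID   = c("GNPDA1", "ALG2", "SH2D3C"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Entrez.ID  = c(10007L, 10016L, 10044L),
  Gene.ID    = c("GPI", "PDCD6", "CHAT"),
  Ensembl.ID = c("ENSG00000113552", "ENSG00000249915", "ENSG00000095370"),
  stringsAsFactors = FALSE
)

merged <- merge(df1, df2, by = c("Entrez.ID", "Gene.ID"), all = TRUE)

# Rows whose Ensembl.ID is still missing after the merge
missing_rows <- is.na(merged$Ensembl.ID)

# Look up each missing value in df2 by Entrez.ID; match() returns the row
# position of the first match, or NA if the ID is absent from df2
merged$Ensembl.ID[missing_rows] <-
  df2$Ensembl.ID[match(merged$Entrez.ID[missing_rows], df2$Entrez.ID)]
```

An Entrez.ID that does not occur in df2 simply stays NA, so no rows are dropped or duplicated.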