R: compare values across 3 data frames and append the missing ones


I have 3 data frames.

Data1 -
Name_description   Numbers 
ABC                23
DEF                34
GHI                45
XYZ                43
JVK                23
LMN                21
Data2 only has a list of names.

Data 2- 
    Names            
    ABC                
    DEF                
    GHI                
    XYZ                
    JVK                
    LMN    
    PQR
    KJL      
Data3 also has names and numbers.

Data 3
Name_desc           Numbers 
    ABC                56
    DEF                67
    GHI                89
    XYZ                60
    JVK                88
    LMN                65
    PQR                100
    KJL                85
I want to do the following:

Check whether all names from Data2 are present in Data1
If any names are missing then
{
get those names
get the numbers for those missing names from Data3
append both (missing names & numbers) to Data1
}
else
{Data1 <- Data1
}
Thanks.

First, merge Data1 and Data2, then locate the NAs in the resulting data.frame and match them against Data3. Finally, replace them with the Data3 values.
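A minimal base R sketch of that merge-then-fill idea, assuming the data frames are named Data1, Data2 and Data3 with the column names shown in the question. The names tmp and missing are illustrative helpers, not part of the original answer. Note that merge sorts the rows by the key column by default, so the row order differs from Data1.

```r
# Sample data rebuilt from the question
Data1 <- data.frame(Name_description = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN"),
                    Numbers = c(23L, 34L, 45L, 43L, 23L, 21L),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(Names = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN", "PQR", "KJL"),
                    stringsAsFactors = FALSE)
Data3 <- data.frame(Name_desc = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN", "PQR", "KJL"),
                    Numbers = c(56L, 67L, 89L, 60L, 88L, 65L, 100L, 85L),
                    stringsAsFactors = FALSE)

# Full outer merge: names that exist only in Data2 get NA in Numbers
tmp <- merge(Data1, Data2, by.x = "Name_description", by.y = "Names", all = TRUE)

# Fill the NAs by looking the missing names up in Data3
missing <- is.na(tmp$Numbers)
tmp$Numbers[missing] <- Data3$Numbers[match(tmp$Name_description[missing],
                                            Data3$Name_desc)]
tmp
```

If no names are missing, the logical index is all FALSE and the assignment changes nothing, so no explicit if/else is needed.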


With dplyr, it should look something like this:

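library(dplyr)

# The key columns have different names in each data frame,
# so the joins need an explicit `by`.
Data1 %>% 
    bind_rows(
        Data2 %>% 
        anti_join(Data1, by = c("Names" = "Name_description")) %>%  # names missing from Data1
        left_join(Data3, by = c("Names" = "Name_desc")) %>%         # their numbers from Data3
        rename(Name_description = Names)                            # align with Data1's column name
    )  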
We can achieve this in dplyr using left_join and ifelse.

Data

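I find dplyr::coalesce very handy in the situation the OP describes. After joining the 3 data frames there are 2 Numbers columns, one of which contains NA, and they can be combined with coalesce: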

Data:

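Actually, we don't need a merge at all. What you want is the first available number for each name, taking Data1 first and then Data3. I assume NA should be returned when a name is in Data2 but in neither of the others.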

The fastest way is to use data.table, but I'll give other options too.

data.table

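data.table::rbindlist can bind columns by position rather than by name (use.names = FALSE, the default when this answer was written), which is very convenient here because the data frames name their columns differently.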

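library(data.table)
# Data2 has a single column, so pad it with an NA column before binding;
# then keep the first row per name (Data1 first, then Data3, then NA).
rbindlist(list(Data1, Data3, cbind(Data2, NA)), use.names = FALSE)[, .SD[1, ], by = "Name_description"]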

# 1:              ABC      23
# 2:              DEF      34
# 3:              GHI      45
# 4:              XYZ      43
# 5:              JVK      23
# 6:              LMN      21
# 7:              PQR     100
# 8:              KJL      85
tidyverse solution

The .keep_all argument of dplyr::distinct helps avoid the less readable %>% filter(!duplicated(Names)) or %>% group_by(Names) %>% slice(1).

base solution


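Thanks, I tried merge and it works fine, but what happens if there are no missing values, and how is that handled? That is why I was looking for something like ifelse.
@Earthshaker That is handled by the match(tmp$Name_description[is.na(tmp$Numbers)], Data3$Name_desc) part.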
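For readers who prefer to see the if/else from the question spelled out, here is a hedged base R sketch. The function name append_missing and its argument names are hypothetical, and the column names are assumed to match the question. It returns Data1 unchanged when nothing is missing.

```r
# Hypothetical helper: append names found in d2 but not in d1,
# taking their numbers from d3; return d1 untouched otherwise.
append_missing <- function(d1, d2, d3) {
  missing_names <- setdiff(d2$Names, d1$Name_description)
  if (length(missing_names) == 0) {
    return(d1)  # the "else" branch: nothing to append
  }
  extra <- data.frame(
    Name_description = missing_names,
    Numbers = d3$Numbers[match(missing_names, d3$Name_desc)],
    stringsAsFactors = FALSE
  )
  rbind(d1, extra)
}

# Sample data rebuilt from the question
Data1 <- data.frame(Name_description = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN"),
                    Numbers = c(23L, 34L, 45L, 43L, 23L, 21L),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(Names = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN", "PQR", "KJL"),
                    stringsAsFactors = FALSE)
Data3 <- data.frame(Name_desc = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN", "PQR", "KJL"),
                    Numbers = c(56L, 67L, 89L, 60L, 88L, 65L, 100L, 85L),
                    stringsAsFactors = FALSE)

result <- append_missing(Data1, Data2, Data3)

# No-missing case: only the six names already in Data1
unchanged <- append_missing(Data1, Data2[1:6, , drop = FALSE], Data3)
```

Here result holds the six original rows followed by PQR and KJL, and unchanged is identical to Data1.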
# dplyr: left_join + ifelse
library(dplyr)

Data4 <- Data2 %>%
  left_join(Data1, by = c("Names" = "Name_description")) %>%
  left_join(Data3, by = c("Names" = "Name_desc")) %>%
  mutate(Numbers = ifelse(is.na(Numbers.x), Numbers.y, Numbers.x)) %>%
  select(Names, Numbers)
Data4
#    Names Numbers
# 1   ABC      23
# 2   DEF      34
# 3   GHI      45
# 4   XYZ      43
# 5   JVK      23
# 6   LMN      21
# 7   PQR     100
# 8   KJL      85
Data1 <- read.table(text = "Name_description   Numbers 
ABC                23
DEF                34
GHI                45
XYZ                43
JVK                23
LMN                21",
                    header = TRUE, stringsAsFactors = FALSE)

Data2 <- read.table(text = "Names            
    ABC                
    DEF                
    GHI                
    XYZ                
    JVK                
    LMN    
    PQR
    KJL",
                    header = TRUE, stringsAsFactors = FALSE)

Data3 <- read.table(text = "Name_desc           Numbers 
    ABC                56
    DEF                67
    GHI                89
    XYZ                60
    JVK                88
    LMN                65
    PQR                100
    KJL                85",
                    header = TRUE, stringsAsFactors = FALSE)
# dplyr: joins + coalesce
library(dplyr)

Data1 %>% full_join(Data2, by=c("Name_description" = "Names")) %>%
  inner_join(Data3, by=c("Name_description" = "Name_desc")) %>%
  mutate(Numbers = coalesce( Numbers.x, Numbers.y)) %>%
  select(Name_description, Numbers)

#    Name_description Numbers
# 1              ABC      23
# 2              DEF      34
# 3              GHI      45
# 4              XYZ      43
# 5              JVK      23
# 6              LMN      21
# 7              PQR     100
# 8              KJL      85
# tidyverse: stack the data frames, keep the first row per name
library(tidyverse)
lst(Data1,Data3,cbind(Data2,NA)) %>%
  map(setNames,c("Names","Numbers")) %>%
  bind_rows %>%
  distinct(Names,.keep_all = TRUE) 

# Names Numbers
# 1   ABC      23
# 2   DEF      34
# 3   GHI      45
# 4   XYZ      43
# 5   JVK      23
# 6   LMN      21
# 7   PQR     100
# 8   KJL      85
# base R: stack the data frames under common names, drop later duplicates
x <- do.call(rbind, lapply(list(Data1, Data3, cbind(Data2, NA)), setNames, c("Names", "Numbers")))
x[!duplicated(x[[1]]), ]
#    Names Numbers
# 1    ABC      23
# 2    DEF      34
# 3    GHI      45
# 4    XYZ      43
# 5    JVK      23
# 6    LMN      21
# 13   PQR     100
# 14   KJL      85