R: compare values across 3 data frames and append the missing ones
I have 3 data frames:
Data1 -
Name_description Numbers
ABC 23
DEF 34
GHI 45
XYZ 43
JVK 23
LMN 21
Data 2 has only a list of names
Data 2-
Names
ABC
DEF
GHI
XYZ
JVK
LMN
PQR
KJL
Data 3 also has names and numbers
Data 3
Name_desc Numbers
ABC 56
DEF 67
GHI 89
XYZ 60
JVK 88
LMN 65
PQR 100
KJL 85
I want to do the following:
Check whether all names from Data 2 are present in Data 1
If any names are missing then
{
  get those names
  get the numbers for those missing names from Data 3
  append the above two things (missing names & numbers) to Data 1
}
else
{
  data1 <- data1
}
Thanks
First, merge Data 1 and Data 2, then locate the NAs in the new data.frame and match them against Data 3; finally, replace them with the Data 3 values.
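The merge, locate, match and replace steps described above can be sketched in base R. This is a minimal sketch, not the answerer's original code: the name `tmp` is borrowed from a later comment in this thread, and the data frames are rebuilt inline from the tables in the question so the snippet runs on its own.

```r
# Sample data rebuilt from the question's tables
Data1 <- data.frame(Name_description = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN"),
                    Numbers = c(23L, 34L, 45L, 43L, 23L, 21L),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(Names = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN", "PQR", "KJL"),
                    stringsAsFactors = FALSE)
Data3 <- data.frame(Name_desc = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN", "PQR", "KJL"),
                    Numbers = c(56L, 67L, 89L, 60L, 88L, 65L, 100L, 85L),
                    stringsAsFactors = FALSE)

# Step 1: merge, keeping every name listed in Data2 (all.y = TRUE);
# names that Data1 lacks get NA in Numbers
tmp <- merge(Data1, Data2, by.x = "Name_description", by.y = "Names", all.y = TRUE)

# Step 2: locate the rows whose number is still missing
missing <- is.na(tmp$Numbers)

# Step 3: look those names up in Data3 and fill in their numbers
tmp$Numbers[missing] <- Data3$Numbers[match(tmp$Name_description[missing], Data3$Name_desc)]

tmp
```

Note that merge sorts the result by the key column by default, so the rows come back in alphabetical order of the names rather than in the order of Data1.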
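With dplyr, it should look something like this (the by arguments are needed because the name column has a different name in each data frame):
library(dplyr)
Data1 %>%
  bind_rows(
    Data2 %>%
      anti_join(Data1, by = c("Names" = "Name_description")) %>%
      left_join(Data3, by = c("Names" = "Name_desc")) %>%
      rename(Name_description = Names)
  )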
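We can achieve this in dplyr using left_join and ifelse.
Data
I find dplyr::coalesce very handy in the situation the OP describes. After joining the 3 data frames, two Numbers columns are available, one of which contains NA; they can be combined into a single column with coalesce.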
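To make the behaviour of coalesce concrete, here is a small sketch on two plain vectors. The vector names `from_data1` and `from_data3` are illustrative only; they stand in for the Numbers.x and Numbers.y columns that the joins produce.

```r
library(dplyr)

# Numbers as they would look after the joins: Data1 has no value for the last two names
from_data1 <- c(23L, 34L, NA, NA)
from_data3 <- c(56L, 67L, 100L, 85L)

# coalesce takes, position by position, the first value that is not NA
merged <- coalesce(from_data1, from_data3)
merged
# 23 34 100 85
```

Values already present in the first vector win, so the Data3 numbers are only used where Data1 has nothing.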
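Data:
Actually, we don't need to merge at all. What you want is to use the first available number, starting with Data1 and then Data3; I assume NA is returned when a name is in Data2 but not in the others.
The fastest way is to use data.table, but I'll give other options as well.
data.table
data.table::rbindlist stacks tables by position rather than by column name when use.names = FALSE, which is very convenient in this case.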
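library(data.table)
# Data2 has no Numbers column, so pad it with NA to give all three tables two columns
rbindlist(list(Data1, Data3, cbind(Data2, Numbers = NA)), use.names = FALSE)[, .SD[1], by = "Name_description"]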
# 1: ABC 23
# 2: DEF 34
# 3: GHI 45
# 4: XYZ 43
# 5: JVK 23
# 6: LMN 21
# 7: PQR 100
# 8: KJL 85
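tidyverse solution
The .keep_all argument of dplyr::distinct helps avoid the less readable %>% filter(!duplicated(Names)) or %>% group_by(Names) %>% slice(1).
base R solution
Thanks, I tried merge and it works well, but what if there are no missing values, how is that handled? That is why I was looking for something like ifelse.
@Earthshaker That is handled in the match(tmp$Name_description[is.na(tmp$Numbers)], Data3$Name_desc) part.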
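The no-missing-values case raised in the comment can be checked directly. Below is a minimal base R sketch, assuming a hypothetical helper `append_missing` (not part of any answer here) and assuming every missing name can be found in the third data frame. When nothing is missing, the lookup selects zero rows and the first data frame comes back with the same rows.

```r
Data1 <- data.frame(Name_description = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN"),
                    Numbers = c(23L, 34L, 45L, 43L, 23L, 21L),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(Names = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN", "PQR", "KJL"),
                    stringsAsFactors = FALSE)
Data3 <- data.frame(Name_desc = c("ABC", "DEF", "GHI", "XYZ", "JVK", "LMN", "PQR", "KJL"),
                    Numbers = c(56L, 67L, 89L, 60L, 88L, 65L, 100L, 85L),
                    stringsAsFactors = FALSE)

# Hypothetical helper, named for illustration only
append_missing <- function(d1, d2, d3) {
  # Names listed in d2 that d1 does not contain yet
  missing_names <- setdiff(d2$Names, d1$Name_description)
  # Numbers for those names from d3; zero rows when nothing is missing
  extra <- data.frame(Name_description = missing_names,
                      Numbers = d3$Numbers[match(missing_names, d3$Name_desc)],
                      stringsAsFactors = FALSE)
  rbind(d1, extra)
}

# Names are missing: PQR and KJL are appended with their Data3 numbers
with_missing <- append_missing(Data1, Data2, Data3)

# Nothing is missing: restrict Data2 to names Data1 already has
Data2_known <- Data2[Data2$Names %in% Data1$Name_description, , drop = FALSE]
no_missing <- append_missing(Data1, Data2_known, Data3)
```

No explicit if/else is needed, because appending an empty set of rows leaves the data unchanged.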
library(dplyr)
Data4 <- Data2 %>%
  left_join(Data1, by = c("Names" = "Name_description")) %>%
  left_join(Data3, by = c("Names" = "Name_desc")) %>%
  mutate(Numbers = ifelse(is.na(Numbers.x), Numbers.y, Numbers.x)) %>%
  select(Names, Numbers)
Data4
# Names Numbers
# 1 ABC 23
# 2 DEF 34
# 3 GHI 45
# 4 XYZ 43
# 5 JVK 23
# 6 LMN 21
# 7 PQR 100
# 8 KJL 85
Data1 <- read.table(text = "Name_description Numbers
ABC 23
DEF 34
GHI 45
XYZ 43
JVK 23
LMN 21",
header = TRUE, stringsAsFactors = FALSE)
Data2 <- read.table(text = "Names
ABC
DEF
GHI
XYZ
JVK
LMN
PQR
KJL",
header = TRUE, stringsAsFactors = FALSE)
Data3 <- read.table(text = "Name_desc Numbers
ABC 56
DEF 67
GHI 89
XYZ 60
JVK 88
LMN 65
PQR 100
KJL 85",
header = TRUE, stringsAsFactors = FALSE)
library(dplyr)
Data1 %>%
  full_join(Data2, by = c("Name_description" = "Names")) %>%
  inner_join(Data3, by = c("Name_description" = "Name_desc")) %>%
  mutate(Numbers = coalesce(Numbers.x, Numbers.y)) %>%
  select(Name_description, Numbers)
# Name_description Numbers
# 1 ABC 23
# 2 DEF 34
# 3 GHI 45
# 4 XYZ 43
# 5 JVK 23
# 6 LMN 21
# 7 PQR 100
# 8 KJL 85
library(tidyverse)
lst(Data1, Data3, cbind(Data2, NA)) %>%
  map(setNames, c("Names", "Numbers")) %>%
  bind_rows() %>%
  distinct(Names, .keep_all = TRUE)
# Names Numbers
# 1 ABC 23
# 2 DEF 34
# 3 GHI 45
# 4 XYZ 43
# 5 JVK 23
# 6 LMN 21
# 7 PQR 100
# 8 KJL 85
x <- do.call(rbind, lapply(list(Data1, Data3, cbind(Data2, NA)), setNames, c("Names", "Numbers")))
x[!duplicated(x[[1]]), ]
# Names Numbers
# 1 ABC 23
# 2 DEF 34
# 3 GHI 45
# 4 XYZ 43
# 5 JVK 23
# 6 LMN 21
# 13 PQR 100
# 14 KJL 85