R: How do I replace elements of a data frame that match another data frame, while keeping the elements that do not match?
I am trying to replace some elements of a data frame when they match elements in another data frame.
df1:
df2:
The output I want is:
V1 V2 V3
10 4FP3428 JP_00267-008 Line
11 4FP5103 JP_00302-049 Line
12 4FP3188 4FP3188 Line
13 4FP4137 JP_00284-029 Line
14 4FP3465 JP_00268-005 Line
15 4FP3367 JP_00265-057 Line
16 4FP4245 JP_00286-010 Line
17 4FP4085 JP_00283-008 Line
18 4PP3992 JP_00330-298 Line
19 4FP3575 JP_00269-035 Line
20 4FP4963 JP_00300-106 Line
But what I get is:
V1 V2 V3
10 4FP3428 JP_00267-008 Line
11 4FP5103 JP_00302-049 Line
12 <NA> 4FP3188 Line
13 4FP4137 JP_00284-029 Line
14 4FP3465 JP_00268-005 Line
15 4FP3367 JP_00265-057 Line
16 4FP4245 JP_00286-010 Line
17 4FP4085 JP_00283-008 Line
18 4PP3992 JP_00330-298 Line
19 4FP3575 JP_00269-035 Line
20 4FP4963 JP_00300-106 Line
This is the code I used:
df1[,1] <- df2[match(as.character(unlist(df1[,1])), as.character(df2[[1]])), 2]
If you want to stick with base R, use:
# an index which includes missing values
idx <- match(as.character(unlist(df1[,1])), as.character(df2[[1]]))
# an index of the non-missing values in `idx`
idx_not_missing <- !is.na(idx)
# push the data only when the index `idx` is not missing
df1[idx_not_missing,1] <- df2[idx[idx_not_missing], 2]
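The question does not show the contents of df1 and df2, so here is a minimal sketch with small made-up data frames (the values are assumptions modelled on the printed output). It illustrates why the one-line assignment produces NA, and how restricting the assignment to matched rows keeps the unmatched value.

```r
# Hypothetical stand-ins for df1 and df2; the real data is not shown in the question
df1 <- data.frame(V1 = c("JP_00267-008", "JP_00302-049", "4FP3188"),
                  V2 = c("JP_00267-008", "JP_00302-049", "4FP3188"),
                  V3 = "Line",
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("JP_00267-008", "JP_00302-049"),
                  V2 = c("4FP3428", "4FP5103"),
                  stringsAsFactors = FALSE)

# match() returns NA for values of df1$V1 that do not occur in df2$V1
idx <- match(df1$V1, df2$V1)

# Indexing with that NA is what yields NA in the one-line version
naive <- df2$V2[idx]

# Replace only the rows that found a match; the others keep their value
fixed <- df1
hit <- !is.na(idx)
fixed$V1[hit] <- df2$V2[idx[hit]]
print(fixed)
```

The third row has no partner in df2, so `idx` is NA there and the guarded assignment simply skips it.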
Here is an option using data.table:
library(data.table)
setkey(setDT(df1), V1)[df2, V1:=i.V2][]
# V1 V2 V3
# 1: 4FP3188 4FP3188 Line
#2: 4FP3367 JP_00265-057 Line
#3: 4FP3428 JP_00267-008 Line
#4: 4FP3465 JP_00268-005 Line
#5: 4FP3575 JP_00269-035 Line
#6: 4FP4085 JP_00283-008 Line
#7: 4FP4137 JP_00284-029 Line
#8: 4FP4245 JP_00286-010 Line
#9: 4FP4963 JP_00300-106 Line
#10: 4FP5103 JP_00302-049 Line
#11: 4PP3992 JP_00330-298 Line
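Note that setkey() sorts the rows by V1, which is why the output above is in a different order than the desired result. If the original row order matters, one possible variant (a sketch, not part of the answer above) is an update join with `on=` instead of a key. The small tables dt1 and dt2 are made-up stand-ins for df1 and df2.

```r
library(data.table)

# Hypothetical stand-ins for df1 and df2; the real data is not shown in the question
dt1 <- data.table(V1 = c("JP_00267-008", "JP_00302-049", "4FP3188"),
                  V2 = c("JP_00267-008", "JP_00302-049", "4FP3188"),
                  V3 = "Line")
dt2 <- data.table(V1 = c("JP_00267-008", "JP_00302-049"),
                  V2 = c("4FP3428", "4FP5103"))

# Update join: rows of dt1 with a match in dt2 get V1 overwritten by dt2's V2;
# rows without a match are left untouched and no reordering takes place
dt1[dt2, on = "V1", V1 := i.V2]
print(dt1)
```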
Or using dplyr:
library(dplyr)
left_join(df1, df2, by='V1') %>%
mutate(V2.y= ifelse(is.na(V2.y), V1, V2.y)) %>%
select(-V1) %>%
rename(V1=V2.y, V2=V2.x)
# V2 V3 V1
#1 JP_00267-008 Line 4FP3428
#2 JP_00302-049 Line 4FP5103
#3 4FP3188 Line 4FP3188
#4 JP_00284-029 Line 4FP4137
#5 JP_00268-005 Line 4FP3465
#6 JP_00265-057 Line 4FP3367
#7 JP_00286-010 Line 4FP4245
#8 JP_00283-008 Line 4FP4085
#9 JP_00330-298 Line 4PP3992
#10 JP_00269-035 Line 4FP3575
#11 JP_00300-106 Line 4FP4963
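The pipeline above leaves the columns in the order V2, V3, V1. If the original column order should be kept, a possible alternative (a sketch using coalesce(), which is not what the answer above uses) is to rename the lookup columns before joining. The data frames and the names `lookup`, `key` and `new_V1` are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-ins for df1 and df2; the real data is not shown in the question
df1 <- data.frame(V1 = c("JP_00267-008", "JP_00302-049", "4FP3188"),
                  V2 = c("JP_00267-008", "JP_00302-049", "4FP3188"),
                  V3 = "Line",
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("JP_00267-008", "JP_00302-049"),
                  V2 = c("4FP3428", "4FP5103"),
                  stringsAsFactors = FALSE)

# Rename df2's columns up front so the join adds a single, clearly named column
lookup <- rename(df2, key = V1, new_V1 = V2)

result <- df1 %>%
  left_join(lookup, by = c("V1" = "key")) %>%
  # coalesce() takes new_V1 where it exists and falls back to the old V1
  mutate(V1 = coalesce(new_V1, V1)) %>%
  select(-new_V1)
print(result)
```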
This doesn't work! It gives a warning message: In [
OK, I found that the problem was that the column was not character. Now it works. Thanks.
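The warning text is cut off above, so the following is only a guess at the situation: if V1 is stored as a factor (the default for data.frame() before R 4.0), assigning a value that is not an existing level gives NA with a warning. A minimal sketch, with made-up data, of converting factor columns to character first:

```r
# Hypothetical data; V1 and V3 are deliberately created as factors
df1 <- data.frame(V1 = c("JP_00267-008", "4FP3188"),
                  V3 = "Line",
                  stringsAsFactors = TRUE)

# Assigning a value that is not an existing level yields NA
# (the warning is suppressed here only to keep the demo quiet)
broken <- df1
suppressWarnings(broken$V1[1] <- "4FP3428")
is.na(broken$V1[1])

# Convert every factor column to character, then the replacement works
is_fac <- vapply(df1, is.factor, logical(1))
df1[is_fac] <- lapply(df1[is_fac], as.character)
df1$V1[1] <- "4FP3428"
print(df1)
```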