R: Merge two data frames using a fuzzy merge


r merge


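I have two data frames that I need to merge. Each has a column I want to merge on, but the values in those two columns are not identical: in one data frame the key column is 12 digits long, while in the other it is 5-6 digits long. I want to merge on the basis of the similar 5-6 digit IDs in the second data frame.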

My data frames:

df1 = data.frame(CustomerId = c(987689000000,786581000000,765909000000,565400000000,746541000000,516890000000), Product = c(rep("Toaster", 3), rep("Radio", 3)))   

df2 = data.frame(customerId = c(987689,986581,7659090,56540,74651,5168900), State = c(rep("Alabama", 2), rep("Ohio", 1)))

I tried:

c = merge(df1, df2, key = ("CustomerId"), all = TRUE)
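A quick way to see why this call cannot work: merge() has no key argument (join columns are given with by, or with by.x and by.y when the names differ, as CustomerId and customerId do here), and even with the right arguments an exact merge finds no equal values. A minimal base R sketch using the question's data:

```r
# The question's data frames
df1 <- data.frame(CustomerId = c(987689000000, 786581000000, 765909000000,
                                 565400000000, 746541000000, 516890000000),
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(customerId = c(987689, 986581, 7659090, 56540, 74651, 5168900),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# No column name is shared (CustomerId vs customerId), so merge() falls back
# to the Cartesian product of the two data frames: 6 x 6 rows
cross <- merge(df1, df2)
nrow(cross)  # 36

# Naming the key columns explicitly still returns nothing, because the
# 12-digit IDs and the short IDs are never exactly equal
exact <- merge(df1, df2, by.x = "CustomerId", by.y = "customerId")
nrow(exact)  # 0
```

This is why the keys have to be transformed into a common form before any join.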

My expected output is as follows:

  CustomerId Product   State
1     987689 Toaster Alabama
2     786581 Toaster Alabama
3    7659090 Toaster Alabama
4      56540   Radio Alabama
5      74651   Radio Alabama
6     516890   Radio Alabama

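Here is a solution. The key is to use formatC to write the 12-digit IDs out in full (fixed notation, no decimals) and str_extract to pull out the part that matches an ID in df2. After this step, you can choose a left join, right join, or inner join, depending on which rows of the data frames you want to keep. df3 is the final output.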
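The extraction step can also be sketched in base R, without stringr or rebus. This is a swapped-in equivalent, not the answer's code: regexpr() with an alternation pattern built by paste(..., collapse = "|") stands in for str_extract() with or1(). It assumes the short IDs contain only digits, so they are safe to use directly as regex alternatives.

```r
df1 <- data.frame(CustomerId = c(987689000000, 786581000000, 765909000000,
                                 565400000000, 746541000000, 516890000000),
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(customerId = c(987689, 986581, 7659090, 56540, 74651, 5168900),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Write the 12-digit IDs out in full (fixed notation, no decimals)
long_ids <- formatC(df1$CustomerId, digits = 0, format = "f")

# One regex alternative per short ID: "987689|986581|..."
pattern <- paste(df2$customerId, collapse = "|")

# regexpr() gives the first match position in each string (-1 = no match)
m <- regexpr(pattern, long_ids)

# regmatches() returns only the matched substrings, so put them back into
# a vector that keeps NA for the rows without a match
key <- rep(NA_character_, length(long_ids))
key[m != -1] <- regmatches(long_ids, m)
key  # "987689" NA "7659090" "56540" NA "5168900"
```

Because this is plain substring matching, a short ID could also match in the middle of an unrelated 12-digit ID. If the short IDs are always prefixes (an assumption, though it holds for the matches here), anchoring the pattern with paste0("^(", pattern, ")") is stricter.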

Note that your example contains IDs that do not match, so the desired output cannot be reproduced from the data frames you provided.
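To see exactly which IDs fail to match, one option is to run the extraction and compare the results against both data frames. A base R sketch, using regexpr() as a stand-in for str_extract() and assuming the short IDs are plain digit strings:

```r
df1 <- data.frame(CustomerId = c(987689000000, 786581000000, 765909000000,
                                 565400000000, 746541000000, 516890000000),
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(customerId = c(987689, 986581, 7659090, 56540, 74651, 5168900),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Extract the short ID contained in each 12-digit ID (NA if none)
long_ids <- formatC(df1$CustomerId, digits = 0, format = "f")
m <- regexpr(paste(df2$customerId, collapse = "|"), long_ids)
key <- rep(NA_character_, length(long_ids))
key[m != -1] <- regmatches(long_ids, m)

# 12-digit IDs in df1 that contain none of the short IDs
unmatched_df1 <- long_ids[is.na(key)]
# Short IDs in df2 that never appear inside a 12-digit ID
unmatched_df2 <- setdiff(as.character(df2$customerId), key)

unmatched_df1  # "786581000000" "746541000000"
unmatched_df2  # "986581" "74651"
```

These two pairs of IDs are the ones that keep the expected output from being reproducible.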

# Load packages
library(dplyr)
library(stringr)
library(rebus)

# Process the data
df3 <- df1 %>%
  # Use str_extract to get CustomerId matched in df2
  mutate(CustomerId = str_extract(string = formatC(CustomerId, 
                                                   digits = 0, 
                                                   format = "f"), 
                                  pattern = or1(df2$customerId))) %>%
  # Join with df2 by the updated CustomerId
  right_join(df2 %>% 
               mutate(CustomerId = as.character(customerId)) %>%
               select(-customerId), 
            by = "CustomerId")

# View the result
df3
#  CustomerId Product   State
#1     987689 Toaster Alabama
#2     986581    <NA> Alabama
#3    7659090 Toaster    Ohio
#4      56540   Radio Alabama
#5      74651    <NA> Alabama
#6    5168900   Radio    Ohio
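The choice between a left, right, or inner join mentioned above only changes which unmatched rows survive. A base R sketch with merge(), where all.x, all.y, and all stand in for dplyr's left_join(), right_join(), and full_join() and the default is an inner join; the key extraction again uses regexpr() in place of str_extract(), assuming digit-only short IDs:

```r
df1 <- data.frame(CustomerId = c(987689000000, 786581000000, 765909000000,
                                 565400000000, 746541000000, 516890000000),
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(customerId = c(987689, 986581, 7659090, 56540, 74651, 5168900),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# Build a shared character key: the short ID found inside each long ID
long_ids <- formatC(df1$CustomerId, digits = 0, format = "f")
m <- regexpr(paste(df2$customerId, collapse = "|"), long_ids)
df1$key <- NA_character_
df1$key[m != -1] <- regmatches(long_ids, m)
df2$key <- as.character(df2$customerId)

inner <- merge(df1, df2, by = "key")                # only matched IDs
left  <- merge(df1, df2, by = "key", all.x = TRUE)  # every df1 row
right <- merge(df1, df2, by = "key", all.y = TRUE)  # every df2 row
full  <- merge(df1, df2, by = "key", all = TRUE)    # rows from both sides

sapply(list(inner = inner, left = left, right = right, full = full), nrow)
# inner  left right  full
#     4     6     6     8
```

The right join keeps the same six df2 rows as df3 above, although merge() sorts by the key, so the row order and column layout differ.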


Happy to help. If this answer was useful, please accept it by clicking the green check mark at the top left.