R: Constructing dummy columns based on partial string matches across two columns


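I have a data frame df1 that contains information on acquisitions, one per ID. Each row holds the four-digit SIC codes of the acquirer A and of the target B, with the codes separated by /.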

df1 <- data.frame(ID = c(1,2,3,4),
              A = c("1230/1344/2334/2334","3322/3344/3443", "1112/9099", "3332/4483"),
              B = c("1333/2334","3344/8840", "4454", "9988/2221/4483"))

  ID                   A              B
   1 1230/1344/2334/2334      1333/2334
   2      3322/3344/3443      3344/8840
   3           1112/9099           4454
   4           3332/4483 9988/2221/4483

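I need to classify each deal ID as follows:

If the primary code of A or B (i.e. its first four digits) matches any code of B or A other than that side's primary code, the column Primary.other.match takes the value 1, and 0 otherwise. If no primary code is involved in a match, but any non-primary code of A matches any non-primary code of B, the column Other.other.match takes the value 1, and 0 otherwise. The desired output is shown in the updated df1: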

df1 <- data.frame(ID = c(1,2,3,4),
              A = c("1230/1344/2334/2334","3322/3344/3443", "1112/9099", "3332/4483"),
              B = c("1333/2334","3344/8840", "4454", "9988/2221/4483"),
              # 1 only if the primary code of A or B matches any other code of B or A
              Primary.other.match = c(0,1,0,0),
              # 1 only if the primary codes do not match the primary or any other codes,
              # but any other codes match
              Other.other.match = c(1,0,0,1))

 ID                   A              B Primary.other.match Other.other.match
 1 1230/1344/2334/2334      1333/2334                   0                 1
 2      3322/3344/3443      3344/8840                   1                 0
 3           1112/9099           4454                   0                 0
 4           3332/4483 9988/2221/4483                   0                 1
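
As a concrete illustration of the first rule, row 2 can be checked by hand in base R. This is only a sketch of how I read the rule; the names a, b and primary_other are made up for this example.

```r
# Row 2 of df1, split into individual SIC codes
a <- strsplit("3322/3344/3443", "/", fixed = TRUE)[[1]]
b <- strsplit("3344/8840", "/", fixed = TRUE)[[1]]

# The first element is the primary code; [-1] drops it to leave the other codes.
# Here the primary code of B (3344) appears among the other codes of A.
primary_other <- b[1] %in% a[-1] || a[1] %in% b[-1]
primary_other
```

The result is TRUE, which is why Primary.other.match is 1 for ID 2.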

Thanks for your help.

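Here is a solution using the tidyverse.

First create a function that checks whether there is a primary match or an other match, then use purrr::map2 to apply this function to each pair of values in columns A and B.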

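Thanks a lot. However, I get this error: Error in mutate_impl(.data, dots) : could not find function "str_split". Any ideas?

Check that you have the latest version of the tidyverse package installed. If the error persists, try writing stringr::str_split. Otherwise, you can use strsplit(str1, "/") from base R.

Thanks, the tidyverse package did not seem to be up to date.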

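library(tidyverse)

# Returns a one-row tibble with both indicators for one pair of code strings
fun1 <- function(str1, str2){
 # as.character() guards against factor columns (the data.frame default before R 4.0)
 str1 <- as.character(str1) %>% str_split("/") %>% unlist()
 str2 <- as.character(str2) %>% str_split("/") %>% unlist()

 # the primary code is the first code in each string
 str1p <- str1[1]
 str2p <- str2[1]

 # pom: the primary code of one side appears among the codes of the other side
 # (note: this also counts a primary-to-primary match)
 pom <- ifelse(str1p %in% str2 | str2p %in% str1, 1, 0)
 # oom: no primary match, but the two sides share at least one code
 oom <- ifelse(pom == 0 & length(intersect(str1, str2)) > 0, 1, 0)

 tibble(pom = pom, oom = oom)

}

df1 %>% as_tibble() %>% 
  mutate(result = map2(A, B, fun1)) %>% 
  unnest(result)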

# A tibble: 4 x 5
     ID A                   B                pom   oom
  <dbl> <fct>               <fct>          <dbl> <dbl>
1     1 1230/1344/2334/2334 1333/2334          0     1
2     2 3322/3344/3443      3344/8840          1     0
3     3 1112/9099           4454               0     0
4     4 3332/4483           9988/2221/4483     0     1
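
The base R route mentioned in the comments (strsplit instead of str_split) could look roughly like the following. This is a minimal sketch rather than part of the original answer: match_flags, flags and res are hypothetical names, and stringsAsFactors = FALSE is set explicitly so the example behaves the same on older R versions.

```r
# Base R variant of fun1; match_flags is a hypothetical name for this sketch
match_flags <- function(a, b) {
  # as.character() guards against factor columns
  a_codes <- strsplit(as.character(a), "/", fixed = TRUE)[[1]]
  b_codes <- strsplit(as.character(b), "/", fixed = TRUE)[[1]]

  # pom: the primary (first) code of one side appears among the codes of the other
  pom <- as.numeric(a_codes[1] %in% b_codes || b_codes[1] %in% a_codes)
  # oom: no primary match, but the two sides share at least one code
  oom <- as.numeric(pom == 0 && length(intersect(a_codes, b_codes)) > 0)
  c(pom = pom, oom = oom)
}

df1 <- data.frame(ID = c(1, 2, 3, 4),
                  A = c("1230/1344/2334/2334", "3322/3344/3443", "1112/9099", "3332/4483"),
                  B = c("1333/2334", "3344/8840", "4454", "9988/2221/4483"),
                  stringsAsFactors = FALSE)

# mapply() walks A and B in parallel and returns a 2 x n matrix;
# t() turns it into n rows with the columns pom and oom
flags <- t(mapply(match_flags, df1$A, df1$B, USE.NAMES = FALSE))
res <- cbind(df1, flags)
res
```

This should give the same pom and oom values as the tidyverse version without needing any packages, at the cost of going through a matrix instead of a list column.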