
R: check whether elements are contained in unequal elements of different lengths


I am trying to find out whether part of one character vector overlaps with part of another character vector.

x <- c("OCT/NOV/DEC", "JAN/DEC/AUG")
y <- c("JAN/FEB/MAR", "APR/MAY/JUN", "JUL/AUG/SEP")

# Months should be split into separate characters

So I would use:

list_x <- strsplit(x, '/')
list_x

#> [[1]]
#> [1] "OCT" "NOV" "DEC"
#> 
#> [[2]]
#> [1] "JAN" "DEC" "AUG"

list_y <- strsplit(y, '/')
list_y

#> [[1]]
#> [1] "JAN" "FEB" "MAR"
#> 
#> [[2]]
#> [1] "APR" "MAY" "JUN"
#> 
#> [[3]]
#> [1] "JUL" "AUG" "SEP"

As we can see, list_x[[1]] has no elements in any element of list_y, so it should return FALSE.

list_x[[2]] has "JAN" and "AUG", which are in list_y[[1]] and list_y[[3]] respectively, so it should return TRUE.

# The response should be 

c(FALSE, TRUE) # for each of x elements

# I tried:

detect <- function(x, y){ 
  mapply(function(x, y) any(x %in% y), strsplit(x, '/'), strsplit(y, '/'))
}

detect(x,y)

# Which gives a warning that the longer argument is not a multiple of the length of the shorter, and:
#> [1] FALSE FALSE FALSE
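
The warning and the three FALSE values both come from mapply() pairing its arguments by position and recycling the shorter one. A minimal sketch that makes those pairs visible, using only base R (idx_x and pairs are names introduced here for illustration):

```r
x <- c("OCT/NOV/DEC", "JAN/DEC/AUG")
y <- c("JAN/FEB/MAR", "APR/MAY/JUN", "JUL/AUG/SEP")

list_x <- strsplit(x, "/", fixed = TRUE)
list_y <- strsplit(y, "/", fixed = TRUE)

# mapply() recycles the shorter list, so x is used as x[1], x[2], x[1]
# against y[1], y[2], y[3].
idx_x <- rep_len(seq_along(list_x), length(list_y))

# One row per comparison that mapply() actually performs.
pairs <- data.frame(
  x = x[idx_x],
  y = y,
  overlap = vapply(
    seq_along(list_y),
    function(i) any(list_x[[idx_x[i]]] %in% list_y[[i]]),
    logical(1)
  )
)
print(pairs)
```

Note that x[2] is only ever compared with y[2], never with y[1] or y[3], which is where its matching months are.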

So how can I tell whether any element of x is also present in an element of y?

Edit: after Akrun's answer, I tried a more complex approach to handle a non-equi join:

library(stringr)  # str_split()
library(tibble)   # tibble()

detect <- function(a,b){
  sapply(str_split(a, '/'), function(x) any(sapply(str_split(b, '/'),
                                                   function(y) any(x %in% y))))
}

a <- tibble(a1 = c("A/B/C", "F/E/G"),
            b1 = c(1,2),
            c1 = c("OCT/NOV/DEC", "JAN/DEC/AUG"))

b <- tibble(a2 = c("A/B/C", "D/E/F", "G/H/I"),
            b2 = c(1,2,3),
            c2 = c("JAN/FEB/MAR", "APR/MAY/JUN", "JUL/AUG/SEP"))

fuzzyjoin::fuzzy_left_join(a, b, by = c("a1" = "a2", 
                             "b1" = "b2",
                             "c1" = "c2"),
                match_fun = list(detect, `==`, detect))

## Wrong Result:
#>  a1       b1 c1          a2       b2 c2         
#>  <chr> <int> <chr>       <chr> <int> <chr>      
#> 1 A/B/C     1 OCT/NOV/DEC NA       NA NA         
#> 2 F/E/G     2 JAN/DEC/AUG D/E/F     2 APR/MAY/JUN

# Row 2: Although a1 and a2 have matching characters and b1 matches b2, c1 and c2 have no matching characters, so the join shouldn't be possible

## Expected:
#>  a1       b1 c1          a2       b2 c2         
#>  <chr> <int> <chr>       <chr> <int> <chr>      
#> 1 A/B/C     1 OCT/NOV/DEC NA       NA NA         
#> 2 F/E/G     2 JAN/DEC/AUG NA       NA NA

Maybe I have misunderstood something in this function?

We can use a nested sapply with any:

sapply(list_x, function(x) any(sapply(list_y, function(y) any(x %in% y))))
#[1] FALSE  TRUE
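
Because the question only asks whether an element of x overlaps with any element of y, the inner loop can also be avoided by pooling the months of y first. This is an alternative sketch, not the answer's method; all_y and result are names introduced here, and base strsplit() is used so it runs without extra packages:

```r
list_x <- strsplit(c("OCT/NOV/DEC", "JAN/DEC/AUG"), "/", fixed = TRUE)
list_y <- strsplit(c("JAN/FEB/MAR", "APR/MAY/JUN", "JUL/AUG/SEP"), "/", fixed = TRUE)

# Pool every month that appears anywhere in y.
all_y <- unique(unlist(list_y))

# For each element of x, check whether any of its months is in the pool.
# vapply() guarantees exactly one logical value per element.
result <- vapply(list_x, function(x) any(x %in% all_y), logical(1))
result
```

This gives one logical per element of x, the same shape as the nested sapply, but it no longer tells you which element of y matched.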

For the updated data, if we change the outer any to all, it gives the expected output:

detect <- function(a,b){
   sapply(str_split(a, '/'), function(x) all(sapply(str_split(b, '/'),
                                                    function(y) any(x %in% y))))
 }
 fuzzyjoin::fuzzy_left_join(a, b, by = c("a1" = "a2", 
                              "b1" = "b2",
                              "c1" = "c2"),
                 match_fun = list(detect, `==`, detect))
# A tibble: 2 x 6
#  a1       b1 c1          a2       b2 c2   
#  <chr> <dbl> <chr>       <chr> <dbl> <chr>
#1 A/B/C     1 OCT/NOV/DEC <NA>     NA <NA> 
#2 F/E/G     2 JAN/DEC/AUG <NA>     NA <NA> 

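@guillhermes Sure, I am checking your new data. It seems to be a completely different structure, as it wraps the function in match_fun.

@GuilhermeCampos If you apply the code in a pairwise manner, Map(detect, a, b), you get the output as logical vectors.

@GuilhermeCampos Map(detect, a, b) # $a1 [1] TRUE $b1 [1] TRUE TRUE $c1 [1] FALSE TRUE. From that output it is not clear how you expect to get the expected result as all NAs.

@GuilhermeCampos Can you tell me why it is not as expected, given that the Map output is consistent with this value?

@GuilhermeCampos I think the join here is based on when there is a corresponding match between the columns of interest.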
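
The comments above turn on what match_fun actually receives. The fuzzyjoin documentation describes match_fun as a vectorized function of two columns that returns TRUE or FALSE for each candidate match. Assuming that means two equal-length vectors of candidate pairs, a minimal element-wise sketch could look like the following. detect_pairwise is a hypothetical name that does not appear in the thread, and base strsplit() stands in for str_split():

```r
# Hypothetical pairwise variant: compares a[i] only with b[i].
# Assumes a and b have the same length (one candidate pair per position).
detect_pairwise <- function(a, b) {
  mapply(
    function(x, y) any(x %in% y),
    strsplit(as.character(a), "/", fixed = TRUE),
    strsplit(as.character(b), "/", fixed = TRUE),
    USE.NAMES = FALSE
  )
}

# Three candidate pairs taken from the example data.
res <- detect_pairwise(
  c("F/E/G", "JAN/DEC/AUG", "JAN/DEC/AUG"),
  c("D/E/F", "APR/MAY/JUN", "JUL/AUG/SEP")
)
res
```

Unlike the nested sapply, this returns one value per pair rather than one value per element of a. If the assumption about match_fun holds, it could be tried in place of detect in the match_fun list.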