R: compare four numeric vectors within a tolerance and report the common values


I have four large vectors of unequal length. Below is a toy dataset similar to my original one:

a <- c(1021.923, 3491.31, 102.3, 12019.11, 879.2, 583.1)
b <- c(21,32,523,123.1,123.4545,12345,95.434, 879.25, 1021.9,11,12,662)
c <- c(52,21,1021.9288,12019.12, 879.1)
d <- c(432.432,23466.3,45435,3456,123,6688,1021.95)
I know how to do this for two vectors, but how do I do it for four?



    • Here is a data.table solution.

      It extends to n vectors, so feed it as many as you like. It also performs well when several values have "hits" in all of the vectors.

      Sample data

      a <- c(1021.923, 3491.31, 102.3, 12019.11, 879.2, 583.1)
      b <- c(21,32,523,123.1,123.4545,12345,95.434, 879.25, 1021.9,11,12,662)
      c <- c(52,21,1021.9288,12019.12, 879.1)
      d <- c(432.432,23466.3,45435,3456,123,6688,1021.95)
      

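      A different approach: write a function that compares two vectors, call it my_compare, and then run Reduce(my_compare, list(a, b, c, d)). There are potential problems, though. For example, if a contains 100, b contains 100.4 and c contains 99.6, the result depends on how you want such cases handled. Running the reduction in different orders, or making my_compare accept and return lists of equivalent values, would both work.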
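A minimal base-R sketch of the Reduce idea, assuming my_compare keeps the values of its first argument that have a partner in the second within a tolerance tol. The tolerance of 0.5 is an assumption (the question does not state one), and only the name my_compare comes from the text above; its body and the tol parameter are hypothetical.

```r
# Hypothetical my_compare(): keep the values of x that have at least one
# value of y within +/- tol (tol = 0.5 is an assumed default).
my_compare <- function(x, y, tol = 0.5) {
  # For each element of x: is any element of y within tol of it?
  hit <- vapply(x, function(v) any(abs(y - v) <= tol), logical(1))
  x[hit]
}

a <- c(1021.923, 3491.31, 102.3, 12019.11, 879.2, 583.1)
b <- c(21, 32, 523, 123.1, 123.4545, 12345, 95.434, 879.25, 1021.9, 11, 12, 662)
c <- c(52, 21, 1021.9288, 12019.12, 879.1)
d <- c(432.432, 23466.3, 45435, 3456, 123, 6688, 1021.95)

# Fold pairwise: ((a vs b) vs c) vs d; surviving values always come from a
common <- Reduce(my_compare, list(a, b, c, d))
print(common)  # keeps 1021.923 only

# Order dependence: 100 is within 0.5 of both 100.4 and 99.6,
# but 100.4 and 99.6 are 0.8 apart
print(Reduce(my_compare, list(100, 100.4, 99.6)))  # 100 survives
print(Reduce(my_compare, list(100.4, 100, 99.6)))  # nothing survives
```

Because the kept values always come from the first vector in the list, the last two calls show the order dependence described above: the same three numbers give a match or no match depending on which one leads.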
      The data.table code, run on the sample data above:
      
      library(data.table)
      
      #create list with vectors
      l <- list( a,b,c,d )
      names(l) <- letters[1:4]
      #create data.table to work with
      DT <- rbindlist( lapply(l, function(x) {data.table( value = x)} ), idcol = "group")
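      #add a +/- 0.5 margin to each value; two intervals overlap when their
      #values differ by at most 1, so the effective tolerance is 1
      DT[, `:=`( id = 1:.N, start = value - 0.5, end = value + 0.5 ) ]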
      #set keys for joining
      setkey(DT, start, end)
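      #overlap join of DT with itself: one row per pair of overlapping intervals
      #(each value also matches itself); columns of the first argument get an "i." prefix
      result <- foverlaps(DT,DT)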
      
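      #cast to wide format: one row per value, one column per group (a,b,c,d)
      #holding a matching value from that group, or NA when there is no hit.
      #x[1L] keeps a single value per cell; x * 1 would return several values when
      #a group has more than one hit (e.g. 123.1 in b overlaps both 123.1 and 123.4545 in b)
      answer <- dcast( result, 
                   group + value ~ i.group, 
                   fun.aggregate = function(x){ x[1L] }, 
                   value.var = "i.value", 
                   fill = NA )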
      
      #get your final answer
      #set columns to look at (i.e. the names from the earlier created list)
      cols = names(l)
      #keep the rows without NA (rowSums works because TRUE = 1, FALSE = 0):
      #if rowSums == 0, none of the columns in the vector 'cols' contains an NA,
      #i.e. the value has a hit in every group
      answer[ rowSums( is.na( answer[ , ..cols ] ) ) == 0, ]
      
      #    group    value        a      b        c       d
      # 1:     a 1021.923 1021.923 1021.9 1021.929 1021.95
      # 2:     b 1021.900 1021.923 1021.9 1021.929 1021.95
      # 3:     c 1021.929 1021.923 1021.9 1021.929 1021.95
      # 4:     d 1021.950 1021.923 1021.9 1021.929 1021.95
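As a sanity check on the data.table result, here is a brute-force base-R sketch (not part of the original answer). It assumes a tolerance of 1, because two [v - 0.5, v + 0.5] intervals overlap exactly when their values differ by at most 1. The helper in_all is hypothetical, and since every value is compared against every vector, the cost grows with the product of the vector lengths, so it is only meant for small inputs.

```r
# Base-R cross-check of the data.table answer (a sketch, not the original method).
# tol = 1 mirrors the +/- 0.5 margins used with foverlaps().
a <- c(1021.923, 3491.31, 102.3, 12019.11, 879.2, 583.1)
b <- c(21, 32, 523, 123.1, 123.4545, 12345, 95.434, 879.25, 1021.9, 11, 12, 662)
c <- c(52, 21, 1021.9288, 12019.12, 879.1)
d <- c(432.432, 23466.3, 45435, 3456, 123, 6688, 1021.95)

l <- list(a = a, b = b, c = c, d = d)
tol <- 1

# Hypothetical helper: TRUE if every vector in vecs has a value within tol of v
in_all <- function(v, vecs, tol) {
  all(vapply(vecs, function(y) any(abs(y - v) <= tol), logical(1)))
}

# For each vector, keep the values that have a hit in all four vectors
common <- lapply(l, function(x) x[vapply(x, in_all, logical(1), vecs = l, tol = tol)])
print(common)
```

For the sample data this keeps 1021.923 (a), 1021.9 (b), 1021.9288 (c) and 1021.95 (d), the same four values as the data.table output above.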