R: How to find the row in a data.table that best matches a query vector (r, data.table, bioinformatics)


I have a data.table:

library(data.table)
ffDummy_dt = data.table(
  Annotation = c("chr10:10..20,-", "chr10:25..30,-", "chr10:35..100,-",
                 "chr10:106..205,-", "chr10:223..250,-", "chr10:269..478,-",
                 "chr10:699..1001,-", "chr10:2000..2210,-", "chr10:2300..2500,-",
                 "chr10:2678..5678,-"),
  tpmOne   = c(0, 0, 0.213, 1, 1.2, 0.5, 0.7, 0.9, 0.8, 0.86),
  tpmTwo   = c(100, 1000, 1001, 1500, 900, 877, 1212, 1232, 1312, 0),
  tpmThree = c(0.2138595, 0, 0, 0, 0, 0, 0.6415786, 0, 0, 0))
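I want to pass in a query (it can be a vector, or even a data.table if necessary), like this:

I want to find out which row matches it best.

In my actual use case, test_v is 20 elements long and nrow(Dummy_dt) is greater than 20 (but most likely there is only one perfect match for each 20-element vector).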

Currently,

which.max(apply(as.matrix(ffDummy_dt[, 2:ncol(ffDummy_dt), with = FALSE]), 1,
  function(k) sum(test_v %in% k)))
seems to work (it gives the correct output for this example, namely 10), but it is not a data.table solution.


I have already looked at this, but it is not clear to me how to use %in% with data.table the way test_v %in% k is used above.

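Assuming you actually want the matches to be exclusive (which, to me, makes more sense for a row to count as the "best match"), you can do: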

Reduce(`+`, lapply(ffDummy_dt, `%in%`, test_v))
#[1] 1 2 1 1 1 1 0 1 1 3
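
To make the counting step concrete, here is a minimal sketch of the same idea written inside the data.table `j` expression. The definition of test_v is not shown above, so the value `c(0.86, 0, 0)` below is an assumption chosen because it is consistent with the printed counts; `value_cols`, `match_count` and `best_row` are illustrative names, not part of the original post.

```r
library(data.table)

ffDummy_dt <- data.table(
  Annotation = c("chr10:10..20,-", "chr10:25..30,-", "chr10:35..100,-",
                 "chr10:106..205,-", "chr10:223..250,-", "chr10:269..478,-",
                 "chr10:699..1001,-", "chr10:2000..2210,-", "chr10:2300..2500,-",
                 "chr10:2678..5678,-"),
  tpmOne   = c(0, 0, 0.213, 1, 1.2, 0.5, 0.7, 0.9, 0.8, 0.86),
  tpmTwo   = c(100, 1000, 1001, 1500, 900, 877, 1212, 1232, 1312, 0),
  tpmThree = c(0.2138595, 0, 0, 0, 0, 0, 0.6415786, 0, 0, 0)
)

# ASSUMPTION: the query vector is not given in the post; this value is a guess.
test_v <- c(0.86, 0, 0)

# Compare only the numeric columns, leaving Annotation out.
value_cols <- setdiff(names(ffDummy_dt), "Annotation")

# For every column, flag the cells whose value occurs in test_v, then add the
# logical vectors across columns to get one match count per row.
match_count <- ffDummy_dt[, Reduce(`+`, lapply(.SD, `%in%`, test_v)),
                          .SDcols = value_cols]

# Index of the row with the highest count (first one in case of ties).
best_row <- which.max(match_count)

print(match_count)
print(ffDummy_dt[best_row])
```

Note the direction of the test: this counts how many cells of a row occur in test_v, whereas the apply() version in the question counts how many elements of test_v occur in the row. With a query that repeats a value the two can disagree, so it is worth deciding which definition of "best match" is intended.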

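Are you saying that the order of the elements in test_v makes no difference? If so, that is rather a messy problem.
That is what I am saying.
OK, then I think you will run into a lot of trouble. First, try .1+.2==.3, and then maybe read up on it. If you are looking up integers, strings, or something else, it is doable, though.
So the matches are not exclusive, and you really want the first row to have a match count of 2?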
Reduce(`+`, lapply(ffDummy_dt, `%in%`, test_v))
#[1] 1 2 1 1 1 1 0 1 1 3
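
The floating-point caveat raised in the comments can be illustrated with a short sketch in base R. `near_in` is a hypothetical helper written for this example (it is not a base R or data.table function), and the tolerance of 1e-8 is an arbitrary assumption that would need tuning for real TPM values.

```r
# Exact comparison of doubles can fail because of binary rounding.
exact_equal <- (0.1 + 0.2 == 0.3)
approx_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# Hypothetical helper: a tolerance-based stand-in for %in% on numeric vectors.
# Returns TRUE for each element of x that lies within tol of any value in table.
near_in <- function(x, table, tol = 1e-8) {
  vapply(x, function(v) any(abs(v - table) <= tol), logical(1))
}

exact_hits  <- c(0.1 + 0.2, 0.5) %in% c(0.3, 0.86)
near_hits   <- near_in(c(0.1 + 0.2, 0.5), c(0.3, 0.86))

print(exact_equal)   # FALSE
print(approx_equal)  # TRUE
print(exact_hits)    # FALSE FALSE
print(near_hits)     # TRUE FALSE
```

If the values being matched are computed rather than typed in as literals, swapping `%in%` for a tolerance-based test of this kind in the counting step may be safer, at the cost of having to pick a tolerance.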