R: Subsetting a matrix using a row-name mapping and a user-defined function

I have a matrix that I want to subset using a mapping and a function.

Example: fill the matrix randomly with `runif`, using `set.seed` for reproducibility:

set.seed(1)
exp.mat <- matrix(runif(9*6, 5.0, 10), nrow = 9, ncol = 6)
rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')

exp.mat
         s1       s2       s3       s4       s5       s6
a  5.353395 6.661973 6.733417 8.562573 6.198147 8.024666
b1 5.497331 8.254352 6.668875 6.999972 5.294672 8.273620
b2 6.581359 6.290084 7.381756 6.626761 8.211441 6.765986
b3 7.593171 7.392726 9.460992 8.785436 9.381346 6.351301
c  8.310025 8.831553 9.321697 6.013461 8.894573 9.963420
d1 7.034151 5.421235 6.949948 8.555606 8.986544 8.167466
d2 9.564380 9.376607 8.886603 5.608460 7.276372 6.066041
e1 6.468017 6.695365 9.803090 6.227443 7.050420 5.646862
e2 7.295329 9.197202 7.173297 5.716522 9.054351 7.390590
Function: `mean`, used to choose between rows when there are several mappings (case 2).
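As an illustration of case 2, the snippet below scores the candidate rows for `b` with `mean` and keeps the best one. It is a minimal base-R sketch: it rebuilds `exp.mat` with `set.seed(1)` as above, and the hard-coded group `c('b1', 'b2', 'b3')` and the names `scores` and `best_b` are for illustration only.

```r
# Rebuild the example matrix exactly as in the question
set.seed(1)
exp.mat <- matrix(runif(9 * 6, 5.0, 10), nrow = 9, ncol = 6)
rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')

# Score each candidate row for target "b" with the user-defined function (here: mean)
scores <- apply(exp.mat[c('b1', 'b2', 'b3'), ], 1, mean)

# Keep the name of the highest-scoring row
best_b <- names(which.max(scores))
best_b
```

Swapping `mean` for any other function that returns one number per row changes the selection criterion without touching the rest.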

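The subset is built from a mapping of the original row names (`rown`) to new row names (`map`), according to these rules.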
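The mapping table itself is not shown in the question; the answer below builds it as a `data.table` with columns `rown` and `map`. Assuming it is the same mapping, here it is as a plain data frame, together with two counts that show which rule applies to which row (the names `n_targets` and `n_sources` are illustrative):

```r
# Mapping as used in the answer's code (assumed to be the question's mapping)
maps <- data.frame(rown = c('a','b1','b2','b3','c','d1','d2','e1','e1'),
                   map  = c('a','b','b','b','c','d','d','e','f'),
                   stringsAsFactors = FALSE)

# Distinct targets per source row: e1 has two (e and f), so it is discarded
n_targets <- tapply(maps$map, maps$rown, function(m) length(unique(m)))

# Source rows per target: b and d have several, so one is picked by its mean
n_sources <- tapply(maps$rown, maps$map, length)

n_targets
n_sources
```

`e2` does not appear in `maps$rown` at all, which is the "no mapping" case.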

  • If a value in `rown` maps to exactly one value in `map`, copy the whole row directly. For example, `a` and `c` each have only one mapping.
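  • If several values in `rown` map to the same value in `map`, copy the whole row with the highest value of the function above. For example, `b1`, `b2` and `b3` map to `b`, and `b3` has the highest mean, so `b3` must be selected; likewise `d2` for `d`.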
  • If one value in `rown` maps to several values in `map`, discard that row. For example, `e1` maps to both `e` and `f`.
  • If a row has no mapping, discard it. For example, `e2` has no corresponding mapping.
  • Expected output: the subset matrix

    > exp.mat.trans
            s1       s2       s3       s4       s5       s6
    a 5.353395 6.661973 6.733417 8.562573 6.198147 8.024666
    b 7.593171 7.392726 9.460992 8.785436 9.381346 6.351301
    c 8.310025 8.831553 9.321697 6.013461 8.894573 9.963420
    d 9.564380 9.376607 8.886603 5.608460 7.276372 6.066041
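The four rules can be sketched as one base-R function that takes the user-defined scoring function as an argument. `subset_by_map()` is a hypothetical name, not something from the question or the answers, and the mapping is assumed to be the one used in the answer. One design choice worth flagging: this sketch discards one-to-many source rows before picking the best row per target, whereas the data.table answer below picks first and discards afterwards. Both give the same result on this data, but they can differ when a one-to-many row competes with other rows for one of its targets.

```r
# Hypothetical helper (not from the question or answers): subset `mat` by a row-name mapping
subset_by_map <- function(mat, rown, map, fun = mean) {
  rown <- as.character(rown)   # guard against factors (older R defaults)
  map  <- as.character(map)

  # Ignore mapping entries whose source row is not in the matrix
  keep <- rown %in% rownames(mat)
  rown <- rown[keep]
  map  <- map[keep]

  # Drop source rows that map to more than one distinct target
  n_targets <- tapply(map, rown, function(m) length(unique(m)))
  ok <- as.vector(n_targets[rown]) == 1
  rown <- rown[ok]
  map  <- map[ok]

  # Score every candidate row and keep the best one per target
  score <- apply(mat[rown, , drop = FALSE], 1, fun)
  best <- vapply(split(seq_along(rown), map),
                 function(i) rown[i][which.max(score[i])],
                 character(1))

  # Rows that never appear in `rown` (like e2) are never selected
  out <- mat[best, , drop = FALSE]
  rownames(out) <- names(best)
  out
}

set.seed(1)
exp.mat <- matrix(runif(9 * 6, 5.0, 10), nrow = 9, ncol = 6)
rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')
maps <- data.frame(rown = c('a','b1','b2','b3','c','d1','d2','e1','e1'),
                   map  = c('a','b','b','b','c','d','d','e','f'),
                   stringsAsFactors = FALSE)

exp.mat.trans <- subset_by_map(exp.mat, maps$rown, maps$map, fun = mean)
exp.mat.trans
```

On the `set.seed(1)` matrix this selects `a`, `b3`, `c` and `d2` and relabels them `a`, `b`, `c`, `d`, matching the answer's output; passing, for example, `fun = median` changes only the scoring rule.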
    
Please advise how to do this efficiently.

So far I have done this by eye, with the code below:

    exp.mat.trans <- exp.mat[c(1,4,5,7),]
    rownames(exp.mat.trans) <- c('a','b','c','d')
    

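    exp.mat.trans

If you want an efficient solution, I think it is best to do the mapping with data.table. Note that when I run your code, the input matrix comes out different. I found the following solution: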

    library(data.table)
    set.seed(1)
    exp.mat <- matrix(runif(9*6, 5.0, 10), nrow = 9, ncol = 6)
    rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
    colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')
    > exp.mat
             s1       s2       s3       s4       s5       s6
    a  6.327543 5.308931 6.900176 6.911940 8.971199 8.946781
    b1 6.860619 6.029873 8.887226 9.348454 5.539718 5.116656
    b2 7.864267 5.882784 9.673526 6.701745 8.618555 7.386150
    b3 9.541039 8.435114 6.060713 7.410401 7.056372 8.661569
    c  6.008410 6.920519 8.258369 7.997829 9.104731 8.463658
    d1 9.491948 8.849207 5.627775 7.467707 8.235301 7.388098
    d2 9.723376 7.488496 6.336103 5.931088 8.914664 9.306047
    e1 8.303989 8.588093 6.930570 9.136867 7.765182 7.190486
    e2 8.145570 9.959530 5.066952 8.342334 7.648598 6.223986
    maps <- data.table(rown=c('a','b1','b2','b3','c','d1','d2','e1','e1'), 
                       map =c('a','b','b','b','c','d','d','e','f'))
    # RULE 2: mean of each mapped row, looked up by row name
    maps[, value := rowMeans(exp.mat)[rown]]
    # For each target, keep the source row with the highest mean (RULE 2)
    maps <- maps[, rown[which.max(value)], by = map]
    # RULE 3: count how many targets each selected source row was assigned to
    number_map <- maps[, .N, by = V1]
    setkey(maps, "V1")
    # Drop source rows that were assigned to more than one target
    maps <- maps[number_map[N < 2, V1]]
    # Subset the matrix, keeping only rows that actually exist in it
    exp.mat.sub <- exp.mat[maps$V1[maps$V1 %in% rownames(exp.mat)], , drop = FALSE]
    # Relabel each selected row with its target name (look rows up in maps$V1)
    rownames(exp.mat.sub) <- maps$map[match(rownames(exp.mat.sub), maps$V1)]
    exp.mat.sub
             s1       s2       s3       s4       s5       s6
    a  6.327543 5.308931 6.900176 6.911940 8.971199 8.946781
    b  9.541039 8.435114 6.060713 7.410401 7.056372 8.661569
    c  6.008410 6.920519 8.258369 7.997829 9.104731 8.463658
    d  9.723376 7.488496 6.336103 5.931088 8.914664 9.306047
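The relabelling step has to look the subset's row names up in `maps$V1`, i.e. `match(rownames(exp.mat.sub), maps$V1)`: `match(x, table)` returns one position per element of `x`, so `x` must be the names being relabelled. Reversing the arguments only gives the right labels when both vectors happen to be in the same order (as they are after `setkey()` sorts `maps` by `V1`). A small base-R sketch with a deliberately shuffled, hypothetical lookup table:

```r
sel <- c("a", "b3", "c", "d2")                       # row names of the subset matrix

# Hypothetical lookup table in a different order from `sel`
lookup <- data.frame(V1  = c("d2", "a", "c", "b3"),
                     map = c("d", "a", "c", "b"),
                     stringsAsFactors = FALSE)

# Correct: for each selected row, find its position in lookup$V1
right <- lookup$map[match(sel, lookup$V1)]

# Reversed: positions of lookup$V1 in `sel`, correct only if both share the same order
wrong <- lookup$map[match(lookup$V1, sel)]

right
wrong
```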
    
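With `set.seed(1)`, the value 7.564495 from `exp.mat` is assigned to the last row of `maps` when `maps[,value:=rowMeans(exp.mat)]` is executed, but that value is actually the row mean of `e2`. It does not matter here, because `e2` has no mapping. However, this approach breaks if the rows of the `maps` object are not the same as the rows of the original matrix.

You are right. I tried to fix this by ordering the row means according to the `rown` column. Does this work for you?

@tobias bekker: It works for this case, and I don't foresee any problems. I'll do the same on my original problem and let you know if anything comes up. Thanks!

It doesn't work when some values in `maps$rown` are not present in `rownames(exp.mat)`.

Should all values in `row.names(exp.mat)` appear in `maps$rown`?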
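A quick way to see what the name-based lookup `rowMeans(exp.mat)[rown]` does when an entry of `maps$rown` is missing from `rownames(exp.mat)`. This is a base-R sketch; `x1` is a hypothetical row name that is not in the matrix:

```r
set.seed(1)
exp.mat <- matrix(runif(9 * 6, 5.0, 10), nrow = 9, ncol = 6)
rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')

rown <- c('b1', 'x1', 'b3')          # 'x1' is hypothetical and absent from exp.mat
value <- rowMeans(exp.mat)[rown]     # lookup by name: the missing row gives NA, not an error
value

rown[which.max(value)]               # which.max() ignores NA, so 'b3' is still chosen
which.max(c(NA_real_, NA_real_))     # a group with only missing rows yields integer(0)
```

So a missing row is silently skipped when it shares a target with existing rows, and a target whose rows are all missing selects nothing.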
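    exp.mat.trans <- exp.mat[c(1,4,5,7),]
    rownames(exp.mat.trans) <- c('a','b','c','d')

    # Index subsetting: reuse the same positions to take the new names
    # from the original (un-aggregated) mapping table `maps`
    ind <- c(1,4,5,7)
    exp.mat.trans2 <- exp.mat[ind,]
    rownames(exp.mat.trans2) <- maps$map[ind]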
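Instead of reading the positions `c(1,4,5,7)` off by eye, they can be computed from row names with `match()`. A minimal base-R sketch; the named vector `chosen` (new row name = row to keep) is hypothetical and would normally come from the mean-based selection:

```r
set.seed(1)
exp.mat <- matrix(runif(9 * 6, 5.0, 10), nrow = 9, ncol = 6)
rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')

# Hypothetical selection: names are the new row names, values the rows to keep
chosen <- c(a = 'a', b = 'b3', c = 'c', d = 'd2')

# match() converts row names into the positional indices used by exp.mat[ind, ]
ind <- match(chosen, rownames(exp.mat))
exp.mat.trans2 <- exp.mat[ind, , drop = FALSE]
rownames(exp.mat.trans2) <- names(chosen)
ind
```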
    