R: computing values from a character matrix
I have a matrix with 7358444 rows and 110 columns. The matrix consists of character vectors, as shown below:
FORMAT eQTL188 eQTL193 eQTL178 eQTL179 eQTL238
[1,] "GT:DS:GP" "0/1:0.79:0.221,0.767,0.011" "0/0:0.031:0.97,0.03,0" "0/0:0.033:0.967,0.033,0" "0/0:0.079:0.922,0.077,0.001" "0/0:0.344:0.664,0.329,0.007"
[2,] "GT:DS:GP" "0/0:0.047:0.953,0.047,0" "0/0:0.007:0.993,0.007,0" "0/0:0.006:0.994,0.006,0" "0/0:0.008:0.992,0.008,0" "0/1:0.525:0.477,0.52,0.002"
[3,] "GT:DS:GP" "0/0:0.047:0.953,0.047,0" "0/0:0.007:0.993,0.007,0" "0/0:0.006:0.994,0.006,0" "0/0:0.008:0.992,0.008,0" "0/1:0.527:0.476,0.521,0.003"
[4,] "GT:DS:GP" "0/0:0.048:0.952,0.048,0" "0/0:0.007:0.993,0.007,0" "0/0:0.006:0.994,0.006,0" "0/0:0.008:0.992,0.008,0" "0/1:0.518:0.485,0.512,0.003"
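I need to compute the dosage of allele 1 for each of my samples (the columns whose names match the pattern eQTL). This can be computed from the GP values that follow the second : in each cell. The formula I need to apply is P(A1) = 2*P(A1/A1) + P(A1/A2), where P(A1/A1) is the first element after the second : and P(A1/A2) is the second element.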
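As a quick check of the formula, here is the arithmetic for the first cell of the eQTL188 column, written out by hand. The variable names `gp` and `dosage_a1` are only illustrative, and labelling the third GP value as the remaining genotype is an assumption that the question does not state.

```r
# First cell of eQTL188: "0/1:0.79:0.221,0.767,0.011"
# GP values after the second ":" (third label is an assumption):
gp <- c(0.221, 0.767, 0.011)  # P(A1/A1), P(A1/A2), presumably P(A2/A2)

# P(A1) = 2 * P(A1/A1) + P(A1/A2)
dosage_a1 <- 2 * gp[1] + gp[2]
dosage_a1  # 1.209, the value expected for row 1 of eQTL188
```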
The result I am looking for (a numeric matrix) looks like this:
eQTL188 eQTL193 eQTL178 eQTL179 eQTL238
[1,] 1.209 1.970 1.967 1.921 1.657
[2,] 1.953 1.993 1.994 1.992 1.474
[3,] 1.953 1.993 1.994 1.992 1.473
[4,] 1.952 1.993 1.994 1.992 1.482
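Since the matrix is very large, speed may be an issue.

One approach is to first extract the numbers after the second : and then split them on commas with strsplit. The formula can then be applied inside lapply. For example, consider this: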
df <- matrix(c("GT:DS:GP" ,"0/1:0.79:0.221,0.767,0.011" ,"0/0:0.031:0.97,0.03,0" , "0/0:0.033:0.967,0.033,0", "0/0:0.079:0.922,0.077,0.001", "0/0:0.344:0.664,0.329,0.007",
"GT:DS:GP" ,"0/0:0.047:0.953,0.047,0" , "0/0:0.007:0.993,0.007,0", "0/0:0.006:0.994,0.006,0" ,"0/0:0.008:0.992,0.008,0" , "0/1:0.525:0.477,0.52,0.002",
"GT:DS:GP", "0/0:0.047:0.953,0.047,0" , "0/0:0.007:0.993,0.007,0", "0/0:0.006:0.994,0.006,0" ,"0/0:0.008:0.992,0.008,0" , "0/1:0.527:0.476,0.521,0.003",
"GT:DS:GP" ,"0/0:0.048:0.952,0.048,0" , "0/0:0.007:0.993,0.007,0", "0/0:0.006:0.994,0.006,0" ,"0/0:0.008:0.992,0.008,0" , "0/1:0.518:0.485,0.512,0.003"),
ncol=6, byrow=TRUE)
# drop the FORMAT column, keeping only the sample columns
df <- df[, -1]
# keep only the GP field, i.e. everything after the second ":"
df <- gsub(".+:.+:(.*)", "\\1", df)
out <- lapply(strsplit(df, ","), function(x) {
  x <- as.numeric(x)
  # x[1] is P(A1/A1) and x[2] is P(A1/A2)
  2 * x[1] + x[2]
})
out <- do.call(rbind, out)
# strsplit walks the matrix column by column, so the original shape can be restored
dim(out) <- dim(df)
out
      [,1]  [,2]  [,3]  [,4]  [,5]
[1,] 1.209 1.970 1.967 1.921 1.657
[2,] 1.953 1.993 1.994 1.992 1.474
[3,] 1.953 1.993 1.994 1.992 1.473
[4,] 1.952 1.993 1.994 1.992 1.482
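Because speed was raised as a concern for a matrix of this size, a vectorized variant may be worth benchmarking. The sketch below is an alternative, not the approach given above: it pulls the first and second GP values out with two calls to `sub()` on the whole matrix, so no R function is called per cell. The small matrix `m` and the names `pattern`, `p11`, `p12` and `dosage` are illustrative only, and it assumes every cell has the form `GT:DS:GP` with three comma-separated GP values.

```r
# Small stand-in for the real matrix (values taken from the question).
m <- matrix(c("0/1:0.79:0.221,0.767,0.011", "0/0:0.031:0.97,0.03,0",
              "0/0:0.047:0.953,0.047,0",    "0/0:0.007:0.993,0.007,0"),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("eQTL188", "eQTL193")))

# Skip the first two ":"-delimited fields, then capture GP values 1 and 2.
pattern <- "^[^:]*:[^:]*:([^,]*),([^,]*),.*$"
p11 <- as.numeric(sub(pattern, "\\1", m))  # P(A1/A1)
p12 <- as.numeric(sub(pattern, "\\2", m))  # P(A1/A2)

# as.numeric() drops the matrix attributes, so restore them afterwards.
dosage <- 2 * p11 + p12
dim(dosage) <- dim(m)
dimnames(dosage) <- dimnames(m)
dosage
```

Whether this is actually faster on 7358444 x 110 cells depends on the machine and R version, so timing both versions on a slice of the data (for example with `system.time()`) would be a sensible first step.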
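Isn't A1/A1 always 1.0? What is the P() function, if there is one?
Seeing the contents of the matrix helps, but giving the code that generates the matrix is more useful. Then show us what you have tried.
You could consider gsub to split out the numbers before and after the first/second :. This can be wrapped in a function that also computes the value from GP, and applied to all columns with sapply.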
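The last comment's suggestion could look roughly like the sketch below. The helper name `gp_dosage` is hypothetical, it uses `sub()` rather than `gsub()` because only one match per string is needed, and it assumes the sample columns are exactly those whose names start with `eQTL`.

```r
# Hypothetical helper: allele 1 dosage for one column of "GT:DS:GP" strings.
gp_dosage <- function(col) {
  gp <- sub("^[^:]*:[^:]*:", "", col)           # keep only the GP field
  parts <- strsplit(gp, ",", fixed = TRUE)      # split GP into its values
  p11 <- as.numeric(vapply(parts, `[`, character(1), 1))  # P(A1/A1)
  p12 <- as.numeric(vapply(parts, `[`, character(1), 2))  # P(A1/A2)
  2 * p11 + p12
}

# Small stand-in for the real matrix, including the FORMAT column.
m <- matrix(c("GT:DS:GP", "0/1:0.79:0.221,0.767,0.011", "0/0:0.031:0.97,0.03,0",
              "GT:DS:GP", "0/0:0.047:0.953,0.047,0",    "0/0:0.007:0.993,0.007,0"),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("FORMAT", "eQTL188", "eQTL193")))

# Apply the helper to every sample column; sapply() returns a numeric matrix
# with one column per sample, named after the sample.
sample_cols <- grepl("^eQTL", colnames(m))
res <- sapply(colnames(m)[sample_cols], function(j) gp_dosage(m[, j]))
res
```

Working column by column keeps each intermediate object small, which may matter more than raw speed when the full matrix has over seven million rows.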