R: Recode a categorical variable if its frequency is lower than a defined value
Given an example dataset (d) of SNP genotypes coded 0, 1, 2, we can check the genotype frequencies of one SNP with the table command:
table(d$rs3)
The output will be:
0 1 2
5 2 1
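To see which columns would be affected, the count of genotype 2 can be computed for every column in one call. This is a minimal sketch: the toy data frame d below is a hypothetical reconstruction, chosen only to be consistent with the table output above, and is not the original data.

```r
# Hypothetical toy data (an assumption, not the original dataset):
# genotypes coded 0/1/2, with NA for missing calls.
d <- data.frame(
  rs3 = c(1, 1, 0, 2, 0, 0, 0, 0, NA),
  rs4 = c(0, 0, 0, 0, 0, 2, 2, 2, 1),
  rs5 = c(0, 1, 0, 1, 0, 0, NA, 2, 2),
  rs6 = c(0, 0, 0, 0, 0, 1, 1, 1, 1)
)

# Frequency table for one SNP, also counting missing calls
rs3_counts <- table(d$rs3, useNA = "ifany")

# Number of genotype-2 calls in every column at once
n_two <- colSums(d == 2, na.rm = TRUE)
n_two
```

Columns whose count is below the chosen cutoff are the candidates for recoding.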
Here we want to recode the variable if the frequency of genotype 2 is below 3, replacing 2 with 1. We can try:
d[] <- lapply(d, function(x)
if(sum(x==2, na.rm=TRUE) < 3) replace(x, x==2, 1) else x)
d
# rs3 rs4 rs5 rs6
#1 1 0 0 0
#2 1 0 1 0
#3 0 0 0 0
#4 1 0 1 0
#5 0 0 0 0
#6 0 2 0 1
#7 0 2 NA 1
#8 0 2 1 1
#9 NA 1 1 1
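When the same rule is applied repeatedly, it may help to wrap the lapply call in a small function so that the genotype, its replacement, and the cutoff are explicit. This is a sketch under stated assumptions: recode_rare and its argument names are hypothetical, and the toy data frame is illustrative, not the original dataset.

```r
# Hypothetical helper (name and arguments are assumptions for illustration).
# For each column, replace `rare` with `to` when `rare` occurs fewer than
# `min_count` times; other columns are returned unchanged.
recode_rare <- function(df, rare = 2, to = 1, min_count = 3) {
  df[] <- lapply(df, function(x) {
    if (sum(x == rare, na.rm = TRUE) < min_count) {
      # which() drops NA comparisons, so missing calls stay NA
      replace(x, which(x == rare), to)
    } else {
      x
    }
  })
  df
}

# Hypothetical toy data (an assumption, not the original dataset)
d <- data.frame(
  rs3 = c(1, 1, 0, 2, 0, 0, 0, 0, NA),
  rs4 = c(0, 0, 0, 0, 0, 2, 2, 2, 1),
  rs5 = c(0, 1, 0, 1, 0, 0, NA, 2, 2),
  rs6 = c(0, 0, 0, 0, 0, 1, 1, 1, 1)
)

d2 <- recode_rare(d)
d2
```

Here rs4 keeps its 2s because genotype 2 occurs three times, which is not below the cutoff.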
The same logic can also be written with dplyr:
library(dplyr)
# across() replaces the deprecated mutate_each(funs(...)) form
d %>%
  mutate(across(everything(), ~ if (sum(.x == 2, na.rm = TRUE) < 3)
    replace(.x, .x == 2, 1) else .x))
Here is another possible (vectorized) solution:
indx <- colSums(d == 2, na.rm = TRUE) < 3 # Select columns where genotype 2 occurs fewer than 3 times
d[indx][d[indx] == 2] <- 1 # Insert 1 wherever the selected columns equal 2
d
# rs3 rs4 rs5 rs6
# 1 1 0 0 0
# 2 1 0 1 0
# 3 0 0 0 0
# 4 1 0 1 0
# 5 0 0 0 0
# 6 0 2 0 1
# 7 0 2 NA 1
# 8 0 2 1 1
# 9 NA 1 1 1
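As a quick sanity check, the lapply version and the colSums version can be run on the same input and compared. This is a sketch using a hypothetical toy data frame (an assumption, not the original data); the names by_column and vectorized are illustrative.

```r
# Hypothetical toy data (an assumption, not the original dataset)
d <- data.frame(
  rs3 = c(1, 1, 0, 2, 0, 0, 0, 0, NA),
  rs4 = c(0, 0, 0, 0, 0, 2, 2, 2, 1),
  rs5 = c(0, 1, 0, 1, 0, 0, NA, 2, 2),
  rs6 = c(0, 0, 0, 0, 0, 1, 1, 1, 1)
)

# Column-wise version: test and recode each column in turn
by_column <- d
by_column[] <- lapply(by_column, function(x)
  if (sum(x == 2, na.rm = TRUE) < 3) replace(x, x == 2, 1) else x)

# Vectorized version: pick the qualifying columns, then recode them together
vectorized <- d
indx <- colSums(vectorized == 2, na.rm = TRUE) < 3
vectorized[indx][vectorized[indx] == 2] <- 1

# TRUE when both approaches give the same data frame
same_result <- isTRUE(all.equal(by_column, vectorized))
same_result
```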