R: check whether a value falls within one or more ranges, and print the IDs of all ranges that contain it
I have a data frame containing all the genes in a genome (Genome_file$Gene). For each gene I have the chromosome it is found on (Genome_file$Chrom) and its start (Genome_file$Start) and end (Genome_file$End) coordinates, expressed in base pairs.
Genome_file <- structure(list(Gene = c("Gene_1", "Gene_2", "Gene_3",
"Gene_4", "Gene_5", "Gene_6", "Gene_7", "Gene_8", "Gene_9", "Gene_10"),
Chrom = c("Chr_01", "Chr_01", "Chr_01", "Chr_01", "Chr_01", "Chr_04",
"Chr_04", "Chr_04", "Chr_04", "Chr_04"), Start = c(1000L, 2000L, 3000L,
4000L, 5000L, 1000L, 2000L, 3000L, 4000L, 5000L), End = c(1900L, 2900L,
3900L, 4900L, 5900L, 1900L, 2900L, 3900L, 4900L, 5900L)), .Names =
c("Gene", "Chrom", "Start", "End"), row.names = c(NA, -10L), class =
"data.frame")
>Genome_file
Gene Chrom Start End
Gene_1 Chr_01 1000 1900
Gene_2 Chr_01 2000 2900
Gene_3 Chr_01 3000 3900
Gene_4 Chr_01 4000 4900
Gene_5 Chr_01 5000 5900
Gene_6 Chr_04 1000 1900
Gene_7 Chr_04 2000 2900
Gene_8 Chr_04 3000 3900
Gene_9 Chr_04 4000 4900
Gene_10 Chr_04 5000 5900
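So far I have been able to apply the following function to print the first QTL found to contain a given gene. However, I don't know how to print all of the QTLs that contain a given gene, separated by semicolons (;).
The function I tried: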
library(magrittr)
# Build a list of logical vectors: for each gene, is its start position
# inside each QTL interval in the QTLs data frame?
Positionok <- lapply(Genome_file$Start, function(z) z >=
QTLs$bp_Start_1LOD & z <= QTLs$bp_End_1LOD)
# A second list of logical vectors: does the gene's chromosome
# match each QTL's chromosome?
Chromok <- lapply(Genome_file$Chrom, function(z) z == QTLs$Chrom)
# Combine the two lists and use the result as an index into QTLs$QTL_ID
Genome_file$QTL <- lapply(1:nrow(Genome_file), function(i) {
QTLs$QTL_ID[ Positionok[[i]] & Chromok[[i]] ]
}) %>%
# replace zero-length results with NA values
sapply(function(QTLs) ifelse(length(QTLs) == 0, NA, QTLs))
You may want to look at a package built for genomic intervals, which should make them easier to manipulate (findOverlaps is provided by the Bioconductor packages IRanges and GenomicRanges). The findOverlaps function should then suit your needs. Alternatively, you can check the data.table::foverlaps function.
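As a rough illustration of the data.table::foverlaps route, here is a minimal sketch. The QTLs table is not shown in the question, so the `qtls` data below is invented purely for illustration; only its column names (QTL_ID, Chrom, bp_Start_1LOD, bp_End_1LOD) come from the question's code. The helper column `Pos` is also an assumption: it turns each gene start into a zero-width interval so that `type = "within"` tests whether the start lies inside a QTL, as the original attempt does.

```r
library(data.table)

# Gene table from the question.
genes <- data.table(
  Gene  = paste0("Gene_", 1:10),
  Chrom = rep(c("Chr_01", "Chr_04"), each = 5),
  Start = rep(c(1000L, 2000L, 3000L, 4000L, 5000L), times = 2)
)
genes[, End := Start + 900L]

# HYPOTHETICAL QTL table: values invented for illustration only.
qtls <- data.table(
  QTL_ID        = paste0("QTL_", 1:5),
  Chrom         = c("Chr_01", "Chr_01", "Chr_01", "Chr_02", "Chr_04"),
  bp_Start_1LOD = c(500L, 900L, 3900L, 1000L, 2500L),
  bp_End_1LOD   = c(1500L, 3500L, 4500L, 2000L, 4500L)
)

# Zero-width interval [Start, Pos] so that we test the gene start only.
genes[, Pos := Start]

# foverlaps() needs y keyed; the last two key columns are the interval.
setkey(qtls, Chrom, bp_Start_1LOD, bp_End_1LOD)
hits <- foverlaps(genes, qtls,
                  by.x = c("Chrom", "Start", "Pos"),
                  type = "within")

# Collapse all matching QTL IDs per gene; unmatched genes get "".
per_gene <- hits[, .(QTL = paste(QTL_ID[!is.na(QTL_ID)], collapse = ";")),
                 by = .(Gene)]

# Attach the collapsed IDs back to the gene table and drop the helper.
genes[per_gene, on = "Gene", QTL := i.QTL]
genes[, Pos := NULL]
print(genes)
```

To test whether the whole gene overlaps a QTL rather than just its start, pass `by.x = c("Chrom", "Start", "End")` with `type = "any"` instead.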
#Gene Chrom Start End QTL
#Gene_1 Chr_01 1000 1900 QTL_1;QTL_2
#Gene_2 Chr_01 2000 2900 QTL_2
#Gene_3 Chr_01 3000 3900 QTL_2
#Gene_4 Chr_01 4000 4900 QTL_3
#Gene_5 Chr_01 5000 5900
#Gene_6 Chr_04 1000 1900
#Gene_7 Chr_04 2000 2900
#Gene_8 Chr_04 3000 3900 QTL_5
#Gene_9 Chr_04 4000 4900 QTL_5
#Gene_10 Chr_04 5000 5900
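The attempt in the question keeps only the first QTL because ifelse() returns a value shaped like its test, and `length(QTLs) == 0` has length one. A minimal base R sketch of one way around this is to collapse the matches with paste(). The QTLs table below is hypothetical (the question does not show it) and was chosen so that the result matches the QTL column above, with NA where that table shows blanks.

```r
# Gene table from the question.
Genome_file <- data.frame(
  Gene  = paste0("Gene_", 1:10),
  Chrom = rep(c("Chr_01", "Chr_04"), each = 5),
  Start = rep(c(1000L, 2000L, 3000L, 4000L, 5000L), times = 2),
  End   = rep(c(1900L, 2900L, 3900L, 4900L, 5900L), times = 2),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL QTL table: values invented for illustration only;
# column names follow the code in the question.
QTLs <- data.frame(
  QTL_ID        = paste0("QTL_", 1:5),
  Chrom         = c("Chr_01", "Chr_01", "Chr_01", "Chr_02", "Chr_04"),
  bp_Start_1LOD = c(500L, 900L, 3900L, 1000L, 2500L),
  bp_End_1LOD   = c(1500L, 3500L, 4500L, 2000L, 4500L),
  stringsAsFactors = FALSE
)

Genome_file$QTL <- vapply(seq_len(nrow(Genome_file)), function(i) {
  # TRUE for every QTL on the same chromosome whose interval
  # contains this gene's start position.
  hit <- QTLs$Chrom == Genome_file$Chrom[i] &
    Genome_file$Start[i] >= QTLs$bp_Start_1LOD &
    Genome_file$Start[i] <= QTLs$bp_End_1LOD
  # Join all matching IDs with ";" instead of keeping only the first.
  if (any(hit)) paste(QTLs$QTL_ID[hit], collapse = ";") else NA_character_
}, character(1))

print(Genome_file)
```

Using vapply() with `character(1)` guarantees a plain character column, whereas sapply() over a list may return a list when the element lengths differ.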