R: splitting intervals (genomic regions) into single positions (nucleotides)
I want to convert my data frame df from region-based information to per-position (nucleotide-by-nucleotide) information.
My input df:
start end state freq
100 103 1nT 22
100 103 3nT 34
104 106 1nT 12
104 106 3nT 16
My expected output:
position state freq
100 1nT 22
101 1nT 22
102 1nT 22
103 1nT 22
100 3nT 34
101 3nT 34
102 3nT 34
103 3nT 34
104 1nT 12
105 1nT 12
106 1nT 12
104 3nT 16
105 3nT 16
106 3nT 16
Any ideas? Many thanks.
Here is a rough implementation using a for loop:
a = t(matrix(c(100, 103, "1nT", 22,
               100, 103, "3nT", 34,
               104, 106, "1nT", 12,
               104, 106, "3nT", 16), nrow = 4))
# the argument is stringsAsFactors (with a trailing "s")
a = data.frame(a, stringsAsFactors = FALSE)
colnames(a) = c("start", "end", "state", "freq")
# the matrix is character, so convert the numeric columns back
a$start = as.numeric(as.character(a$start))
a$end = as.numeric(as.character(a$end))
a$freq = as.numeric(as.character(a$freq))
n = dim(a)[1]
res = NULL
for (i in 1:n) {
  # one row per position in the interval
  position = a$start[i]:a$end[i]
  state = rep(a$state[i], length(position))
  freq = rep(a$freq[i], length(position))
  temp = cbind.data.frame(position, state, freq)
  res = rbind(res, temp)
}
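The loop above grows res with rbind on every iteration, which copies the accumulated result each time. A minimal sketch of the same idea that stores each piece in a preallocated list and binds once at the end; the name pieces is only illustrative, and the input is rebuilt directly as a data frame so the block stands alone:

```r
# Same toy input as in the question.
a <- data.frame(start = c(100, 100, 104, 104),
                end   = c(103, 103, 106, 106),
                state = c("1nT", "3nT", "1nT", "3nT"),
                freq  = c(22, 34, 12, 16),
                stringsAsFactors = FALSE)

# Preallocate one list slot per input row instead of growing a data frame.
pieces <- vector("list", nrow(a))
for (i in seq_len(nrow(a))) {
  # state and freq are recycled to the length of position
  pieces[[i]] <- data.frame(position = a$start[i]:a$end[i],
                            state    = a$state[i],
                            freq     = a$freq[i],
                            stringsAsFactors = FALSE)
}

# Bind all pieces in a single call.
res <- do.call(rbind, pieces)
```

For four rows the difference is negligible; it should only start to matter when the input has many intervals.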
Here is one way to do it. First, build your data:
require(data.table)
fakedata <- data.table(start=c(100,100,104,104),
end=c(103,103,106,106),
state=c("1nT","3nT","1nT","3nT"),
freq=c(22,34,12,16))
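This can be done with a simple apply call.
Let's build it up step by step:
apply(fakedata, 1, row.function)
calls row.function on each row, so row.function has to repeat state and freq for every number between start and end.
To get the range of numbers between start and end, we can use the colon operator, start:end.
Now, when creating a data.frame, R automatically recycles shorter vectors to match the length of the longest one. So we can build the piece for a single row like this: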
data.frame(position=(row['start']:row['end']), state=row['state'], freq=row['freq'])
Applying this to every row and combining the pieces with rbind:
do.call('rbind',
        apply(fakedata, 1, function(row) {
          data.frame(position=(row['start']:row['end']),
                     state=row['state'], freq=row['freq'])
        }))
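One caveat with apply: it first converts the table to a matrix, and because state is text, that matrix is character, so freq comes back as character rather than numeric. A hedged sketch of a row-wise alternative that keeps the column types, using Map over the columns instead of apply; a plain data.frame stands in for the data.table here so the block has no package dependency:

```r
# Same values as fakedata above, as a plain data.frame (an assumption made
# only to keep this example free of package dependencies).
fakedata <- data.frame(start = c(100, 100, 104, 104),
                       end   = c(103, 103, 106, 106),
                       state = c("1nT", "3nT", "1nT", "3nT"),
                       freq  = c(22, 34, 12, 16),
                       stringsAsFactors = FALSE)

# Map walks the four columns in parallel, so each argument keeps its type.
pieces <- Map(function(start, end, state, freq) {
  data.frame(position = start:end, state = state, freq = freq,
             stringsAsFactors = FALSE)
}, fakedata$start, fakedata$end, fakedata$state, fakedata$freq)

res <- do.call(rbind, pieces)
str(res)  # freq stays numeric, state stays character
```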
This gives you what you want. Hopefully it also shows how to approach similar problems in the future.
Here is a vectorized approach:
# load your data
df <- read.table(textConnection("start end state freq
100 103 1nT 22
100 103 3nT 34
104 106 1nT 12
104 106 3nT 16"), header=TRUE)
# extract number of needed replications
n <- df$end - df$start + 1
# calculate position and replicate state/freq
res <- data.frame(position = rep(df$start - 1, n) + sequence(n),
state = rep(df$state, n),
freq = rep(df$freq, n))
res
# position state freq
# 1 100 1nT 22
# 2 101 1nT 22
# 3 102 1nT 22
# 4 103 1nT 22
# 5 100 3nT 34
# 6 101 3nT 34
# 7 102 3nT 34
# 8 103 3nT 34
# 9 104 1nT 12
# 10 105 1nT 12
# 11 106 1nT 12
# 12 104 3nT 16
# 13 105 3nT 16
# 14 106 3nT 16
Not sure I understand the pattern here. Position 102?
My mistake. Fixed now! Thanks, everyone!
sequence is just an lapply wrapper.
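To make the remark about sequence concrete, here is a small sketch of what it returns for the run lengths used in the vectorized answer, next to the lapply form it corresponds to. The last part assumes R 4.0.0 or later, where sequence accepts a from argument; it is guarded by a version check so the block still runs on older versions:

```r
n <- c(4, 3)  # lengths of the intervals 100-103 and 104-106

s1 <- sequence(n)
s1
# 1 2 3 4 1 2 3

# The same result written out with lapply.
s2 <- unlist(lapply(n, seq_len))

# Shifting each run by (start - 1) gives the positions.
pos <- rep(c(100, 104) - 1, n) + s1
pos
# 100 101 102 103 104 105 106

# Assumption: R >= 4.0.0, where sequence() takes per-run start values.
if (getRversion() >= "4.0.0") {
  pos2 <- sequence(n, from = c(100, 104))
}
```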