R: split intervals (genomic regions) into single positions (nucleotides)


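I want to convert my data frame `df` from region-based information to point-by-point (position-by-position, i.e. nucleotide-by-nucleotide) information.

My input `df`: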

start  end  state  freq
 100   103   1nT    22
 100   103   3nT    34
 104   106   1nT    12
 104   106   3nT    16

My expected output:

position state freq
  100     1nT   22
  101     1nT   22
  102     1nT   22
  103     1nT   22
  100     3nT   34
  101     3nT   34
  102     3nT   34
  103     3nT   34
  104     1nT   12
  105     1nT   12
  106     1nT   12
  104     3nT   16
  105     3nT   16
  106     3nT   16

Any ideas? Thanks a lot.

Here is a rough implementation using a `for` loop:

    # Build the example input; a matrix of mixed values is stored as character
    a = matrix(c(100, 103, "1nT", 22,
                 100, 103, "3nT", 34,
                 104, 106, "1nT", 12,
                 104, 106, "3nT", 16), ncol = 4, byrow = TRUE)
    a = data.frame(a, stringsAsFactors = FALSE)

    colnames(a) = c("start", "end", "state", "freq")
    # Convert the numeric columns back from character
    a$start = as.numeric(a$start)
    a$end = as.numeric(a$end)
    a$freq = as.numeric(a$freq)

    res = NULL

    # Expand each region into one row per position and append it to res
    for (i in seq_len(nrow(a))) {
      position = a$start[i]:a$end[i]
      state = rep(a$state[i], length(position))
      freq = rep(a$freq[i], length(position))
      temp = data.frame(position = position, state = state, freq = freq)
      res = rbind(res, temp)
    }

Here is one approach.

Build your data:

library(data.table)
fakedata <- data.table(start=c(100,100,104,104),
                       end=c(103,103,106,106),
                       state=c("1nT","3nT","1nT","3nT"),
                       freq=c(22,34,12,16))

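This can be done with a simple `apply` command.

Let's build it up step by step:

You want to perform an operation on each row, so a row-wise apply (or a for loop) should be your first thought. So we know we want to use `apply(fakedata, 1, row.function)`.

Think about what you want to do for a single row. You want to repeat `state` and `freq` for every number between `start` and `end`. To get the range of numbers between the two, we can use the colon operator, `start:end`. When creating a data.frame, R automatically recycles the shorter vectors to match the length of the longest one. So we can build a fragment from a single row like this: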

data.frame(position=(row['start']:row['end']), state=row['state'], freq=row['freq'])

Then we want to bind the fragments together, so we use `do.call('rbind', result)`.

Now, putting it all together, we have:

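# apply() hands each row over as a character vector, so numbers are converted back
do.call('rbind',
  apply(fakedata, 1, function(row) {
    data.frame(position=as.numeric(row[['start']]):as.numeric(row[['end']]),
      state=row[['state']], freq=as.numeric(row[['freq']]))
  }))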

This will give you what you want. Hopefully it also helps show how to approach similar problems in the future.
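
One thing to keep in mind is that `apply` turns each row of a data frame into a character vector whenever any column is non-numeric, so numeric columns have to be converted back. Below is a minimal sketch of a variant that sidesteps this by walking the columns in parallel with base R's `Map`. It assumes the same four columns as above; the names `regions`, `expand_row`, `pieces` and `expanded` are made up for illustration.

```r
# Same input as above, rebuilt as a plain data.frame so the block runs on its own
regions <- data.frame(start = c(100, 100, 104, 104),
                      end   = c(103, 103, 106, 106),
                      state = c("1nT", "3nT", "1nT", "3nT"),
                      freq  = c(22, 34, 12, 16))

# Hypothetical helper: expand one region into one row per position.
# state and freq are length 1 and get recycled by data.frame().
expand_row <- function(start, end, state, freq) {
  data.frame(position = start:end, state = state, freq = freq)
}

# Map() passes the i-th element of every column to expand_row,
# so each value keeps its original type (no character matrix involved)
pieces <- Map(expand_row, regions$start, regions$end,
              regions$state, regions$freq)

# Bind all fragments in a single call
expanded <- do.call(rbind, pieces)
str(expanded)
```

Binding once at the end also avoids growing the result with `rbind` inside a loop, which copies the accumulated data on every iteration.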

Here is a vectorized approach:

# load your data
df <- read.table(textConnection("start  end  state  freq
 100   103   1nT    22
 100   103   3nT    34
 104   106   1nT    12
 104   106   3nT    16"), header=TRUE)

# extract number of needed replications
n <- df$end - df$start + 1

# calculate position and replicate state/freq
res <- data.frame(position = rep(df$start - 1, n) + sequence(n),
                  state = rep(df$state, n),
                  freq = rep(df$freq, n))
res
#    position state freq
# 1       100   1nT   22
# 2       101   1nT   22
# 3       102   1nT   22
# 4       103   1nT   22
# 5       100   3nT   34
# 6       101   3nT   34
# 7       102   3nT   34
# 8       103   3nT   34
# 9       104   1nT   12
# 10      105   1nT   12
# 11      106   1nT   12
# 12      104   3nT   16
# 13      105   3nT   16
# 14      106   3nT   16
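
To see why the position arithmetic works, it can help to look at the intermediate vectors one at a time. The sketch below rebuilds the same input so that it runs on its own and uses only `rep` and `sequence`, as in the answer; the names `offsets`, `bases` and `position` are introduced here purely for illustration.

```r
df <- data.frame(start = c(100, 100, 104, 104),
                 end   = c(103, 103, 106, 106),
                 state = c("1nT", "3nT", "1nT", "3nT"),
                 freq  = c(22, 34, 12, 16))

# Number of positions covered by each region: 4 4 3 3
n <- df$end - df$start + 1

# sequence(n) concatenates 1..n[i] for each region:
# 1 2 3 4  1 2 3 4  1 2 3  1 2 3
offsets <- sequence(n)

# Each region's (start - 1) repeated once per position:
# 99 99 99 99  99 99 99 99  103 103 103  103 103 103
bases <- rep(df$start - 1, n)

# Adding the two gives the absolute positions
position <- bases + offsets
position
```

Because `rep(df$state, n)` and `rep(df$freq, n)` use the same counts `n`, every expanded position lines up with the state and frequency of the region it came from.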

Comments:

Not sure I understand the pattern here. Position 102?

That was my mistake. It's fixed now! Thanks, everyone!

`sequence` is just an `lapply` wrapper.