Counting bases in a DNA sequence with the seqinr package in R

I have a vector extracted from a FASTA file:

> dat
  [1] "t" "a" "t" "t" "t" "a" "c" "c" "g" "a" "c" "g" "a" "a" "a" "t" "t" "a" "a" "t" "a" "c" "c" "a" "t" "c" "a" "g" "g" "g" "t" "a" "t"
  [34] "t" "a" "a" "g" "a" "t" "g" "c" "t" "a" "c" "c" "a" "a" "c" "g" "t" "g" "g" "t" "a" "t" "t" "a" "a" "a" "a" "t" "g" "t" "g" "c" "c"
  [67] "c" "a" "a" "c" "c" "g" "c" "g" "a" "a" "a" "a" "a" "g" "a" "a" "a" "g" "t" "g" "g" "t" "a" "t" "a" "t" "a" "g" "g" "a" "a" "a" "a"
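The sequence is much longer, but that does not matter here. I want to split the first 100,000 characters of this vector into intervals of length 1000 and count the number of "g" bases in each interval. So far I have tried: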

library(seqinr)
intervals = 1000*(0:99)
g_count = count(dat[intervals+1:intervals+1000], 1)[["g"]]
But this returns an error:
numerical expression has 100 elements: only the first used

Thanks for any help.

To count the number of "g" bases in each interval, you can use this base R approach:

n <- 1000
result <- tapply(dat, ceiling(seq_along(dat)/n), function(x) sum(x == 'g'))
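For context, the message in the question comes from operator precedence: in R, `:` binds more tightly than `+`, so `intervals+1:intervals+1000` is parsed as `intervals + (1:intervals) + 1000`, and `1:intervals` uses only the first element of `intervals`. The following is a minimal sketch of a per-window loop with explicit parentheses. It assumes the 33-base sample vector and `n <- 11` from the data section of this thread, and the name `starts` is introduced here purely for illustration.

```r
# Sample data (33 bases) and window size, as given in the thread's data section.
dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g",
         "a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t",
         "c", "a", "g", "g", "g", "t", "a", "t")
n <- 11

# Window start offsets: 0, 11, 22.
# (The question used 1000 * (0:99) with windows of length 1000.)
starts <- seq(0, length(dat) - n, by = n)

# Parentheses around (s + 1) and (s + n) are required because `:` has
# higher precedence than `+`.
g_count <- sapply(starts, function(s) sum(dat[(s + 1):(s + n)] == "g"))
g_count
# [1] 1 1 3
```

With the real data, the same pattern would presumably use `starts <- 1000 * (0:99)` and `n <- 1000`, provided `dat` has at least 100,000 elements.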

We can also use rowsum in base R:

rowsum(+(dat == 'g'), as.integer(gl(length(dat), n, length(dat))))

Data:
dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g", 
"a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t", 
"c", "a", "g", "g", "g", "t", "a", "t")

n <- 11
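As a quick check, both answers can be run on the sample data above. This is only a sketch: the names `by_tapply` and `by_rowsum` are hypothetical, chosen here to hold the two results for comparison.

```r
# Sample data and window size from the thread.
dat <- c("t", "a", "t", "t", "t", "a", "c", "c", "g", "a", "c", "g",
         "a", "a", "a", "t", "t", "a", "a", "t", "a", "c", "c", "a", "t",
         "c", "a", "g", "g", "g", "t", "a", "t")
n <- 11

# tapply: group positions 1..n, n+1..2n, ... and count "g" within each group.
by_tapply <- tapply(dat, ceiling(seq_along(dat) / n), function(x) sum(x == "g"))

# rowsum: convert the logical vector to 0/1 with unary `+`, then sum by group.
# gl() creates the grouping factor in blocks of n.
by_rowsum <- rowsum(+(dat == "g"), as.integer(gl(length(dat), n, length(dat))))

# rowsum returns a one-column matrix; take the column to get a named vector.
by_rowsum[, 1]
# 1 2 3
# 1 1 3

# Both approaches agree on the per-window counts.
stopifnot(all(as.vector(by_tapply) == as.vector(by_rowsum)))
```

For the sequence in the question, the same calls would presumably be applied to `head(dat, 100000)` with `n <- 1000`, which limits the counts to the first 100,000 bases.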