Defining windows to compute means in R

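I have a problem in R: I am trying to write code for a very large table.

I want to compute the mean of $variable_1 and $variable_2 for each window of 500 positions (column $pos), with a step of 500 positions, so the windows do not overlap. An example may make this easier to understand.

Input table: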

data_FST = data.frame(scaffold=c(rep("Scaffold_1",1000),
                                 rep("Scaffold_2",2000),
                                 rep("Scaffold_3",450)),
                      variable_1=sample(1:5000,3450, replace=TRUE),
                      variable_2=sample(1:5000,3450, replace=TRUE),
                      pos=c(seq(1,2000,2),1:2000,1:450))

Desired output table:

scaffold   pos  variable_1               variable_2
Scaffold_1 500  mean_variable(1:500)     mean_variable(1:500)
Scaffold_1 1000 mean_variable(501:1000)  mean_variable(501:1000)
Scaffold_2 500  mean_variable(1:500)     mean_variable(1:500)
Scaffold_2 1000 mean_variable(501:1000)  mean_variable(501:1000)
Scaffold_2 1500 mean_variable(1001:1500) mean_variable(1001:1500)
Scaffold_2 2000 mean_variable(1501:2000) mean_variable(1501:2000)
Scaffold_3 500  mean_variable(1:500)     mean_variable(1:500)

Thank you very much in advance.

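You can create a new grouping variable by dividing the pos values by 500 and rounding up, so that every 500 positions fall into one group, and then take the mean of the variable within each group.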
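As a quick check of how the grouping key behaves, the sketch below applies the expression `ceiling(pos / 500) * 500` to a few hand-picked positions. The vector `pos_example` is made up for illustration and is not part of the question's data.

```r
# Hypothetical positions chosen to sit on either side of the window boundaries
pos_example <- c(1, 499, 500, 501, 1000, 1001, 1999)

# Label each position with the upper bound of its 500-wide window
window <- ceiling(pos_example / 500) * 500

print(data.frame(pos = pos_example, window = window))
```

Positions 1 to 500 share the label 500, positions 501 to 1000 share the label 1000, and so on, so each position belongs to exactly one window.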

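library(dplyr)

data_FST %>%
  # Label each row with the upper bound of its 500-position window
  group_by(scaffold, pos = ceiling(pos/500) * 500) %>%
  # Average within each scaffold/window group
  summarise(variable_1 = mean(variable_1))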

#  scaffold     pos variable_1
#  <chr>      <dbl>      <dbl>
#1 Scaffold_1   500       126.
#2 Scaffold_1  1000       376.
#3 Scaffold_1  1500       626.
#4 Scaffold_1  2000       876.
#5 Scaffold_2   500      1250.
#6 Scaffold_2  1000      1750.
#7 Scaffold_2  1500      2250.
#8 Scaffold_2  2000      2750.
#9 Scaffold_3   500      3226.
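Since the question asks for the window means of both `variable_1` and `variable_2`, the same grouping can be extended to several columns at once. The following is a minimal sketch, assuming dplyr 1.0 or later (for `across()` and the `.groups` argument). The `set.seed()` call and the name `window_means` are additions for illustration only.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the random example data is reproducible

# Example data as defined in the question
data_FST <- data.frame(
  scaffold   = c(rep("Scaffold_1", 1000),
                 rep("Scaffold_2", 2000),
                 rep("Scaffold_3", 450)),
  variable_1 = sample(1:5000, 3450, replace = TRUE),
  variable_2 = sample(1:5000, 3450, replace = TRUE),
  pos        = c(seq(1, 2000, 2), 1:2000, 1:450)
)

window_means <- data_FST %>%
  # Same window label as above: upper bound of each 500-position window
  group_by(scaffold, pos = ceiling(pos / 500) * 500) %>%
  # Take the mean of both variables within each scaffold/window group
  summarise(across(c(variable_1, variable_2), mean), .groups = "drop")

print(window_means)
```

The result has one row per scaffold and window. Because the positions of Scaffold_1 run from 1 to 1999 in steps of 2, that scaffold yields four windows (500, 1000, 1500, 2000) with 250 rows each, rather than the two windows listed in the desired output.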

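It is quite unclear what you mean by "the mean of ... $variable for multiple columns". In the second half of the sentence you tell us that you want to compute the (windowed) mean of the same variable.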