R: summarize a matrix. Get the mean for each category of 100000 units

r dataframe

I have the following data structure:

pos <- c(4532568,4541529,4586529,4591235,4712360,4732504,4740231,10532655,10542365,10564587,45312567,45326354,45369874,124832658,124845829,124869874)
cm <- c(2.21,2.25,2.26,2.29,3.31,3.35,3.36,4.32,4.35,4.39,5.23,5.27,5.29,7.36,7.45,7.49)
data <- cbind(pos,cm)

            pos   cm
 [1,]   4532568 2.21
 [2,]   4541529 2.25
 [3,]   4586529 2.26
 [4,]   4591235 2.29
 [5,]   4712360 3.31
 [6,]   4732504 3.35
 [7,]   4740231 3.36
 [8,]  10532655 4.32
 [9,]  10542365 4.35
[10,]  10564587 4.39
[11,]  45312567 5.23
[12,]  45326354 5.27
[13,]  45369874 5.29
[14,] 124832658 7.36
[15,] 124845829 7.45
[16,] 124869874 7.49

However, if I use the whole script to get the mean of Ch1$CM:

 Ch1<- ch1 %>%
 as.data.frame %>% 
 group_by(Pos = plyr::round_any(Pos, 1e5, f = floor)) %>% 
 summarise(cm = mean(cm))

As you can see, the means are wrong because they are all equal. I don't know why this happens.

We can use round_any from plyr:

library(dplyr)
data %>%
    as.data.frame %>% 
    group_by(grp = plyr::round_any(pos, 1e5, f = floor)) %>% 
    summarise(cm = mean(cm))
# A tibble: 5 x 2
#        grp       cm
#      <dbl>    <dbl>
#1   4500000 2.252500
#2   4700000 3.340000
#3  10500000 4.353333
#4  45300000 5.263333
#5 124800000 7.433333

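Sorry, akrun! I can't get the right result with my real data set. I'll edit my question with the results I get when I apply your script... I hope you can help me further.

@Cisco Perhaps you have NAs; in that case use mean(cm, na.rm = TRUE). Or, if the values are really big, you may need a specialized package such as Rmpfr or gmp to get the precision.

As you can see below, the first part of the script works fine; the problem is the per-group mean of the variable CM. I don't have any NAs. The data set has 280000 rows, which I don't think is too big. In short, group_by() works fine, but summarise() does not.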

aggregate(dat[, "cm"], list(floor(dat[, "pos"] / 1e5)), mean)

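Thank you very much, user20650. It works great. However, I also need to get the variable pos as in "newdata".

Just multiply it by 1e5:

aggregate(dat[, "cm"], list(pos = 1e5 * floor(dat[, "pos"] / 1e5)), mean)

For larger data, data.table may be faster, i.e.

setDT(as.data.frame(dat))[, lapply(.SD, mean), by = 1e5 * floor(pos / 1e5)]

These approaches are the same as akrun's nice answer; they just show how to create the groups, perhaps a little more explicitly.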
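The grouping step used in these comments can be made visible with a small base R sketch. It reuses pos and cm from the question; the names bin, means, and newdata are illustrative choices (newdata only echoes the name mentioned in the comment), not code from the thread.

```r
# Data from the question.
pos <- c(4532568, 4541529, 4586529, 4591235, 4712360, 4732504, 4740231,
         10532655, 10542365, 10564587, 45312567, 45326354, 45369874,
         124832658, 124845829, 124869874)
cm <- c(2.21, 2.25, 2.26, 2.29, 3.31, 3.35, 3.36, 4.32, 4.35, 4.39,
        5.23, 5.27, 5.29, 7.36, 7.45, 7.49)

# Divide by the bin width, round down, scale back up:
# every position inside the same 100000-wide window gets the same label.
bin <- 1e5 * floor(pos / 1e5)

# Mean of cm within each bin; tapply orders the result by sorted bin value.
means <- tapply(cm, bin, mean)

# Assemble a two-column result with the bin start as pos.
newdata <- data.frame(pos = sort(unique(bin)), cm = as.vector(means))
print(newdata)
```

The five bins and their means should agree with the tibble in akrun's answer, since plyr::round_any(pos, 1e5, f = floor) performs the same rounding.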

Output posted with the question (a dput() of the data, the script, and a dput() of its result):

 structure(list(Chr = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
 1L, 1L, 1L), .Label = "1", class = "factor"), Pos = c(0, 0, 0, 
 2e+05, 5e+05, 5e+05, 5e+05, 5e+05, 5e+05, 7e+05), CM = c(0, 0.080572, 
 0.092229, 0.439456, 1.478148, 1.478214, 1.480558, 1.488889, 1.489481, 
 1.931794)), .Names = c("Chr", "Pos", "CM"), row.names = c(NA, 
 -10L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = "Pos", drop = TRUE, indices = list(
 0:2, 3L, 4:8, 9L), group_sizes = c(3L, 1L, 5L, 1L), biggest_group_size = 5L, labels = structure(list(
 Pos = c(0, 2e+05, 5e+05, 7e+05)), row.names = c(NA, -4L), class = "data.frame", vars = "Pos", drop = TRUE, .Names = "Pos"))

 Ch1<- ch1 %>%
 as.data.frame %>% 
 group_by(Pos = plyr::round_any(Pos, 1e5, f = floor)) %>% 
 summarise(cm = mean(cm))

 structure(list(Pos = c(0, 2e+05, 5e+05, 7e+05, 8e+05, 9e+05, 
 1e+06, 1100000, 1200000, 1300000), cm = c(4.528498, 4.528498, 
 4.528498, 4.528498, 4.528498, 4.528498, 4.528498, 4.528498, 4.528498, 
 4.528498)), .Names = c("Pos", "cm"), row.names = c(NA, -10L), class = c("tbl_df", 
 "tbl", "data.frame"))
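
Every group in this result has the same mean. One possible cause, offered only as a guess from what is visible here: the dput() output names the column CM (upper case), while the script calls mean(cm) (lower case). When no column called cm exists, dplyr looks the name up in the calling environment, and the question does define a global vector cm. The sketch below reproduces that effect on made-up values; df, wrong, and right are hypothetical names, and it assumes dplyr is installed.

```r
library(dplyr)

# Stray global vector, like the cm defined at the top of the question.
cm <- c(1, 2, 3, 10)

# Made-up data whose value column is upper-case CM, as in the dput() output.
df <- data.frame(Pos = c(0, 0, 2e5, 2e5), CM = c(0.1, 0.3, 1.0, 2.0))

# mean(cm) finds no column named cm, so it silently uses the global vector:
# every group receives mean(c(1, 2, 3, 10)).
wrong <- df %>% group_by(Pos) %>% summarise(cm = mean(cm))

# Referring to the real column gives per-group means.
# .data$CM only looks inside the data frame, so a misspelled name raises an error.
right <- df %>% group_by(Pos) %>% summarise(cm = mean(.data$CM))

print(wrong)
print(right)
```

If this is what happened, replacing mean(cm) with mean(CM) in the script would be the first thing to try.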
