R 如何对数据帧进行分组

R 如何对数据帧进行分组,r,R,我有以下数据: structure(list(chr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), leftPos = c(720916L, 736092L, 818159L, 4105086L, 4140849L, 4164911L, 4464314L, 4764317L, 4906564L, 5010398L, 5690705L, 5775286L, 5867036L, 7230474L, 971918

我有以下数据:

structure(list(chr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1), leftPos = c(720916L, 736092L, 818159L, 
4105086L, 4140849L, 4164911L, 4464314L, 4764317L, 4906564L, 5010398L, 
5690705L, 5775286L, 5867036L, 7230474L, 9719183L, 9723741L, 10142816L, 
12733035L, 12742872L, 13996088L), Means1 = c(-4.50166373984263, 
22.7414854787421, 76.5674149543741, 49.6843654244713, -6.76800756009397, 
-5.48616887633702, 171.974765532105, -2.8004419462491, 251.238878920906, 
24.3172106993831, 92.07778585458, -9.2775123433234, 110.959507183586, 
-2.57278963951353, 30.5966726830686, -6.67824575085661, 13.9003103772433, 
99.9034503108899, 11.7786055209302, 57.4114695945089), Means2 = c(-4.50166373984263,
39.4695853682122, 128.134808692772, 140.343227472869, 12.2782402432039, 
4.51728744523118, 103.786232453211, 7.2030143753191, 77.4307425703948, 
17.496276591372, 46.8586391096806, -9.2775123433234, 90.4074077805074, 
18.1392689981969, 32.2700556572673, -6.67824575085661, 17.7360779927443, 
72.2607261926, -5.09768550127924, 55.027412413907), Means3 = c(-4.50166373984263, 
46.7166260656872, 102.330672294296, 91.1594353147049, -6.76800756009397, 
7.86681297424078, 134.793892503456, -2.8004419462491, 103.286968070986, 
15.8436793365245, 59.5942459167218, -9.2775123433234, 114.600293200803, 
21.6610650365489, 19.763233485681, -6.67824575085661, 21.3471182295293, 
101.934037199673, -5.09768550127924, 72.3408810065695), Means4 = c(0, 
49.6962372223769, 132.868329968312, 151.140132674376, 19.0462478032979, 
10.0034563215682, 106.841760397079, 10.0034563215682, 88.6977192399028, 
22.252438424421, 49.7002255511708, 0, 106.856532283882, 20.7120586377105, 
34.6415203071056, 0, 20.7120586377105, 75.2943986695856, 0, 58.1026432784387
)), .Names = c("chr", "leftPos", "Means1", "Means2", "Means3", 
"Means4"), row.names = c(NA, 20L), class = "data.frame")
我想按chr和leftPos进行分组,并获得该组中其他各列的平均值

我尝试了以下方法:

CLL <- function (col) {
  col <- as.data.frame(RawZoutliers %>%
                         group_by(chr, binnum = (leftPos) %/% 500000) %>%
                         summarise(Means = mean(Means)) %>%
                         mutate(leftPos = (binnum+1) * 120000) %>%
                         select(leftPos, Means))
}

RawZoutliersBin<-lapply(RawZoutliers, CLL)

我不知道如何做到这一点

您可以通过
chr
leftPos
获得每列的平均值,方法是:

aggregate(. ~ chr + leftPos, data=mydata, FUN=mean)
在示例数据中,由于您的
leftPos
都是不同的,因此您将获得原始data.frame

例如,要获得除
chr
(包括
leftPos
)之外的每列
chr
的平均值,请执行以下操作:

编辑

如果您想通过
leftPos
的“bin”来实现这一功能,这里有一种
cut

mydata$bin <- cut(mydata$leftPos, breaks=c(seq(min(mydata$leftPos), max(mydata$leftPos), by=500000), max(mydata$leftPos)), include.lowest=T, labels=FALSE)

也许是个傻瓜?不是重复的,因为我需要对特定的垃圾箱大小进行分组,而您突出显示的问题并不适用。我开始制定解决方案,修改你的例子。不幸的是,即使我做了
DF%>%mutate(binnum=(leftPos)%%/%500000)%%>%groupby(chr,binnum)%%>%summary(Means=Means(Means))
,我也会陷入困境,因为你没有
Means
变量。无论如何,这就是Cath所说的“创建另一个变量”的意思--使用
mutate
或类似方法。如何按每500000个leftPos指定分组eg组的大小?aggregate能做到这一点吗?@SebastianZeki您可以创建另一个用于分组的变量,您能告诉我怎么做吗?我是否在原始数据帧中创建变量?@SebastianZeki,例如(显然不是优化):
mydata$bin
 aggregate(. ~ chr, data=mydata, FUN=mean)
 # chr leftPos   Means1   Means2   Means3   Means4
#1   1 6372642 48.75335 41.89019 43.90577 47.82846
mydata$bin <- cut(mydata$leftPos, breaks=c(seq(min(mydata$leftPos), max(mydata$leftPos), by=500000), max(mydata$leftPos)), include.lowest=T, labels=FALSE)
aggregate(. ~ chr + bin, data=mydata, FUN=mean)
#   chr bin  leftPos     Means1     Means2     Means3    Means4
#1    1   1   758389  31.602412  54.367577  48.181878  60.85486
#2    1   7  4136949  12.476730  52.379585  30.752747  60.06328
#3    1   8  4464314 171.974766 103.786232 134.793893 106.84176
#4    1   9  4893760  90.918549  34.043345  38.776735  40.31787
#5    1  10  5690705  92.077786  46.858639  59.594246  49.70023
#6    1  11  5821161  50.840997  40.564948  52.661390  53.42827
#7    1  14  7230474  -2.572790  18.139269  21.661065  20.71206
#8    1  18  9719183  30.596673  32.270056  19.763233  34.64152
#9    1  19  9723741  -6.678246  -6.678246  -6.678246   0.00000
#10   2  19 10142816  13.900310  17.736078  21.347118  20.71206
#11   2  25 12737954  55.841028  33.581520  48.418176  37.64720
#12   2  27 13996088  57.411470  55.027412  72.340881  58.10264