R: How to iteratively apply a function over subsets of a data frame


I am trying to remove outliers from the following data frame, one group at a time:

set.seed(1234)
library('mvoutlier')
x <- rnorm(10)     # standard normal
x[1] <- x[1] * 10  # introduce outlier

y <- rnorm(10)     # standard normal
y[4] <- y[4] * 10  # introduce outlier

w <- rnorm(10)     # standard normal
w[9] <- w[9] * 10  # introduce outlier

grp = c(rep('a',3), rep('b',4), rep('c',3)) #Introduce groups
df = data.frame(grp, x,y,w)
I wrote the following function to remove outliers from a data frame:

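removeOutliers = function(data)
  {
  print("inside")
  print(dim(data))
  # Run sign2 on the numeric columns only (drop the grouping column)
  z = sign2(data[, colnames(data) != "grp"], makeplot=FALSE)
  # wfinal01 is 0 for outliers; logical indexing still works when no
  # outliers are found, whereas data[-integer(0), ] would drop every row
  return(data[z$wfinal01 != 0, ]) #Return the remaining rows
  }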
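I want to remove the outlier rows separately for each group (i.e. a, b, and c). That means passing the sub-data-frame containing group a to the function above, collecting the result, and then doing the same for groups b and c.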

I know the aggregate function can be used here, but I don't know how to make it work:

aggregate( . ~ grp, data=df, removeOutliers)
Thanks for your help.

Try:

for(i in unique(df$grp)) print(df[df$grp==i,])
  grp           x          y          w
1   a -12.0706575 -0.4771927  0.1340882
2   a   0.2774292 -0.9983864 -0.4906859
3   a   1.0844412 -0.7762539 -0.4405479
  grp          x          y          w
4   b -2.3456977  0.6445882  0.4595894
5   b  0.4291247  0.9594941 -0.6937202
6   b  0.5060559 -0.1102855 -1.4482049
7   b -0.5747400 -0.5110095  0.5747557
   grp          x          y          w
8    c -0.5466319 -0.9111954 -1.0236557
9    c -0.5644520 -0.8371717 -0.1513830
10   c -0.8900378  2.4158352 -0.9359486

for(i in unique(df$grp)) removeOutliers(df[df$grp==i,])
[1] "inside"
[1] 3 4
[1] "inside"
[1] 4 4
[1] "inside"
[1] 3 4
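
The loop above only shows the print output from inside the function; the data frames it returns are discarded. A minimal sketch of collecting them follows. It assumes a small toy data frame and a hypothetical is_inlier() helper (a simple median/MAD rule standing in for the sign2-based check, so the example runs without mvoutlier); neither is part of the original post.

```r
# Toy data (hypothetical): row 5 (group a) and row 6 (group b) are outliers
toy <- data.frame(
  grp = rep(c("a", "b"), each = 5),
  x   = c(1, 2, 3, 4, 100, 10, 11, 12, 13, 14),
  y   = c(5, 6, 7, 8, 9, -50, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the sign2() check: a row is an inlier when every
# numeric column lies within k scaled MADs of that column's median
is_inlier <- function(d, k = 3) {
  cols  <- Filter(is.numeric, as.list(d))
  flags <- lapply(cols, function(v) abs(v - median(v)) <= k * mad(v))
  Reduce(`&`, flags, rep(TRUE, nrow(d)))
}

# Collect one cleaned data frame per group instead of discarding the results
groups  <- unique(toy$grp)
results <- vector("list", length(groups))
names(results) <- groups
for (g in groups) {
  sub <- toy[toy$grp == g, ]
  results[[g]] <- sub[is_inlier(sub), ]
}

# Stack the per-group pieces back into a single data frame
cleaned <- do.call(rbind, results)
```

With the real data, the same pattern would presumably apply with removeOutliers(sub) in place of the is_inlier() subsetting.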

Here is a quick way to do it. .SD refers to all variables except the by variable (grp in this case).

   grp          x          y          w
1:   a  0.2774292 -0.9983864 -0.4906859
2:   a  1.0844412 -0.7762539 -0.4405479
3:   b  0.4291247  0.9594941 -0.6937202
4:   b  0.5060559 -0.1102855 -1.4482049
5:   b -0.5747400 -0.5110095  0.5747557
6:   c -0.5466319 -0.9111954 -1.0236557
7:   c -0.5644520 -0.8371717 -0.1513830

If you want the outliers:

df[tokeep]

   grp           x          y          w
1:   a -12.0706575 -0.4771927  0.1340882
2:   b  -2.3456977  0.6445882  0.4595894
3:   c  -0.8900378  2.4158352 -0.9359486
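
The .SD and by idiom mentioned in this answer comes from the data.table package. The sketch below shows one way such a grouped flag could be computed; it is not the answer's original code. The toy data, the is_inlier() helper (a median/MAD rule standing in for sign2), and the name keep are all assumptions made for illustration.

```r
library(data.table)

# Toy data (hypothetical): row 5 (group a) and row 6 (group b) are outliers
toy <- data.table(
  grp = rep(c("a", "b"), each = 5),
  x   = c(1, 2, 3, 4, 100, 10, 11, 12, 13, 14),
  y   = c(5, 6, 7, 8, 9, -50, 1, 2, 3, 4)
)

# Hypothetical stand-in for the sign2() check: a row is an inlier when every
# numeric column lies within k scaled MADs of that column's median
is_inlier <- function(d, k = 3) {
  cols  <- Filter(is.numeric, as.list(d))
  flags <- lapply(cols, function(v) abs(v - median(v)) <= k * mad(v))
  Reduce(`&`, flags, rep(TRUE, nrow(d)))
}

# One logical flag per row, computed within each group; .SD holds the
# non-grouping columns (x and y) of the current group. The flags line up
# with the rows of toy only because each group's rows are contiguous.
keep <- toy[, is_inlier(.SD), by = grp]$V1

inliers  <- toy[keep]    # rows retained
outliers <- toy[!keep]   # rows flagged as outliers
```

If the groups were interleaved, sorting by grp first (or computing the flag as a new column with := and by) would keep the flags aligned with the rows.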

The split() or by() functions may be a better fit here than aggregate(). I suggest you take a look at them.

Thanks, Mike. Excellent.
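
aggregate() calls its function on each column's per-group values separately, so the function never receives a sub-data-frame, whereas split() and by() hand over whole sub-data-frames. A minimal sketch of both follows. It assumes a toy data frame and a hypothetical is_inlier() helper (a median/MAD rule used in place of the sign2-based check so that it runs without mvoutlier).

```r
# Toy data (hypothetical): row 5 (group a) and row 6 (group b) are outliers
toy <- data.frame(
  grp = rep(c("a", "b"), each = 5),
  x   = c(1, 2, 3, 4, 100, 10, 11, 12, 13, 14),
  y   = c(5, 6, 7, 8, 9, -50, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the sign2() check: a row is an inlier when every
# numeric column lies within k scaled MADs of that column's median
is_inlier <- function(d, k = 3) {
  cols  <- Filter(is.numeric, as.list(d))
  flags <- lapply(cols, function(v) abs(v - median(v)) <= k * mad(v))
  Reduce(`&`, flags, rep(TRUE, nrow(d)))
}

# split() cuts the data frame into a named list of per-group data frames;
# lapply() cleans each piece and rbind stacks them back together
pieces  <- split(toy, toy$grp)
cleaned <- do.call(rbind, lapply(pieces, function(d) d[is_inlier(d), ]))

# by() does the split-and-apply in one call; its result can be stacked
# the same way
cleaned_by <- do.call(rbind, by(toy, toy$grp, function(d) d[is_inlier(d), ]))
```

Both variants keep the same rows and differ only in the row names of the result.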