Filtering data in R without loops

r loops dataframe


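I have a fairly large data frame (millions of records).
I need to filter it according to the following rule:
- For each product (ID), remove all records that come before the fifth record after the first record with x > 0.

So only two columns matter: ID and x. The data frame is sorted by ID.
This is easy to do with a loop, but loops perform poorly on a data frame this large.

How can I implement this in a "vectorized" style?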

Example:
Before filtering:

ID  x  
1 0  
1 0  
1 5  # First record with x>0  
1 0  
1 3  
1 4  
1 0   
1 9   
1 0  # Delete all earlier records of that product  
1 0  
1 6  
2 0  
2 1  # First record with x>0   
2 0  
2 4  
2 5  
2 8  
2 0  # Delete all earlier records of that product  
2 1  
2 3

After filtering:

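For these split-apply-combine problems, I like to use plyr. There are other options if speed becomes an issue, but for most things plyr is easy to understand and use. I wrote a function that implements the logic described above and then pass it to ddply() so that it operates on each chunk of data by ID: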

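# Keep the rows of one ID chunk starting `numplus` rows after the first row
# where `column` exceeds `threshold`.
fun <- function(x, column, threshold, numplus) {
  start <- which(x[[column]] > threshold)[1] + numplus
  # Assumption: a chunk with no qualifying row, or too few rows after it,
  # yields no rows (seq() would otherwise fail or count backwards).
  if (is.na(start) || start > nrow(x)) return(x[0, , drop = FALSE])
  x[seq(from = start, to = nrow(x)), , drop = FALSE]
}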
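Since the question asks for a "vectorized" style, here is a minimal base R sketch of the same rule that avoids per-chunk function calls. It is not the approach from the answer above but an alternative built on which(), duplicated() and match(). The names filter_after_first and dat, and the default numplus = 5, are hypothetical choices for illustration. It assumes that the rows of each ID are already in the intended order, that NA values in x do not count as x > 0, and that an ID with no x > 0 is dropped entirely.

```r
# Example input from the question: 11 rows for ID 1, 9 rows for ID 2
dat <- data.frame(
  ID = c(rep(1, 11), rep(2, 9)),
  x  = c(0, 0, 5, 0, 3, 4, 0, 9, 0, 0, 6,
         0, 1, 0, 4, 5, 8, 0, 1, 3)
)

# Hypothetical helper: keep, for every ID, only the rows that start
# `numplus` rows after the first row where `column` exceeds `threshold`.
filter_after_first <- function(df, id = "ID", column = "x",
                               threshold = 0, numplus = 5) {
  ids  <- df[[id]]
  vals <- df[[column]]

  # Global row numbers of every row that exceeds the threshold
  hits <- which(!is.na(vals) & vals > threshold)

  # First such row for each ID (duplicated() marks the later ones)
  first_hits <- hits[!duplicated(ids[hits])]

  # For every row, the row number of its ID's first hit (NA if the ID has none)
  first_for_row <- first_hits[match(ids, ids[first_hits])]

  # Assumption: IDs without any hit are removed completely
  keep <- !is.na(first_for_row) & seq_along(ids) >= first_for_row + numplus
  df[keep, , drop = FALSE]
}

res <- filter_after_first(dat)
print(res)
```

With numplus = 5 this keeps the same rows that fun keeps for each chunk of the example data. With plyr loaded, the per-chunk version would presumably be called as ddply(dat, "ID", fun, column = "x", threshold = 0, numplus = 5), since ddply() forwards extra arguments to the function.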


Thanks! It works. This is exactly what I wanted: a clean, R-style solution.