R: Subsetting a data frame based on a non-cumulative row sum


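I would like to subset a data frame in R based on a non-cumulative row sum plus further conditions.

For example, I have the following data frame: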

x<-data.frame(x1=c(1,2,3,4,5,6,7,8,9),x2=c(70,1,6,23,98,21,45,8,6))

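A result is acceptable when the sum of its x2 values is less than 60 and all of its x1 values are greater than 2.

Since the solution is dynamic, another possible result could be:

  x1 x2
7  7 45
8  8  8
9  9  6

Or:

Once I understand how to implement this, I will narrow the set of possible solutions by adding more conditions.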
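The two conditions in the question can be written down as a small check. This is only a minimal sketch: the helper name `is_valid_subset` and its arguments `x1_min` and `x2_max` are assumptions introduced here for illustration and do not appear in the original post.

```r
# Example data frame from the question
x <- data.frame(x1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                x2 = c(70, 1, 6, 23, 98, 21, 45, 8, 6))

# Hypothetical helper: TRUE when every x1 is above x1_min and the
# plain (non-cumulative) total of x2 is below x2_max
is_valid_subset <- function(df, x1_min, x2_max) {
  nrow(df) > 0 && all(df$x1 > x1_min) && sum(df$x2) < x2_max
}

# Rows with x1 = 7, 8, 9: x2 totals 45 + 8 + 6 = 59
candidate <- x[x$x1 %in% c(7, 8, 9), ]
is_valid_subset(candidate, x1_min = 2, x2_max = 60)   # TRUE

# Rows with x1 = 4, 7: x2 totals 23 + 45 = 68
is_valid_subset(x[x$x1 %in% c(4, 7), ], x1_min = 2, x2_max = 60)   # FALSE
```

A check like this is handy for confirming that any randomly generated subset really meets the stated conditions.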

Edit for Ronak Shah

With an additional column x3, the data frame x becomes:

x<-data.frame(x1=c(1,2,3,4,5,6,7,8,9),x2=c(70,1,6,23,98,21,45,8,6),x3=c(13,2,31,45,5,6,7,18,0))


We can write a function to subset the data frame:

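subset_df_row <- function(x, x1_value, x2_thresh) {
    # Keep only the rows whose x1 exceeds x1_value
    df1 <- x[x$x1 > x1_value, ]
    # Stop early if no single row fits under the threshold (avoids an endless loop)
    if (!any(df1$x2 < x2_thresh)) return(df1[0, ])
    # Shuffle rows to get a random result
    df1 <- df1[sample(seq_len(nrow(df1))), ]
    # Reshuffle until the first x2 value is below the threshold
    while (df1$x2[1] >= x2_thresh) {
      df1 <- df1[sample(seq_len(nrow(df1))), ]
    }
    # Keep the leading rows up to the point where the running x2 total
    # reaches the threshold; keep every row if it never does
    over <- cumsum(df1$x2) >= x2_thresh
    n_keep <- if (any(over)) which.max(over) - 1 else nrow(df1)
    df1[seq_len(n_keep), ]
}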

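If you restrict yourself to a "window" of a particular size, you could use a rolling sum and extract all subsets of that length.

Good idea, thanks! Suppose the data frame x now has a third column x3, to which I want to apply a non-cumulative sum condition like the one on x2. Should I add a second while loop, or can I combine x2 and x3 in the same while loop that shuffles df1?

How would your last line change with the additional condition on x3?

For simplicity, I modified your solution; please correct it if it can be improved further.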
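The rolling-sum suggestion from the comments could look roughly like the sketch below. It is one possible reading of that comment, not code from the thread: the function name `window_subsets` and the window-size argument `k` are assumptions made here, and the rolling sum is computed in base R from differences of the cumulative sum.

```r
# Example data frame from the question
x <- data.frame(x1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                x2 = c(70, 1, 6, 23, 98, 21, 45, 8, 6))

# Hypothetical helper: every run of k consecutive rows (after filtering
# on x1) whose x2 total is below x2_thresh, returned as a list of data frames
window_subsets <- function(x, x1_value, x2_thresh, k) {
  df1 <- x[x$x1 > x1_value, ]
  n <- nrow(df1)
  if (n < k) return(list())
  # Rolling sum of width k via differences of the cumulative sum
  cs <- c(0, cumsum(df1$x2))
  roll <- cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]
  # Start positions of the windows that stay below the threshold
  starts <- which(roll < x2_thresh)
  lapply(starts, function(s) df1[s:(s + k - 1), ])
}

# Windows of three consecutive rows with x1 > 2 and x2 total below 60
res <- window_subsets(x, x1_value = 2, x2_thresh = 60, k = 3)
res[[1]]$x1   # 7 8 9 (x2 totals 45 + 8 + 6 = 59)
```

Unlike the shuffling approach, this enumerates only contiguous runs of rows in their original order, so it is deterministic but covers a smaller set of possible subsets.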

  x1 x2
3  3  6

x<-data.frame(x1=c(1,2,3,4,5,6,7,8,9),x2=c(70,1,6,23,98,21,45,8,6),x3=c(13,2,31,45,5,6,7,18,0))

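subset_df_row <- function(x, x1_value, x2_thresh, x3_thresh) {
  # Keep only the rows whose x1 exceeds x1_value
  df1 <- x[x$x1 > x1_value, ]
  # Stop early if no single row is below both thresholds (avoids an endless loop)
  if (!any(df1$x2 < x2_thresh & df1$x3 < x3_thresh)) return(df1[0, ])
  # Shuffle rows to get a random result
  df1 <- df1[sample(seq_len(nrow(df1))), ]
  # Reshuffle until the first row is below both thresholds
  while (df1$x2[1] >= x2_thresh || df1$x3[1] >= x3_thresh) {
    df1 <- df1[sample(seq_len(nrow(df1))), ]
  }
  # Keep the leading rows up to the point where either running total
  # reaches its threshold; keep every row if neither ever does
  over <- cumsum(df1$x2) >= x2_thresh | cumsum(df1$x3) >= x3_thresh
  n_keep <- if (any(over)) which.max(over) - 1 else nrow(df1)
  df1[seq_len(n_keep), ]
}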
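If more threshold columns are added later, one option is to pass the thresholds as a named vector so that no extra loop is needed per column. The sketch below is a hypothetical generalisation, not code from the thread: the name `subset_df_multi` and the `thresholds` argument are assumptions. It also swaps the reshuffling while loop for a direct random draw of a valid first row, which yields the same kind of result without repeated shuffles.

```r
# Example data frame with the additional column x3
x <- data.frame(x1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                x2 = c(70, 1, 6, 23, 98, 21, 45, 8, 6),
                x3 = c(13, 2, 31, 45, 5, 6, 7, 18, 0))

# Hypothetical generalisation: `thresholds` is a named numeric vector with
# one upper bound per column, e.g. c(x2 = 60, x3 = 40)
subset_df_multi <- function(x, x1_value, thresholds) {
  df1 <- x[x$x1 > x1_value, ]
  cols <- names(thresholds)
  # A row can start a subset only if it is below every threshold on its own
  ok_first <- rep(TRUE, nrow(df1))
  for (col in cols) ok_first <- ok_first & df1[[col]] < thresholds[[col]]
  if (!any(ok_first)) return(df1[0, ])
  # Draw a valid first row at random, then shuffle the remaining rows
  idx <- seq_len(nrow(df1))
  valid <- idx[ok_first]
  first <- valid[sample.int(length(valid), 1)]
  rest <- idx[idx != first]
  df1 <- df1[c(first, rest[sample.int(length(rest))]), ]
  # Cut at the first row where any running total reaches its threshold
  over <- rep(FALSE, nrow(df1))
  for (col in cols) over <- over | cumsum(df1[[col]]) >= thresholds[[col]]
  n_keep <- if (any(over)) which.max(over) - 1 else nrow(df1)
  df1[seq_len(n_keep), ]
}

set.seed(1)  # only to make the random draw repeatable
res <- subset_df_multi(x, x1_value = 2, thresholds = c(x2 = 60, x3 = 40))
res
```

The returned rows differ from run to run unless a seed is set, but every result has x1 > 2, an x2 total below 60 and an x3 total below 40.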


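Example calls with the two-column version of the function (the rows returned vary between runs because of the shuffle):

subset_df_row(x, 2, 60)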
#  x1 x2
#6  6 21
#8  8  8

subset_df_row(x, 3, 160)
#  x1 x2
#8  8  8
#5  5 98
#4  4 23