R: Creating a loop to subsample n-1 rows


r loops


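I am trying to use a loop to reduce the length of my dataset. I want to sample evenly from each of the four subgroups in my data frame (all of the same length). I am having trouble finding code that samples n-1 rows from each subgroup, where n is the current length of the subgroup. My current code is: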

sub.df<- function(x){
  library(data.table)
  library(tidyverse)
  setDT(x)
  while(nrow(x) > 24) { 
    x.1 <- x %>% # this is the beginning of the sample part
      group_by(x$spiral) %>% 
      tally() %>% select(-n) %>%
      sample_n(x, nrow(x)-1, replace = FALSE) #this is where I have trouble
    ks <- ks.test(dist(x[,c(1,2)]), unif.null) #this part is for evaluating the exclusions
    ks.1 <- ks.test(dist(x.1[,c(1,2)]), unif.null)
    if(ks.1$statistic > ks$statistic) {x <- x.1} else {x <- x}
  }

}

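If the loop ran correctly, the first pass would sample 3 (4-1) rows from each subgroup, the next pass 2 (3-1), and the last pass 1 (2-1). My final data would then look like this: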

x.cord   y.cord   subgroup
3        5        1
1        -3       2
-5       -5       3
-4       3        4

With the code I have provided, my actual dataset would end up with 24 points, 6 per subgroup, but this should illustrate what I am trying to do.

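I don't think you are using sample_n correctly. The function group_size can help you find the size of each group. Assuming all groups are the same size, you can replace the select statement in your function as shown below.

First, let's demonstrate how this subsampling works. The OP can use it as part of the function once it has been verified.

Using min(group_size(group_by(., subgroup))) - 1 ensures that the number of rows sampled is 1 less than the size of the group with the fewest rows.
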
library(tidyverse)
x %>% # this is the beginning of the sample part
  group_by(subgroup) %>%  # This will ensure that equal selection from each group
  sample_n(.,min(group_size(group_by(.,subgroup)))-1, replace = FALSE)

#Result - 3 from each subgroup has been selected. 

# # A tibble: 12 x 3
# # Groups: subgroup [4]
# x.cord y.cord subgroup
# <int>  <int>    <int>
# 1      1      1        1
# 2      3      5        1
# 3      2      1        1
# 4      2     -3        2
# 5      3     -1        2
# 6      1     -3        2
# 7     -4     -1        3
# 8     -2     -1        3
# 9     -5     -5        3
# 10     -4      3        4
# 11     -2      5        4
# 12     -3      4        4
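
For readers who want to run the demonstration end to end, here is a minimal sketch. The data frame is hypothetical (the values are made up, only the 4 x 4 group layout follows the question), and the helper name n_keep is an assumption introduced for readability. It computes the per-group sample size once, outside the pipe, which avoids the nested call inside sample_n.

```r
library(dplyr)

set.seed(1)  # only so the sketch is repeatable

# Hypothetical data: 4 subgroups with 4 points each (values are made up)
x <- data.frame(
  x.cord   = c(1, 3, 2, 4,  2,  3,  1,  4, -4, -2, -5, -1, -4, -2, -3, -1),
  y.cord   = c(1, 5, 1, 2, -3, -1, -3, -2, -1, -1, -5, -3,  3,  5,  4,  2),
  subgroup = rep(1:4, each = 4)
)

# Size of the smallest group minus one: the number of rows to keep per group
n_keep <- min(table(x$subgroup)) - 1

sub <- x %>%
  group_by(subgroup) %>%                 # sample within each subgroup
  sample_n(n_keep, replace = FALSE) %>%  # keep n - 1 rows per subgroup
  ungroup()

count(sub, subgroup)  # one row per subgroup with its remaining size
```

Which rows are dropped depends on the seed; only the per-group counts are fixed.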


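Now that the subsampling step has been verified above, let's modify the function.
Note: the function is untested. The OP is requested to test it with real data.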
# modified function should be as
sub.df<- function(x){
  library(tidyverse)
  while(nrow(x) > 24) { 
    x.1 <- x %>% # this is the beginning of the sample part
      group_by(spiral) %>% 
      sample_n(.,min(group_size(group_by(.,spiral)))-1, replace = FALSE)
    ks <- ks.test(dist(x[,c(1,2)]), unif.null) #this part is for evaluating the exclusions
    ks.1 <- ks.test(dist(x.1[,c(1,2)]), unif.null)
    if(ks.1$statistic > ks$statistic) {x <- x.1} else {x <- x}
  }
  x
}
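
Since the function above is untested and depends on an object (unif.null) that the question does not define, the following is only a sketch of how the loop could be exercised. Everything in it is an assumption made for illustration: the simulated data, the stand-in unif.null built from uniformly scattered points, the name sub_df, and the target and max_tries arguments. The max_tries cap is added because the loop only shrinks x when the KS statistic increases, so without a cap it could in principle run for a very long time.

```r
library(dplyr)

set.seed(42)

# Hypothetical stand-ins (not from the original post):
#   x         - 4 spirals with 8 points each
#   unif.null - pairwise distances between uniformly scattered points
x <- data.frame(x.cord = runif(32, -5, 5),
                y.cord = runif(32, -5, 5),
                spiral = rep(1:4, each = 8))
unif.null <- as.numeric(dist(cbind(runif(24, -5, 5), runif(24, -5, 5))))

sub_df <- function(x, null_dist, target = 24, max_tries = 1000) {
  # KS statistic of the pairwise distances against the null distances
  ks_stat <- function(d) {
    ks.test(as.numeric(dist(d[, c("x.cord", "y.cord")])), null_dist)$statistic
  }
  tries <- 0
  while (nrow(x) > target && tries < max_tries) {
    tries <- tries + 1
    n_keep <- min(table(x$spiral)) - 1   # one row fewer per spiral
    x.1 <- x %>%
      group_by(spiral) %>%
      sample_n(n_keep, replace = FALSE) %>%
      ungroup()
    # Same acceptance rule as the function above: keep the smaller
    # data set only if its KS statistic is larger
    if (ks_stat(x.1) > ks_stat(x)) x <- x.1
  }
  x
}

res <- sub_df(x, unif.null)
table(res$spiral)  # remaining rows per spiral
```

Passing the null distances as an argument, rather than reading a global, makes the function easier to test with different null distributions.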

At a high level, I know I want to use group_by() and filter().

So the challenge is to write and test predicate_n_minus_1():

# TRUE for n - 1 randomly chosen positions, FALSE for the remaining one;
# sample.int(n, n - 1) is used so that the dropped position is random
predicate_n_minus_1 <- function(x)
    seq_along(x) %in% sample.int(length(x), max(length(x) - 1L, 0L))

I know this isn't a pure tidyverse solution, but it seems cleaner and easier to test and modify than the nested function calls in MKR's answer. Maybe there is a tidyverse solution that separates the overall data manipulation from the filter specification?

The predicate can be checked with testthat:

library(testthat)
expect_equal(predicate_n_minus_1(integer()), logical())        # length 0
expect_equal(predicate_n_minus_1(integer(1)), FALSE)           # length 1
expect_equal(length(predicate_n_minus_1(integer(5))), 5)       # length isomorphism
expect_equal(sum(predicate_n_minus_1(integer(5))), 4)          # n - 1
expect_equal(sum(predicate_n_minus_1(letters)), length(letters) - 1) # other types!
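
To show how such a predicate might plug into group_by() and filter(), here is a minimal sketch. The data frame is hypothetical, and the predicate is written here as a variant that drops one randomly chosen position per group, with an explicit guard for empty input; treat it as one possible implementation rather than the definitive one.

```r
library(dplyr)

set.seed(7)  # only so the sketch is repeatable

# Variant of the predicate: TRUE for n - 1 random positions, FALSE for one
predicate_n_minus_1 <- function(x) {
  n <- length(x)
  if (n == 0L) return(logical(0))        # nothing to keep or drop
  seq_along(x) %in% sample.int(n, n - 1L)
}

# Hypothetical data: 4 subgroups with 4 points each (values are made up)
x <- data.frame(x.cord   = c(1:4, 1:4, -(1:4), -(1:4)),
                y.cord   = c(1:4, -(1:4), -(1:4), 1:4),
                subgroup = rep(1:4, each = 4))

sub <- x %>%
  group_by(subgroup) %>%
  filter(predicate_n_minus_1(subgroup)) %>%  # evaluated once per group
  ungroup()

count(sub, subgroup)  # remaining rows per subgroup
```

Because filter() evaluates the predicate separately for each group, the sampling logic stays in one small function that can be tested without any data frame.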