R 创建while循环函数，该函数在满足所有条件之前取下一个最大值_R_Function_Loops_Intervals

R 创建while循环函数，该函数在满足所有条件之前取下一个最大值

r function loops

R 创建while循环函数，该函数在满足所有条件之前取下一个最大值,r,function,loops,intervals,R,Function,Loops,Intervals,我想创建一个函数，在数据帧中创建一个新列，该列以所有行中的所有0开头，但将基于以下内容创建1。它开始查看百分比列中的最高百分比。这将在同一行中新创建的“算法”列中生成1。然后它将查看起始行的最小行和最大行。假设第6行中找到的最高值（起始值）为13,8%，接下来将查看第5行和第7行。然后它将查看此处的百分比，确定最高百分比，并在“算法”列中创建1（假设第7行中为8,3%）。接下来，它将再次查看最小和最大行（第5行和第8行，因为已经考虑了第6行和第7行）还有一个重要的因素是，它必须在寻找更多行时以

我想创建一个函数，在数据帧中创建一个新列，该列以所有行中的所有0开头，但将基于以下内容创建1。它开始查看百分比列中的最高百分比。这将在同一行中新创建的“算法”列中生成1。然后它将查看起始行的最小行和最大行。假设第6行中找到的最高值（起始值）为13,8%，接下来将查看第5行和第7行。然后它将查看此处的百分比，确定最高百分比，并在“算法”列中创建1（假设第7行中为8,3%）。接下来，它将再次查看最小和最大行（第5行和第8行，因为已经考虑了第6行和第7行）

还有一个重要的因素是，它必须在寻找更多行时以一定的百分比停止，比如说在95%时停止。这是基于“百分比”列的总百分比，其总和应为95%

这是主要的想法，但我不知道如何做到这一点

此外，它最后还必须看得比min和max行更远，因为这两行也可以都是，例如8%，因此它必须看得更远一行，并根据最大值选择该行

还没有测试，但这是我目前正在考虑的

(While(total_perc < p_min_performance)  
prev_row_value <t (minrow -1)
next_rpw_value <t (maxrow +1)

prev > next > t(prev,)  >1
minrow <- minrow-1
maxrow <- maxrow+1

Ronak的代码正在运行：

algorithm$algorithm_column <- 0

output <- do.call(rbind, lapply(split(algorithm, algorithm$pc4), 
function(x) {
     all_index <- x$idgroup
     next_comb <- all_index
     while(sum(x$percent[x$algorithm_column == 1]) <= 95) {
        inds <- next_comb[which.max(x$percent[next_comb])]
        x$algorithm_column[inds] <- 1
        nos <- which(all_index == inds)
         next_comb <- all_index[c(nos - 1, nos + 1)]
        all_index <- setdiff(all_index, inds)
     }
    x
}))

但这应该是：（时间间隔从09:00到15:00）

因此，最后的算法应该进一步查看行，然后仅查看最高值旁边的行（如果它们都为0）

我现在正忙着创建它的一大块，但我是一个，但被卡住了

runAlgorithm <- function(x, min_performance = 95) {
x$algorithm_column <- 0
x$iteration <- 0

it <- 0

all_index <- x$idgroup
next_comb <- all_index

inds <- next_comb[which.max(x$percent[next_comb])]
x$algorithm_column[inds] <- 1
 x$iteration[inds] <- it

#While loop algorithm
while (sum(x$percent[x$algorithm_column == 1]) <= min_performance) {

prev_values <- x$percent[1:inds - 1]
next_values <- x$percent[inds + 1:length(x$percent)]
first_non_zero_prev <- if_else(sum(prev_values) > 0L, which.max(prev_values 
> 0), NA)
first_non_zero_next <- if_else(sum(next_values) > 0L, which.max(next_values 
> 0), NA)
next_value <- case_when(
  is.na(first_non_zero_prev) & !is.na(first_non_zero_next) ~ next_comb[2],
  !is.na(first_non_zero_prev) & is.na(first_non_zero_next) ~ next_comb[1],
  first_non_zero_prev <= first_non_zero_next ~ next_comb[2],
  first_non_zero_prev > first_non_zero_next ~ next_comb[1]
)

inds <- next_comb[which.max(x$percent[next_value])]
x$algorithm_column[inds] <- 1
x$iteration[inds] <- it

nos <- which(all_index == inds)
next_comb <- all_index[c(nos - 1, nos + 1)]
all_index <- setdiff(all_index, inds)
}
return(x)
}
df_test <- groep_test[1:48,]
output <- runAlgorithm(df_test)

runAlgorithm这里有一种方法
#Remaining index
all_index <- algorithm$idgroup
#Initialising to 0
algorithm$algorithm_column <- 0
#Index to check for maximum
next_comb <- all_index
#While more than 20% of the rows are remaining.
#Change this to whatever number you wish. For 95% use 0.05 
while(sum(x$percent[x$algorithm_column == 1]) <= 95) {
   #Get maximum index
   inds <- next_comb[which.max(algorithm$percent[next_comb])]
   #Change the value to 1
   algorithm$algorithm_column[inds] <- 1
   nos <- which(all_index == inds)
   #Get the next two indices
   next_comb <- all_index[c(nos - 1, nos + 1)]
   #Remove the previously used index.
   all_index <- setdiff(all_index, inds)
}


对于多个组，我们可以基于pc4对数据进行分割
，并对每个组应用相同的数据
algorithm$algorithm_column <- 0

 output <- do.call(rbind, lapply(split(algorithm, algorithm$pc4), function(x) {
     all_index <- x$idgroup
     next_comb <- all_index
     while(sum(x$percent[x$algorithm_column == 1]) <= 95) {
        inds <- next_comb[which.max(x$percent[next_comb])]
        x$algorithm_column[inds] <- 1
        nos <- which(all_index == inds)
        next_comb <- all_index[c(nos - 1, nos + 1)]
        all_index <- setdiff(all_index, inds)
    }
    x
}))

algorithm$algorithm\u column这里有一个不基于循环的解决方案。基本上，它使用cumsum（）。行和（矩阵（…）
组合了示例中的第5行和第7行，然后是第4行和第8行等
根据您的评论，您可以将其添加到dplyr
链中，包括groupby（）

f_algo_return请查看图片图片图片似乎与描述不匹配。“然后它将查看此处的百分比，确定最高百分比，并在“算法”列中创建1（假设第7行中为8,3%）”。这意味着第5行将有0，但在您的图片中是0。另外，你应该在你的帖子中提供信息，而不是图片。图片是我想在最后看到的，所以在图片中的总比例接近75%（最后应该是95%左右）。它确实与我写的文本不一致，因为这就是过程。如果你的描述不匹配，人们如何帮助你？现在，如果图片是我们唯一可以遵循的，答案将是c（0,0,1,1,1,1,1,1,1,1,1,1,0,0）
。为了有一些匹配的外观，您的图片应该将第5行和第8行设置为0，因为它们都低于您的最小阈值。描述是正确的，图片只是最后的一个示例，而不是这些地方的1的外观。最后是应该是一个1的链，中间不能有零。因为我想要有一个从[随机时间]到[随机时间]的时间间隔。以图片为例，时间间隔为09:15到11:00，其中包含63,58%的观察结果（将1,11%、9,72%、6,94%和8,33%相加，这是另一列中的1），非常感谢您的回答。它看起来像是在工作。尽管如此，如果我将95%准确度的阈值设置为0.05，它仍然采用第1行，而它已经是1.3889%-100=986111%准确度（结合其余行，查看它们在“百分比列”中的百分比）。你知道为什么吗？@SaschaS阈值在这里是什么意思？我只是取行数的比率为1。也许你需要添加sum（算法$percent[算法$algorithm_column==1]），谢谢你的工作！但是，如果我有一个完全相同但有不同pc4的数据集（对于每个pc4，idgroup仍将从1计数到12）我该怎么做呢？基本上，在第13行，这一切都会重新开始，但数据不同，分组在pc4上，比如5465。@SaschaS我已经更新了答案，该答案将适用于多个组。你能检查一下吗？循环有效，但在pc4组中，我将最小值设置为95，算法会找到两行，直到两行分别为0%和0%（底部，顶部），当仍有一行具有8.5%（1低于一行具有0%）时，在这种情况下，将采用两个0%的行（而不是至少达到95），在底部取下一个8,5%，然后在顶部取所有0%的行。这使我的时间间隔从10:00到16:00，而是从00:00到16:00。也许代码不应该只查看下两行，而是如果下两行相同->它会查看下一行并选择最高的一行。提前感谢。。
p_min_performance <- 95         # SET PERCENTAGE!
#Naar 0
algorithm1$algorithm_column <- 0
algorithm1$iteration <- 0
it <- 0
algorithm1 <- do.call(rbind, lapply(split(algorithm1, algorithm1$pc4), 
function(x) {
#Index voor maximum percentage
all_index <- x$idgroup
next_comb <- all_index
#While loop algorithm
while (sum(x$percent[x$algorithm_column == 1]) <= p_min_performance) {
it <- it + 1
inds <- next_comb[which.max(x$percent[next_comb])]
x$algorithm_column[inds] <- 1
x$iteration[inds] <- it
nos <- which(all_index == inds)
next_comb <- all_index[c(nos - 1, nos + 1)]
all_index <- setdiff(all_index, inds)
}
x
}))

   pc4   tinterval  stops   percen id_g a_col iteration
1   8035    03:00:00    0   0.0000  1   1   14
2   8035    03:30:00    0   0.0000  2   1   13
3   8035    04:00:00    0   0.0000  3   1   12
4   8035    04:30:00    0   0.0000  4   1   11
5   8035    05:00:00    0   0.0000  5   1   10
6   8035    05:30:00    0   0.0000  6   1   9
7   8035    06:00:00    0   0.0000  7   1   8
8   8035    06:30:00    0   0.0000  8   1   7
9   8035    07:00:00    0   0.0000  9   1   6
10  8035    07:30:00    0   0.0000  10  1   5
11  8035    08:00:00    0   0.0000  11  1   4
12  8035    08:30:00    0   0.0000  12  1   3
13  8035    09:00:00    9   9.0909  13  1   2
14  8035    09:30:00    70  70.7071 14  1   1
15  8035    10:00:00    0   0.0000  15  1   15
16  8035    10:30:00    6   6.0606  16  1   16
17  8035    11:00:00    0   0.0000  17  1   17
18  8035    11:30:00    0   0.0000  18  1   18
19  8035    12:00:00    0   0.0000  19  1   19
20  8035    12:30:00    3   3.0303  20  1   20
21  8035    13:00:00    0   0.0000  21  1   21
22  8035    13:30:00    3   3.0303  22  1   22
23  8035    14:00:00    3   3.0303  23  1   23
24  8035    14:30:00    0   0.0000  24  1   24
25  8035    15:00:00    5   5.0505  25  1   25
26  8035    15:30:00    0   0.0000  26  0   0
27  8035    16:00:00    0   0.0000  27  0   0
28  8035    16:30:00    0   0.0000  28  0   0

   pc4   tinterval  stops   percen id_g a_col iteration
1   8035    03:00:00    0   0.0000  1   0   0
2   8035    03:30:00    0   0.0000  2   0   0
3   8035    04:00:00    0   0.0000  3   0   0
4   8035    04:30:00    0   0.0000  4   0   0
5   8035    05:00:00    0   0.0000  5   0   0
6   8035    05:30:00    0   0.0000  6   0   0
7   8035    06:00:00    0   0.0000  7   0   0
8   8035    06:30:00    0   0.0000  8   0   0
9   8035    07:00:00    0   0.0000  9   0   0
10  8035    07:30:00    0   0.0000  10  0   0
11  8035    08:00:00    0   0.0000  11  0   0
12  8035    08:30:00    0   0.0000  12  0   0
13  8035    09:00:00    9   9.0909  13  1   2
14  8035    09:30:00    70  70.7071 14  1   1
15  8035    10:00:00    0   0.0000  15  1   3
16  8035    10:30:00    6   6.0606  16  1   4
17  8035    11:00:00    0   0.0000  17  1   5
18  8035    11:30:00    0   0.0000  18  1   6
19  8035    12:00:00    0   0.0000  19  1   7
20  8035    12:30:00    3   3.0303  20  1   8
21  8035    13:00:00    0   0.0000  21  1   9
22  8035    13:30:00    3   3.0303  22  1   10
23  8035    14:00:00    3   3.0303  23  1   11
24  8035    14:30:00    0   0.0000  24  1   12
25  8035    15:00:00    5   5.0505  25  1   13
26  8035    15:30:00    0   0.0000  26  0   0
27  8035    16:00:00    0   0.0000  27  0   0
28  8035    16:30:00    0   0.0000  28  0   0

runAlgorithm <- function(x, min_performance = 95) {
x$algorithm_column <- 0
x$iteration <- 0

it <- 0

all_index <- x$idgroup
next_comb <- all_index

inds <- next_comb[which.max(x$percent[next_comb])]
x$algorithm_column[inds] <- 1
 x$iteration[inds] <- it

#While loop algorithm
while (sum(x$percent[x$algorithm_column == 1]) <= min_performance) {

prev_values <- x$percent[1:inds - 1]
next_values <- x$percent[inds + 1:length(x$percent)]
first_non_zero_prev <- if_else(sum(prev_values) > 0L, which.max(prev_values 
> 0), NA)
first_non_zero_next <- if_else(sum(next_values) > 0L, which.max(next_values 
> 0), NA)
next_value <- case_when(
  is.na(first_non_zero_prev) & !is.na(first_non_zero_next) ~ next_comb[2],
  !is.na(first_non_zero_prev) & is.na(first_non_zero_next) ~ next_comb[1],
  first_non_zero_prev <= first_non_zero_next ~ next_comb[2],
  first_non_zero_prev > first_non_zero_next ~ next_comb[1]
)

inds <- next_comb[which.max(x$percent[next_value])]
x$algorithm_column[inds] <- 1
x$iteration[inds] <- it

nos <- which(all_index == inds)
next_comb <- all_index[c(nos - 1, nos + 1)]
all_index <- setdiff(all_index, inds)
}
return(x)
}
df_test <- groep_test[1:48,]
output <- runAlgorithm(df_test) 

#Remaining index
all_index <- algorithm$idgroup
#Initialising to 0
algorithm$algorithm_column <- 0
#Index to check for maximum
next_comb <- all_index
#While more than 20% of the rows are remaining.
#Change this to whatever number you wish. For 95% use 0.05 
while(sum(x$percent[x$algorithm_column == 1]) <= 95) {
   #Get maximum index
   inds <- next_comb[which.max(algorithm$percent[next_comb])]
   #Change the value to 1
   algorithm$algorithm_column[inds] <- 1
   nos <- which(all_index == inds)
   #Get the next two indices
   next_comb <- all_index[c(nos - 1, nos + 1)]
   #Remove the previously used index.
   all_index <- setdiff(all_index, inds)
}

algorithm
#    pc4 timeinterval stops percent idgroup algorithm_column
#1  5464     08:45:00     1  1.3889       1                0
#2  5464     09:00:00     5  6.9444       2                1
#3  5464     09:15:00     8 11.1111       3                1
#4  5464     09:30:00     7  9.7222       4                1
#5  5464     09:45:00     5  6.9444       5                1
#6  5464     10:00:00    10 13.8889       6                1
#7  5464     10:15:00     6  8.3333       7                1
#8  5464     10:30:00     4  5.5556       8                1
#9  5464     10:45:00     7  9.7222       9                1
#10 5464     11:00:00     6  8.3333      10                1
#11 5464     11:15:00     5  6.9444      11                1
#12 5464     11:30:00     8 11.1111      12                0

algorithm$algorithm_column <- 0

 output <- do.call(rbind, lapply(split(algorithm, algorithm$pc4), function(x) {
     all_index <- x$idgroup
     next_comb <- all_index
     while(sum(x$percent[x$algorithm_column == 1]) <= 95) {
        inds <- next_comb[which.max(x$percent[next_comb])]
        x$algorithm_column[inds] <- 1
        nos <- which(all_index == inds)
        next_comb <- all_index[c(nos - 1, nos + 1)]
        all_index <- setdiff(all_index, inds)
    }
    x
}))

f_algo_return <- function(pct, max_threshold = 70){
  # initialize return variable
  algo <- vector(mode = 'integer', length = length(pct))

  #make rows
  max_row <- which.max(pct)

  #if we have odd number of rows, we need to prevent subsetting pct[0]
  len_out <- min(abs(max_row - c(1, length(pct))))

  all_rows <- c(max_row,
                (max_row - len_out):(max_row-1),
                (max_row+1):(max_row + len_out)
  )

  #subset the pct
  pct <- pct[all_rows]

  thresh <- cumsum(c(pct[1], rowSums(matrix(pct[-1], ncol = 2)))) < max_threshold
  sub_rows <- all_rows[c(thresh[1], rev(thresh[-1]), thresh[-1])]

  #initialize and update new variable
  algo[sub_rows] <- 1L

  return(algo)
}

f_algo_return(DF[['percent']])
# [1] 0 0 1 1 1 1 1 1 1 0 0 0

DF <- data.frame(pc4 = c(5464),
                        timeinterval = c('08:45:00', '09:00:00', '09:15:00', '09:30:00', 
                                         '09:45:00', '10:00:00', '10:15:00', '10:30:00', '10:45:00', '11:00:00', 
                                         '11:15:00', '11:30:00'),
                        stops = c(1, 5, 8, 7, 5, 10, 6, 4, 7, 6, 5, 8)) %>%
  mutate(percent = round(stops/sum(stops), digits = 6)*100) %>%
  mutate(idgroup = seq_along(timeinterval))