Building a data frame of flood metrics with a for loop in R


I have a dataset called all.cols2 with water depth measured every 20 minutes at 94 locations over a period of more than 3 years. Here is a preview:

 # A tibble: 89,714 x 95
   date_time           Levee.slope      Levee.slope.1      Levee.slope.2    Levee.slope.3
   <dttm>                         <dbl>            <dbl>            <dbl>            <dbl>
 1 2015-12-01 15:05:33           -0.821           -0.539           -0.325          -0.0991
 2 2015-12-01 15:25:33           -0.830           -0.548           -0.334          -0.108 
 3 2015-12-01 15:45:33           -0.829           -0.547           -0.333          -0.107 
 4 2015-12-01 16:05:33           -0.833           -0.551           -0.337          -0.111 
 5 2015-12-01 16:25:33           -0.829           -0.547           -0.333          -0.107 
 6 2015-12-01 16:45:33           -0.834           -0.552           -0.338          -0.112 
 7 2015-12-01 17:05:33           -0.839           -0.557           -0.343          -0.117 
 8 2015-12-01 17:25:33           -0.835           -0.553           -0.339          -0.113 
 9 2015-12-01 17:45:33           -0.826           -0.544           -0.330          -0.104 
10 2015-12-01 18:05:33           -0.804           -0.522           -0.308          -0.0821
# ... with 89,704 more rows, and 90 more variables: Levee.slope.4 <dbl>,

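…each flood event at each location has an average water depth, a maximum water depth, a number of observations, a flood event duration (in days), and a start and end date/time.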

Right now I have to specify […] before running the for loop; it does not go through my sites automatically.

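My question: is there a way to have the for loop iterate over all locations in one go and store the results in a combined output similar to the flood-event table in this post? Also, is there a way to condense the code in my loop so that I don't have to create so many data frames?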

It is hard to demonstrate without some data, but here is pseudocode using foreach; if you want to speed it up, you can use doParallel.

library(foreach)  # provides foreach() and %do%
library(dplyr)    # provides bind_rows()

data <- bind_rows(foreach(location = list_locations) %do% {
  # code handling data for one location
  # ...

  # process each column of one location
  # (`data` in seq_along() is a placeholder for the current location's data)
  one_location_df <- bind_rows(foreach(i_col = seq_along(data)) %do% {
    # your code handling data
    # ...

    # the final value should be a data frame, even if it has only one row
    one_result_df
  })

  # some additional code, if any
  # ...
  one_location_df
})
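To make the pseudocode above concrete, here is a minimal sketch that runs on made-up data. The toy tibble, the column names site_a and site_b, and the helper flood_events() are all hypothetical, and a "flood" is assumed to be any run of consecutive readings with depth > 0 and no missing values; adapt the threshold and the metrics to your own definition.

```r
library(foreach)  # foreach(), %do%
library(dplyr)    # tibble(), bind_rows(), group_by(), summarise()

# Toy stand-in for all.cols2: a time stamp plus two hypothetical site columns,
# one reading every 20 minutes (1200 seconds)
toy <- tibble(
  date_time = as.POSIXct("2016-05-06 00:00:00", tz = "UTC") + 1200 * (0:9),
  site_a = c(-0.1, 0.02, 0.05, -0.2, -0.1, 0.01, 0.03, 0.04, -0.3, -0.1),
  site_b = c(-0.5, -0.4, 0.10, 0.20, 0.15, -0.1, -0.2, -0.3, -0.1, -0.2)
)

# Summarise every run of consecutive wet readings for one site.
# Assumption: "wet" means depth > 0.
flood_events <- function(date_time, depth, site_name) {
  runs <- rle(depth > 0)
  # Label each reading with the index of the run it belongs to
  run_id <- rep(seq_along(runs$lengths), runs$lengths)
  tibble(date_time, depth, run_id) %>%
    filter(depth > 0) %>%
    group_by(run_id) %>%
    summarise(
      avg_water_depth = mean(depth),
      max_depth = max(depth),
      observations = n(),
      duration_days = as.numeric(
        difftime(max(date_time), min(date_time), units = "days")
      ),
      begin = min(date_time),
      end = max(date_time),
      .groups = "drop"
    ) %>%
    mutate(site = site_name)
}

# Every column except the time stamp is treated as a site
site_cols <- setdiff(names(toy), "date_time")

# One data frame per site, stacked into a single combined result
all_events <- bind_rows(foreach(site_name = site_cols) %do% {
  flood_events(toy$date_time, toy[[site_name]], site_name)
})
```

Because the site names are taken from the column names, nothing has to be specified by hand before the loop, and the per-site work lives in one function instead of a chain of intermediate data frames.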

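One speed-up for the data here: instead of the 2 if_else calls, a single all.cols2_sub$VarA … 0) is enough. It is much faster. That said, I suggest profiling your code first; see help('Rprof').

You could try wrapping all of the above into a function and then "parallelizing" it? I'm not an expert and not sure when this is most efficient, but I have had success with it in the past.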
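As a rough illustration of the "wrap it in a function, then parallelize" suggestion, the sketch below uses foreach with a doParallel backend. The function summarise_site(), the toy depths list, and the choice of 2 workers are assumptions made for the example; on small inputs the overhead of starting workers can easily outweigh any gain, which is one reason to profile first.

```r
library(foreach)     # foreach(), %dopar%
library(doParallel)  # registerDoParallel()

# Hypothetical per-site function. A single vectorised comparison marks the
# wet readings (assumption: wet means depth > 0), with no if_else() calls.
summarise_site <- function(depth, site_name) {
  wet <- depth > 0
  data.frame(
    site = site_name,
    observations = sum(wet),
    max_depth = max(depth)
  )
}

# Toy depths for two hypothetical sites
depths <- list(
  site_a = c(-0.1, 0.02, 0.05, -0.2),
  site_b = c(-0.5, 0.10, 0.20, 0.15)
)

# Start 2 worker processes and register them as the %dopar% backend
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Each site is processed on a worker; rbind stacks the one-row results
par_result <- foreach(site_name = names(depths), .combine = rbind) %dopar% {
  summarise_site(depths[[site_name]], site_name)
}

# Always release the workers when done
parallel::stopCluster(cl)
```

Swapping %dopar% for %do% runs the same loop sequentially, which makes it easy to compare timings before committing to the parallel version.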
Flood-event table for one site, as referred to in the question:

 Group.1 avg_water_depth   max_depth observations duration_days      begin        end                        site
      1     0.025245673 0.033995673            4    0.04166667 2016-02-09 2016-02-09  WaterLevel_Levee.slope.1_1
      3     0.045995673 0.071995673            8    0.09722222 2016-05-06 2016-05-06  WaterLevel_Levee.slope.1_3
      5     0.003995673 0.005995673            2    0.01388889 2016-05-06 2016-05-06  WaterLevel_Levee.slope.1_5
      7     0.039370673 0.061995673            8    0.09722222 2016-05-07 2016-05-07  WaterLevel_Levee.slope.1_7
      9     0.038785147 0.069995673           19    0.25000000 2016-05-27 2016-05-27  WaterLevel_Levee.slope.1_9
     11     0.063817102 0.110995673           28    0.37500000 2016-05-27 2016-05-28 WaterLevel_Levee.slope.1_11
     13     0.062817102 0.112995673           28    0.37500000 2016-05-28 2016-05-28 WaterLevel_Levee.slope.1_13
     15     0.042495673 0.067995673           18    0.23611111 2016-05-28 2016-05-28 WaterLevel_Levee.slope.1_15
  
