Building a data frame of flood metric calculations with a for loop in R
I have a dataset named all.cols2 in which water depth was measured every 20 minutes at 94 locations over more than 3 years. Here is a preview:
# A tibble: 89,714 x 95
date_time Levee.slope Levee.slope.1 Levee.slope.2 Levee.slope.3
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2015-12-01 15:05:33 -0.821 -0.539 -0.325 -0.0991
2 2015-12-01 15:25:33 -0.830 -0.548 -0.334 -0.108
3 2015-12-01 15:45:33 -0.829 -0.547 -0.333 -0.107
4 2015-12-01 16:05:33 -0.833 -0.551 -0.337 -0.111
5 2015-12-01 16:25:33 -0.829 -0.547 -0.333 -0.107
6 2015-12-01 16:45:33 -0.834 -0.552 -0.338 -0.112
7 2015-12-01 17:05:33 -0.839 -0.557 -0.343 -0.117
8 2015-12-01 17:25:33 -0.835 -0.553 -0.339 -0.113
9 2015-12-01 17:45:33 -0.826 -0.544 -0.330 -0.104
10 2015-12-01 18:05:33 -0.804 -0.522 -0.308 -0.0821
# ... with 89,704 more rows, and 90 more variables: Levee.slope.4 <dbl>,
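…each flood event at each location has an average water depth, a maximum water depth, a number of observations, a flood event duration (in days), and a start and end date/time.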
Right now I have to specify i before running the for loop; it does not step through my sites automatically.
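My question: is there a way to have the for loop iterate over all locations in one go and store the results in a single combined output like the table of flood events? Also, is there a way to condense the code in my loop so that I don't have to create so many data frames?

It is hard to demonstrate without some data, but here is pseudocode using foreach; if you want to speed things up, you can use doParallel.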
library(dplyr)    # bind_rows()
library(foreach)  # foreach() and %do%

# list_locations, location_data and one_result_df are placeholders for your own objects
result <- bind_rows(foreach(location = list_locations) %do% {
  # code handling the data for one location
  # ...
  # process each column of that location
  one_location_df <- bind_rows(foreach(i_col = seq_along(location_data)) %do% {
    # your code handling the data
    # the final value should be a data frame, even if it has only one row
    one_result_df
  })
  # any additional code, if needed
  # ...
  one_location_df
})
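As a more concrete version of the pseudocode above, here is a minimal sketch that walks over every site column and binds the per-site results into one data frame. It rests on assumptions that are not in the original post: a flood event is taken to be a run of consecutive observations with depth > 0, the toy data and the helper name summarise_events are hypothetical, and the output columns only imitate the ones in the desired table.

```r
library(dplyr)
library(foreach)

# Toy stand-in for all.cols2 (hypothetical values): 20-minute readings at two sites
toy <- tibble(
  date_time     = as.POSIXct("2016-05-06 00:00:00", tz = "UTC") + 1200 * (0:9),
  Levee.slope   = c(-0.1, 0.02, 0.05, -0.2, -0.1, 0.01, 0.03, 0.04, -0.3, -0.2),
  Levee.slope.1 = c(-0.5, -0.4, 0.10, 0.20, 0.15, -0.1, -0.2, -0.3, -0.4, -0.5)
)

# Hypothetical helper: summarise every flood event in one depth column.
# Assumption: an event is a run of consecutive observations with depth > 0.
summarise_events <- function(df, site_name) {
  flooded <- !is.na(df[[site_name]]) & df[[site_name]] > 0
  runs <- rle(flooded)
  events <- tibble(
    date_time = df$date_time,
    depth     = df[[site_name]],
    event     = rep(seq_along(runs$lengths), runs$lengths)  # run id per row
  )[flooded, ]
  events %>%
    group_by(event) %>%
    summarise(
      avg_water_depth = mean(depth),
      max_depth       = max(depth),
      observations    = n(),
      duration_days   = as.numeric(difftime(max(date_time), min(date_time),
                                            units = "days")),
      begin           = min(date_time),
      end             = max(date_time),
      .groups = "drop"
    ) %>%
    mutate(site = site_name)
}

# Loop over all site columns automatically instead of setting i by hand
site_cols <- setdiff(names(toy), "date_time")
all_events <- bind_rows(foreach(s = site_cols) %do% summarise_events(toy, s))
print(all_events)
```

Because the site names come from names(toy), nothing has to be set by hand before the loop, and the only intermediate data frame is the one built inside the helper.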
Here is one speedup: instead of two if_else calls, use just one all.cols2_sub$VarA ... 0). It is much faster. That said, I suggest profiling the code first; see help('Rprof').
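For the profiling suggestion, a minimal sketch of how Rprof might be used is shown here. The function slow_step is a hypothetical stand-in for the body of your loop, and the 0.01 second sampling interval is an arbitrary choice.

```r
# Hypothetical stand-in for the loop body you want to profile
slow_step <- function(n = 200) {
  out <- numeric(0)
  for (i in seq_len(n)) {
    out <- c(out, mean(sort(runif(5000))))  # growing a vector is deliberately wasteful
  }
  out
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)  # start the sampling profiler
res <- slow_step()
Rprof(NULL)                        # stop profiling

# summaryRprof() raises an error when no samples were recorded
# (the code ran too fast), so the call is guarded
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

The by.self table lists the functions in which the most time was spent, which shows where a rewrite is worth the effort before reaching for parallel code.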
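You could try wrapping all of the above in a function and then "parallelizing" it? I am not an expert and am not sure when this works best, but it has worked for me in the past.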
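One way to read this suggestion, sketched under assumptions: put the per-site work in a function and run it with foreach and doParallel. The function process_site, the toy data, and the choice of two workers are all hypothetical; the real function would contain whatever your loop body does.

```r
library(foreach)
library(doParallel)

# Hypothetical per-site function: everything the loop body does goes here
process_site <- function(depths, site_name) {
  data.frame(
    site         = site_name,
    max_depth    = max(depths, na.rm = TRUE),
    observations = sum(depths > 0, na.rm = TRUE)  # readings above zero depth
  )
}

# Toy stand-in for the site columns of all.cols2 (hypothetical values)
toy <- data.frame(
  Levee.slope   = c(-0.1, 0.02, 0.05, -0.2),
  Levee.slope.1 = c(-0.5, 0.10, 0.20, 0.15)
)

cl <- parallel::makeCluster(2)      # two workers; adjust to your machine
doParallel::registerDoParallel(cl)

# %dopar% sends each site to a worker; rbind combines the one-row results
results <- foreach(s = names(toy), .combine = rbind) %dopar% {
  process_site(toy[[s]], s)
}

parallel::stopCluster(cl)           # always release the workers
print(results)
```

Starting workers and copying data to them has a cost, so for small per-site computations the sequential %do% version may well be faster; profiling first helps decide.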
Group.1 avg_water_depth max_depth observations duration_days begin end site
1 0.025245673 0.033995673 4 0.04166667 2016-02-09 2016-02-09 WaterLevel_Levee.slope.1_1
3 0.045995673 0.071995673 8 0.09722222 2016-05-06 2016-05-06 WaterLevel_Levee.slope.1_3
5 0.003995673 0.005995673 2 0.01388889 2016-05-06 2016-05-06 WaterLevel_Levee.slope.1_5
7 0.039370673 0.061995673 8 0.09722222 2016-05-07 2016-05-07 WaterLevel_Levee.slope.1_7
9 0.038785147 0.069995673 19 0.25000000 2016-05-27 2016-05-27 WaterLevel_Levee.slope.1_9
11 0.063817102 0.110995673 28 0.37500000 2016-05-27 2016-05-28 WaterLevel_Levee.slope.1_11
13 0.062817102 0.112995673 28 0.37500000 2016-05-28 2016-05-28 WaterLevel_Levee.slope.1_13
15 0.042495673 0.067995673 18 0.23611111 2016-05-28 2016-05-28 WaterLevel_Levee.slope.1_15