How can I make a nested purrr map extract rows based on a dynamic variable, instead of using nested loops?

Tags: r, loops, dictionary, for-loop, purrr


I have a data frame that looks like this:

## Please copy the following text to your clipboard (do not copy this line)
hid  ,mid    ,aprps,astart             ,aend               ,ax      ,ay     ,exph
10001,1000101,3    ,2012-01-01 00:00:00,2012-01-01 00:00:00,475465.6,1272272,41.55607
10001,1000101,3    ,2012-01-01 00:00:00,2012-01-01 08:00:00,475465.6,1272272,41.55607
10001,1000101,4    ,2012-01-01 08:00:00,2012-01-01 08:15:00,475465.6,1272272,41.55607
10001,1000101,3    ,2012-01-01 08:15:00,2012-01-01 09:15:00,475465.6,1272272,41.55607
10001,1000101,4    ,2012-01-01 09:15:00,2012-01-01 09:30:00,475465.6,1272272,41.55607
10001,1000101,3    ,2012-01-01 09:30:00,2012-01-02 00:00:00,475465.6,1272272,41.55607
10001,1000102,3    ,2012-01-01 00:00:00,2012-01-01 00:00:00,475465.6,1272272,41.55607
10001,1000102,3    ,2012-01-01 00:00:00,2012-01-01 07:30:00,475465.6,1272272,41.55607
10001,1000102,4    ,2012-01-01 07:30:00,2012-01-01 07:50:00,475465.6,1272272,41.55607
10001,1000102,1    ,2012-01-01 07:50:00,2012-01-01 11:00:00,475465.6,1272272,41.55607
10001,1000102,4    ,2012-01-01 11:00:00,2012-01-01 11:20:00,475465.6,1272272,41.55607
10001,1000102,3    ,2012-01-01 11:20:00,2012-01-01 14:00:00,475465.6,1272272,41.55607
10001,1000102,4    ,2012-01-01 14:00:00,2012-01-01 14:20:00,475465.6,1272272,41.55607
10001,1000102,1    ,2012-01-01 14:20:00,2012-01-01 17:00:00,475465.6,1272272,41.55607
10001,1000102,4    ,2012-01-01 17:00:00,2012-01-01 17:20:00,475465.6,1272272,41.55607
10001,1000102,3    ,2012-01-01 17:20:00,2012-01-02 00:00:00,475465.6,1272272,41.55607
10001,1000103,3    ,2012-01-01 00:00:00,2012-01-01 00:00:00,475465.6,1272272,41.55607
10001,1000103,3    ,2012-01-01 00:00:00,2012-01-01 08:00:00,475465.6,1272272,41.55607
10001,1000103,4    ,2012-01-01 08:00:00,2012-01-01 12:00:00,475465.6,1272272,41.55607
10001,1000103,3    ,2012-01-01 12:00:00,2012-01-01 13:00:00,475465.6,1272272,41.55607
10001,1000103,4    ,2012-01-01 13:00:00,2012-01-01 19:00:00,475465.6,1272272,41.55607
10001,1000103,3    ,2012-01-01 19:00:00,2012-01-01 20:00:00,475465.6,1272272,41.55607
10001,1000103,4    ,2012-01-01 20:00:00,2012-01-01 23:00:00,475465.6,1272272,41.55607
10001,1000103,3    ,2012-01-01 23:00:00,2012-01-02 00:00:00,475465.6,1272272,41.55607
10001,1000104,3    ,2012-01-01 00:00:00,2012-01-01 00:00:00,475465.6,1272272,41.55607
10001,1000104,3    ,2012-01-01 00:00:00,2012-01-01 07:00:00,475465.6,1272272,41.55607
10001,1000104,4    ,2012-01-01 07:00:00,2012-01-01 07:30:00,473548.0,1279171,41.55607
10001,1000104,2    ,2012-01-01 07:30:00,2012-01-01 10:00:00,473548.0,1279171,41.55607
10001,1000104,4    ,2012-01-01 10:00:00,2012-01-01 10:30:00,475465.6,1272272,41.55607
10001,1000104,3    ,2012-01-01 10:30:00,2012-01-01 17:30:00,475465.6,1272272,41.55607
10001,1000104,4    ,2012-01-01 17:30:00,2012-01-01 17:45:00,484869.7,1270558,41.55607
10001,1000104,2    ,2012-01-01 17:45:00,2012-01-01 21:30:00,484869.7,1270558,41.55607
10001,1000104,4    ,2012-01-01 21:30:00,2012-01-01 21:45:00,475465.6,1272272,41.55607
10001,1000104,3    ,2012-01-01 21:45:00,2012-01-02 00:00:00,475465.6,1272272,41.55607
## Do not copy this line
You can copy the text above and import it as `df` using the {psych} package:

install.packages("psych")
library(psych)
# Please copy above text and run following
df <- read.clipboard(header=TRUE, sep=",")
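If the clipboard route is inconvenient (for example in a non-interactive session), the same text can probably be read with base R alone. The following is a minimal sketch that uses only the first six rows of the sample above; the variable name `txt` is introduced here for illustration and is not part of the original question.

```r
# First six rows of the sample data, pasted as a string (hypothetical name: txt)
txt <- "hid  ,mid    ,aprps,astart             ,aend               ,ax      ,ay     ,exph
10001,1000101,3    ,2012-01-01 00:00:00,2012-01-01 00:00:00,475465.6,1272272,41.55607
10001,1000101,3    ,2012-01-01 00:00:00,2012-01-01 08:00:00,475465.6,1272272,41.55607
10001,1000101,4    ,2012-01-01 08:00:00,2012-01-01 08:15:00,475465.6,1272272,41.55607
10001,1000101,3    ,2012-01-01 08:15:00,2012-01-01 09:15:00,475465.6,1272272,41.55607
10001,1000101,4    ,2012-01-01 09:15:00,2012-01-01 09:30:00,475465.6,1272272,41.55607
10001,1000101,3    ,2012-01-01 09:30:00,2012-01-02 00:00:00,475465.6,1272272,41.55607"

# strip.white removes the padding around each field;
# the column names are trimmed explicitly as well, to be safe
df <- read.csv(text = txt, strip.white = TRUE, check.names = FALSE,
               stringsAsFactors = FALSE)
names(df) <- trimws(names(df))

# Parse the timestamps (the time zone is an assumption; the source does not state one)
df$astart <- as.POSIXct(df$astart, tz = "UTC")
df$aend   <- as.POSIXct(df$aend, tz = "UTC")

str(df)
```

For the full data set, `read.csv()` can point at a file instead of `text = txt`.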
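However, because the actual data frame `df` contains about 40,000 records and the actual operation involves more complex calculations, it takes more than 30 hours to run. I am looking for ways to shorten this, and am now trying to apply the `map` functions in `purrr` so that each operation is stored in a nested data frame, rather than overwriting variables on every loop iteration.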

The script below is what I am trying to build, but it does not produce the expected output.

    ## Store df by mid into list
    nest <- df %>% group_by(mid) %>% nest()
    ## Extract row number with "aprps==4"
    nest2 <- nest %>% mutate(row.aprps4=map(data,~which(.$aprps==4)))
    ## Obtain row numbers to extract by movement
    nest3 <- nest2 %>% mutate(row.aprps4_1=map(data,~data.frame(rm1=which(.$aprps==4)-1)),
                              row.aprps4_2=map(data,~data.frame(rm1=which(.$aprps==4))))
    ## How to extract two pairs of records based on row.aprps4_1 and row.aprps4_2 and store the sum of exph?

Some trials:
# It works but cannot extract records using two variables (row.aprps4_1 and .._2)
nest3 %>% mutate(move=map2(data,row.aprps4_1,~filter(.x,seq_len(nrow(.x))%in%.y)))
# Using pmap to specify range of filtering by two variables but does not work
nest4 %>% pmap(data,row.move1,row.move2,~filter(..1,seq_len(nrow(..1))%in%..2))
# Using double map function instead of double loop but does not work
pmap(nest4$data,nest4$row.move1,nest4$row.move2,~filter(..1,seq_len(nrow(..1))%in%c(..2:..3)))
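The key points are:

  • `unnest()` is needed to expand each record by the vector of row numbers extracted from `aprps==4` (`.x%in%.y` cannot be applied when `.y` has more than two elements)
  • `mutate` is required in order to apply `map2` (code such as `nest3%>%map2(a,b,~f(.x,.y…)` is not accepted)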
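The second point, and the failed `pmap` attempts in the question, come down to how purrr receives its inputs: `map2()` takes two vectors and `pmap()` takes a single list of vectors as its first argument, so column names only resolve when the call sits inside `mutate()` (or when the columns are referenced explicitly). The sketch below illustrates this with a toy nested table; the names `nested`, `first` and `last` and all values are hypothetical.

```r
library(dplyr)
library(purrr)

# Toy nested table (hypothetical values): one row per group, a list-column of
# data frames, and two integer columns giving the first and last row to keep
nested <- tibble(
  id    = c("a", "b"),
  data  = list(data.frame(x = 1:5), data.frame(x = 11:15)),
  first = c(2L, 1L),
  last  = c(3L, 2L)
)

# Inside mutate(), bare column names refer to the columns themselves,
# so pmap() can iterate over them in parallel via list(...)
res <- nested %>%
  mutate(rows = pmap(list(data, first, last),
                     ~ ..1[..2:..3, , drop = FALSE]))

# Outside mutate(), the columns have to be referenced explicitly
res2 <- pmap(list(nested$data, nested$first, nested$last),
             ~ ..1[..2:..3, , drop = FALSE])
```

Piping the whole table into `map2()` or `pmap()` passes the data frame itself as the first argument, which is why the bare column names in the trials above are not found.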
Many thanks to the post that provided this solution.

Since you mentioned that other alternatives are welcome, consider base R. Several issues stem from your initial (non-purrr) setup:

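  • One of the biggest issues in the original code is the use of `rbind` inside a loop, which leads to excessive copying in memory, as described in this SO thread and by Patrick Burns. To resolve this, build a list of data frames and bind them once, outside the loop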


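  • Repeated use of scoping assignment,

Thank you very much for the alternative idea. Since I tend to write quick-and-dirty code for myself, I think I need to understand how R behaves in order to write rational code. Your explanation helped me understand the problem in depth. Thanks.

Great, glad to help. Yes, get a good handle on R and its object model before branching out to third-party packages.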
    
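    ## Packages used below
    library(dplyr)
    library(tidyr)
    library(purrr)
    
    ## Convert df into nested data frame by `mid`
    nest <- df %>% group_by(mid) %>% nest()
    
    ## Obtain row numbers where aprps==4
    nest2 <- nest %>% mutate(r=map(data,~which(.$aprps==4)))
    
    ## Split r so that each row number gets its own record
    ## (tidyr >= 1.0 keeps the `data` list-column; older versions need unnest(r,.drop=FALSE))
    nest3 <- nest2 %>% unnest(r)
    
    ## Extract pairs of movement (the aprps==4 row and the row before it)
    nest4 <- nest3 %>% mutate(pair=map2(data,r,~filter(.x,seq_len(nrow(.x))%in%c((.y-1):.y)))) %>% dplyr::select(mid,pair)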
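The question also asks for the sum of `exph` over each extracted pair. One way this might be done, building on the same nest/unnest steps, is sketched below. It assumes tidyr 1.0 or later, and the small `df` here is rebuilt from the sample rows for `mid` 1000101 (only the columns that are used) so that the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rows for mid 1000101 from the sample data (only the columns used here)
df <- data.frame(
  mid   = 1000101,
  aprps = c(3, 3, 4, 3, 4, 3),
  exph  = 41.55607
)

res <- df %>%
  group_by(mid) %>%
  nest() %>%
  ungroup() %>%
  # row numbers with aprps == 4, stored as a list-column
  mutate(r = map(data, ~ which(.x$aprps == 4))) %>%
  # one record per row number; `data` is repeated alongside it
  unnest(r) %>%
  # sum exph over the aprps == 4 row and the row before it
  # (if such a row were first in its group, only that row would be summed)
  mutate(expsum = map2_dbl(data, r, ~ sum(.x$exph[(.y - 1):.y]))) %>%
  select(mid, r, expsum)

print(res)
```

Using `map2_dbl()` returns a plain numeric column, so no further unnesting is needed for the sums.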
    
    calc <- function(sub) {
    
        ## Extract records by "mid" excluding the first record
        temp <- sub[2:nrow(sub),]
    
        ## Extract row number of "aprps==4"
        r.aprps <- which(temp$aprps==4)
    
        ## Store exp dataframes in list
        subdf_list <- lapply(seq_along(r.aprps), function(j) {
    
            ## Extract movement by two pairs of rows based on "r.aprps"
            temp2 <- temp[c((r.aprps[j]-1):r.aprps[j]),]
    
            ## Other operations in actual data set (just put example)
            exp <- data.frame(mid=unique(temp2$mid), expsum=sum(temp2$exph))
    
            return(exp)
        })
    
        df.exp <- do.call(rbind, subdf_list)  
        return(df.exp)
    }
    
    ## subset by mid and pass subsets to calc()
    df_list <- by(df, df$mid, calc)
    
    ## append all in final object
    final_df <- do.call(rbind, df_list)
    
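    ## Alternative with dplyr: bind_rows() in place of do.call(rbind, ...)
    df.exp  <- dplyr::bind_rows(subdf_list) 
    ...
    final_df <-  dplyr::bind_rows(df_list)
    
    ## Alternative with data.table: rbindlist() in place of do.call(rbind, ...)
    df.exp  <- data.table::rbindlist(subdf_list)
    ...
    final_df <-  data.table::rbindlist(df_list)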
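The point about `rbind` inside a loop can be illustrated with a small base R sketch. The helper `make_row()` and its values are hypothetical stand-ins for the real per-record calculation; the sketch only contrasts the two patterns and makes no claim about actual timings on the 40,000-record data.

```r
# Hypothetical per-iteration computation: returns a one-row data frame
make_row <- function(i) data.frame(id = i, value = i^2)

# Pattern to avoid: growing a data frame inside the loop.
# Each rbind() call copies all rows accumulated so far.
grown <- data.frame()
for (i in 1:100) {
  grown <- rbind(grown, make_row(i))
}

# Pattern suggested in the answer: collect the pieces in a list,
# then bind them once, outside the loop
pieces <- lapply(1:100, make_row)
bound  <- do.call(rbind, pieces)

# Both patterns produce the same rows
all.equal(grown$value, bound$value)
```

The second pattern is the same shape as `calc()` above, where `lapply()` builds `subdf_list` and a single `do.call(rbind, ...)` combines it.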