R: average over pairs of rows and paste values based on a condition (tags: r, dplyr)

In R, I have a df such as:

     a      b   c 
 1   124    70  aa     
 2   129    67  aa     
 3   139    71  aa     
 4   125    77  aa     
 5   125    82  aa     
 6   121    69  aa     
 7   135    68  bb
 8   137    72  bb
 9   137    78  bb
10   140    86  bb

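I would like to iterate along the rows in columns (a, b), compute the average of each pair of consecutive rows and, if the absolute difference between the two rows is >= 12, paste this average into the same two rows of new columns (a_new, b_new). Otherwise, just copy the old values. This behaviour should be restricted to the groups marked by another column (c), i.e. it should not happen if the two rows come from different groups.

In this example, it happens at row 3 (because, in column a, the difference to the next row (4) is 14) and at row 5 (because, in column b, the difference to the next row is 13). It should not happen at row 6, however, because row 7 is in a different c group.

So the resulting df would look like: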

     a      b   c     a_new  b_new
 1   124    70  aa    124    70
 2   129    67  aa    129    67
 3   139    71  aa    132    71   
 4   125    77  aa    132    68
 5   125    82  aa    125    75.5
 6   121    69  aa    121    75.5
 7   135    68  bb    135    68
 8   137    72  bb    137    72
 9   137    78  bb    137    78
10   140    86  bb    140    86

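I have been struggling with this for a while and figured that the lag function might be usable, but without success. Any help is much appreciated (whether base R, dplyr or anything else).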

Dput：
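
The dput output did not survive in this copy of the post. As a stand-in, the example table above can be rebuilt by hand. This is only a sketch: the column types (numeric a and b, character c) and the name df are assumptions, not something the original confirms. The first answer below refers to the same data as dd, so an alias is added.

```r
# Hand-built stand-in for the missing dput output (column types are assumed)
df <- data.frame(
  a = c(124, 129, 139, 125, 125, 121, 135, 137, 137, 140),
  b = c(70, 67, 71, 77, 82, 69, 68, 72, 78, 86),
  c = rep(c("aa", "bb"), times = c(6, 4)),
  stringsAsFactors = FALSE
)

# Alias, because the first answer calls the data frame dd
dd <- df
```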

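As I understand it, the procedure described in the comments of the code below should be applied to each group given by the indicator column "c":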


pairAverage <- function(x) {
  # x should be a numeric vector of length > 1
  if (is.vector(x) && is.numeric(x) && length(x) > 1) {

    # copy data to an aux vector
    aux <- x

    # get differences of lag 1
    dh <- diff(x, 1)

    # get means of consecutive pairs: x[i + 1] - (x[i + 1] - x[i]) / 2
    med <- x[2:length(x)] - dh / 2

    # get positions (index) of pairs with abs(difference) >= 12
    idx <- which(abs(dh) >= 12)

    # need 2 reps of each mean to replace consecutive values of x
    valToRepl <- med[sort(rep(idx, 2))]

    # ordered index pairs of consecutive elements of x to be replaced
    idxToRepl <- sort(c(idx, idx + 1))

    # replace pairs of values
    aux[idxToRepl] <- valToRepl

    return(aux)

  } else {
    # do nothing
    warning("parameter x should be a numeric vector of length > 1")
    return(NULL)
  }
}

pairAverageByGroups <- function(x, gr) {
  # x should be numeric and have one group label in gr per element
  if (is.vector(x) && is.numeric(x) && length(x) == length(gr)) {
    # split by group, process each group, then flatten the result
    x.ls <- split(x, as.factor(gr))
    output <- unlist(lapply(x.ls, pairAverage))
    names(output) <- NULL
    output
  } else {
    # do nothing
    warning("parameter x should be a numeric vector of the same length as gr")
    return(NULL)
  }
}

pairAverageByGroups(dd$a, dd$c)
 [1] 124 129 132 132 125 121 135 137 137 140


We can write a function that works on a single chunk:
apply_fun <- function(x) {
    inds <- which(abs(diff(x)) >= 12)
    if(length(inds))
        x[sort(c(inds, inds + 1))] <-  c(sapply(inds, function(i) 
                                          rep(mean(x[c(i, i + 1)]), 2)))
    return(x)
}

library(dplyr)

# apply it to columns a and b within each group of c
df %>%
  group_by(c) %>%
  mutate_at(vars(a, b), list(new = apply_fun))
#       a   b  c   a_new  b_new
#  1  124  70  aa    124   70
#  2  129  67  aa    129   67
#  3  139  71  aa    132   71
#  4  125  77  aa    132   77
#  5  125  82  aa    125   75.5
#  6  121  69  aa    121   75.5
#  7  135  68  bb    135   68
#  8  137  72  bb    137   72
#  9  137  78  bb    137   78
# 10  140  86  bb    140   86
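
Since the question also allows base R, the same per-group idea can be sketched without dplyr. This is not taken from either answer: pair_mean is a hypothetical helper, the threshold argument is an added assumption, and the data frame is rebuilt from the question's table with assumed column types. It relies on base R's ave(), which applies a function within each group and returns the values in the original row order.

```r
# Example data rebuilt from the question's table (column types are assumed)
df <- data.frame(
  a = c(124, 129, 139, 125, 125, 121, 135, 137, 137, 140),
  b = c(70, 67, 71, 77, 82, 69, 68, 72, 78, 86),
  c = rep(c("aa", "bb"), times = c(6, 4))
)

# Hypothetical helper: for every adjacent pair whose absolute difference
# is >= threshold, write the pair's mean into both positions
pair_mean <- function(x, threshold = 12) {
  out <- x
  for (i in which(abs(diff(x)) >= threshold)) {
    # means are always computed from the original values in x
    out[c(i, i + 1)] <- mean(x[c(i, i + 1)])
  }
  out
}

# ave() runs the helper separately within each level of c
df$a_new <- ave(df$a, df$c, FUN = pair_mean)
df$b_new <- ave(df$b, df$c, FUN = pair_mean)
```

Because the helper never sees values from two groups at once, the jump between rows 6 and 7 in column a is left alone, and a group with a single row passes through unchanged.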

I don't understand why the 4th row in a is replaced, and I am also not getting your exact values, but do you mean something like this?
df %>% group_by(c) %>% mutate_at(vars(a, b), list(new = ~replace(., which(abs(diff(.)) >= 12) + 1, mean(.))))

This seems to detect the rows, but the pasted values do not look like the averages, and they are pasted only into the second row of the respective pair?

The logic is not clear to me here. The replacement has to happen by group, i.e. c? Which rows should be replaced, and with what values?

I would like to replace both rows whose difference is >= 12 with their average. So, for example, I should paste the average of row 3 (value: 139) and row 4 (value: 125) into rows 3 and 4 of column a_new (this average is 132). This behaviour should not happen between rows belonging to different groups (column c), so nothing happens to rows 6 and 7 in column a_new (the difference is >= 12, but row 6 is in group "aa" and row 7 is in group "bb").

Thanks, but I keep getting an error: 'Error in FUN(X[[i]], ...) : object 'dd' not found'. It seems that dd is used inside the function, but my data frame actually has a different name. I also get a strange prompt 'Browse[1]>' and am not sure why?

Sorry! I replaced dd with x in the med line.