Tidygraph: computing child summaries for each parent

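Using the tidygraph package in R, given a tree, I'd like to compute the mean, sum, variance, ... of the values of each node's direct children.

My intuition is to use map_bfs_back_dbl or a related function, and I tried modifying the help example, but I'm stuck: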

library(tidygraph)

# Collect values from children
create_tree(40, children = 3, directed = TRUE) %>%
  mutate(value = round(runif(40)*100)) %>%
  mutate(child_acc = map_bfs_back_dbl(node_is_root(), .f = function(node, path, ...) {
    if (nrow(path) == 0) .N()$value[node]
    else {
      sum(unlist(path$result[path$parent == node]))
    }
  }))

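For the above, I'd like the mean of value over all direct, first-level children of each parent in the tree.
Update:
I tried this approach (computing the variance of a child attribute):
This gets really close: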
# Node Data: 40 x 3 (active)
# Groups:    parent [14]
  parent value   var
*  <int> <dbl> <dbl>
1     NA  2.00    NA
2      1 13.0   1393
3      1 63.0   1393
4      1 86.0   1393
5      2 27.0    890
6      2 76.0    890
# ... with 34 more rows

What I'd like to see is:
# Node Data: 40 x 3 (active)
# Groups:    parent [14]
  parent value   var  child_var
*  <int> <dbl> <dbl>      <dbl>
1     NA  2.00    NA       1393
2      1 13.0   1393        890 
3      1 63.0   1393       (etc)
4      1 86.0   1393
5      2 27.0    890
6      2 76.0    890
# ... with 34 more rows

That is, move the (first) var value up to the node identified by the parent value. Any help or suggestions?
Edit:
This is what I ended up doing:
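library(tidygraph)
library(dplyr)  # summarize(), first(), row_number()

# Build the tree first; the join below needs `tree` to already exist
tree <- create_tree(40, children = 3, directed = TRUE) %>%
  mutate(parent = bfs_parent(),
         value = round(runif(40) * 100),
         name = row_number())

# Compute the variance within each sibling group, then attach it to the parent node
tree <- tree %>%
  activate(nodes) %>%
  left_join(
    tree %>%
      group_by(parent) %>%
      mutate(var = var(value)) %>% activate(nodes) %>% as_tibble() %>%
      group_by(parent) %>% summarize(child_stat = first(var)),
    by = c("name" = "parent")
  )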

It doesn't feel very tidy, but it seems to work. Open to optimizations.
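The join above boils down to two steps: summarise value within each sibling group (keyed by parent), then look that summary up by each node's own id. A minimal base-R sketch of those two steps follows. It uses a hand-built 7-node tree whose parent and value vectors are made-up illustration data, not the output of create_tree():

```r
# Hypothetical 7-node tree: node 1 is the root, nodes 2-3 are its children,
# nodes 4-5 are children of node 2, nodes 6-7 are children of node 3.
parent <- c(NA, 1, 1, 2, 2, 3, 3)
value  <- c(10, 2, 4, 1, 3, 5, 9)

# Step 1: one variance per parent, computed over that parent's direct children.
# tapply() ignores the NA group, so the root's own value is not summarised.
stat_by_parent <- tapply(value, parent, var)

# Step 2: look each node up by its own id; leaves match no parent and get NA.
parent_ids <- as.integer(names(stat_by_parent))
child_stat <- as.vector(stat_by_parent)[match(seq_along(value), parent_ids)]

data.frame(node = seq_along(value), parent, value, child_stat)
```

Swapping var for mean or sum in step 1 gives the other summaries mentioned in the question.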
I had a go at a "tidygraph" way of doing this. The main function computes the variance of the value column:
calc_child_stats <- ...
tree %>% mutate(var = map_local_dbl(order = 1, mode = "out", .f = calc_child_stats))
#> # A tbl_graph: 40 nodes and 39 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 40 x 2 (active)
#>   value   var
#>   <dbl> <dbl>
#> 1    29  34.3
#> 2    45 433  
#> 3    56 225. 
#> 4    47 868  
#> 5    78 604. 
#> 6    43 283  
#> # ... with 34 more rows
#> #
#> # Edge Data: 39 x 2
#>    from    to
#>   <int> <int>
#> 1     1     2
#> 2     1     3
#> 3     1     4
#> # ... with 36 more rows
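
For readers who want something runnable, here is a minimal sketch of what a helper passed to map_local_dbl could look like. The name child_var, the id column, and the function body are assumptions for illustration, not the definition used in this answer. It relies on map_local_dbl calling .f with the arguments neighborhood and node, as described in the tidygraph help page.

```r
library(tidygraph)

set.seed(42)  # arbitrary seed so the sketch is repeatable
tree <- create_tree(40, children = 3, directed = TRUE) %>%
  mutate(value = round(runif(40) * 100),
         id = seq_len(40))  # hypothetical helper column: each node's own index

# Hypothetical helper: variance of value over a node's direct children.
# With order = 1 and mode = "out" the neighborhood holds the node and its
# children, so the node itself is dropped by id before summarising.
child_var <- function(neighborhood, node, ...) {
  nodes <- as_tibble(neighborhood, active = "nodes")
  var(nodes$value[nodes$id != node])
}

result <- tree %>%
  mutate(var = map_local_dbl(order = 1, mode = "out", .f = child_var)) %>%
  as_tibble()
```

Leaf nodes have no children, so var() receives an empty vector and returns NA for them. Replacing var with mean or sum inside child_var would give the other child summaries.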

Although my tidygraph version is more "graph-like", it doesn't seem to be very fast, so I ran a quick microbenchmark comparing the two approaches:
library(microbenchmark)
microbenchmark(tree %>% mutate(var = map_local_dbl(order = 1, mode = "out", .f = calc_child_stats)))
#> Unit: milliseconds
#>                                                                                   expr
#>  tree %>% mutate(var = map_local_dbl(order = 1, mode = "out", .f = calc_child_stats))
#>       min       lq     mean   median      uq      max neval
#>  115.3325 123.0303 127.7889 126.6683 130.057 191.6065   100
microbenchmark(calc_child_stats_dplyr(tree))
#> Unit: milliseconds
#>                          expr      min       lq     mean   median       uq
#>  calc_child_stats_dplyr(tree) 4.915917 5.213939 6.292579 5.573978 6.717745
#>       max neval
#>  16.72846   100

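Created on 2018-06-15 by the reprex package (v0.2.0).
Of course, the dplyr approach is much faster, so I'd stick with it for now. Both gave the same values in my tests.
For completeness, this is the function I used to replicate the OP's approach: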
calc_child_stats_dplyr <- function(tree) {
  tree %>%
    activate(nodes) %>%
    left_join(
      tree %>%
        group_by(parent) %>%
        mutate(var = var(value)) %>%
        activate(nodes) %>%
        as_tibble() %>%
        group_by(parent) %>%
        summarize(child_stat = first(var)),
      by = c("name" = "parent")
    )
}
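Please see how to post an R question that is easy to answer. That includes a sample of your data and all the code needed to reproduce the problem. In particular, it's not clear which package you're using.
Edited per @camille's suggestion about the package.
Your "this is what I ended up doing" code doesn't work for me. I had to split it before the join.
Is your calc_child_stats_dplyr the same as the OP's "this is what I ended up doing" code?
The results are the same. I've now added the function to my post.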