Tidygraph: calculate child summaries at parent level

Stack Overflow user

Asked on 2018-06-13 15:18:32

Answers: 1 · Views: 449 · Followers: 0 · Votes: 1

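Using the tidygraph package in R, given a tree, I would like to calculate the mean, sum, variance, etc. of a value over the direct children of each node in the tree.

My intuition was to use map_bfs_back_dbl or a related function, so I tried modifying the help example, but I got stuck.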

library(tidygraph)

# Collect values from children
create_tree(40, children = 3, directed = TRUE) %>%
  mutate(value = round(runif(40)*100)) %>%
  mutate(child_acc = map_bfs_back_dbl(node_is_root(), .f = function(node, path, ...) {
    if (nrow(path) == 0) .N()$value[node]
    else {
      sum(unlist(path$result[path$parent == node]))
    }
  }))

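For the tree above, I would like the mean value of all the direct, first-level children of every parent.

Update: I have tried this approach (calculating the variance of the child attribute):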

library(tidygraph)
create_tree(40, children = 3, directed = TRUE) %>%
  mutate(parent = bfs_parent(),
         value = round(runif(40)*100)) %>% 
  group_by(parent) %>%
  mutate(var = var(value))

This gets very close:

# Node Data: 40 x 3 (active)
# Groups:    parent [14]
  parent value   var
*  <int> <dbl> <dbl>
1     NA  2.00    NA
2      1 13.0   1393
3      1 63.0   1393
4      1 86.0   1393
5      2 27.0    890
6      2 76.0    890
# ... with 34 more rows

What I would like to see is:

# Node Data: 40 x 3 (active)
# Groups:    parent [14]
  parent value   var  child_var
*  <int> <dbl> <dbl>      <dbl>
1     NA  2.00    NA       1393
2      1 13.0   1393        890 
3      1 63.0   1393       (etc)
4      1 86.0   1393
5      2 27.0    890
6      2 76.0    890
# ... with 34 more rows

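That is, the (first) "var" value of each group is moved to the node identified by the "parent" value. Any suggestions?

Edit: this is what I ended up doing: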

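library(tidygraph)
library(dplyr)  # summarize(), first() and row_number() come from dplyr

# Build the tree first so it can be referenced inside the join below
tree <- create_tree(40, children = 3, directed = TRUE) %>%
  mutate(parent = bfs_parent(),
         value = round(runif(40) * 100),
         name = row_number())

# Compute the variance per parent group, then join it onto the parent node
tree <- tree %>%
  activate(nodes) %>%
  left_join(
    tree %>%
      group_by(parent) %>%
      mutate(var = var(value)) %>% activate(nodes) %>% as_tibble() %>%
      group_by(parent) %>% summarize(child_stat = first(var)),
    by = c("name" = "parent")
  )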

It does not feel very tidygraph-like, but it seems to work. Open to optimizations.

tidygraph

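1 Answer

Stack Overflow user

Accepted answer

Answered on 2018-06-15 16:25:55

I took a stab at a more graph-oriented approach here. The main function calculates the variance of the value column: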

calc_child_stats <- function(neighborhood, ...){
  ## By default the neighborhood includes the parent and all of its children
  ## First remove the parent, then run analysis
  neighborhood %>% activate(nodes) %>% 
    slice(-1) %>% 
    select(value) %>% 
    pull %>% 
    var
}

With this function in hand, it is simply a matter of calling map_local instead of map_bfs, which is what you were trying:

tree <- create_tree(40, children = 3, directed = TRUE) %>%
  mutate(value = round(runif(40)*100))

tree %>% mutate(var = map_local_dbl(order = 1, mode="out", .f = calc_child_stats))
#> # A tbl_graph: 40 nodes and 39 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 40 x 2 (active)
#>   value   var
#>   <dbl> <dbl>
#> 1    29  34.3
#> 2    45 433  
#> 3    56 225. 
#> 4    47 868  
#> 5    78 604. 
#> 6    43 283  
#> # ... with 34 more rows
#> #
#> # Edge Data: 39 x 2
#>    from    to
#>   <int> <int>
#> 1     1     2
#> 2     1     3
#> 3     1     4
#> # ... with 36 more rows
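The same callback pattern can be extended to the other summaries mentioned in the question, such as the mean and the sum. Below is a minimal sketch, assuming (as in calc_child_stats) that the parent is the first node of each neighborhood. `child_stat` is a hypothetical helper, not part of tidygraph; it returns NA for nodes without children.

```r
library(tidygraph)
library(dplyr)

# Hypothetical helper (not a tidygraph function): builds a map_local callback
# that applies `stat` to the `value` column of a node's direct children.
child_stat <- function(stat) {
  function(neighborhood, ...) {
    values <- neighborhood %>%
      activate(nodes) %>%
      slice(-1) %>%   # assumption: the parent is the first node, so drop it
      pull(value)
    # Leaves have no children, so there is nothing to summarise
    if (length(values) == 0) NA_real_ else stat(values)
  }
}

set.seed(1)
tree <- create_tree(40, children = 3, directed = TRUE) %>%
  mutate(value = round(runif(40) * 100))

result <- tree %>%
  mutate(child_mean = map_local_dbl(order = 1, mode = "out", .f = child_stat(mean)),
         child_sum  = map_local_dbl(order = 1, mode = "out", .f = child_stat(sum)),
         child_var  = map_local_dbl(order = 1, mode = "out", .f = child_stat(var)))
```

Wrapping the statistic in a closure keeps the callback signature that map_local_dbl expects, while allowing any function that reduces a numeric vector to a single number.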

While my map version is more graph-oriented, it did not seem very fast, so I ran a quick microbenchmark of the two methods:

library(microbenchmark)
microbenchmark(tree %>% mutate(var = map_local_dbl(order = 1, mode="out", .f = calc_child_stats)))
#> Unit: milliseconds
#>                                                                                       expr
#>  tree %>% mutate(var = map_local_dbl(order = 1, mode = "out",      .f = calc_child_stats))
#>       min       lq     mean   median      uq      max neval
#>  115.3325 123.0303 127.7889 126.6683 130.057 191.6065   100
microbenchmark(calc_child_stats_dplyr(tree))
#> Unit: milliseconds
#>                          expr      min       lq     mean   median       uq
#>  calc_child_stats_dplyr(tree) 4.915917 5.213939 6.292579 5.573978 6.717745
#>       max neval
#>  16.72846   100

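Created on 2018-06-15 by the reprex package (v0.2.0).

Sure enough, the dplyr way is much faster (a median of about 5.6 ms versus about 126.7 ms, roughly 20 times), so I would stick with that for now. Both gave the same values in my tests.

For completeness, here is the function I used to replicate the OP's method: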

calc_child_stats_dplyr <- function(tree){
  tree <- tree %>%
    mutate(parent = bfs_parent(),
           name = row_number())

  tree %>% activate(nodes) %>%
    left_join(
      tree %>%
        group_by(parent) %>%
        mutate(var = var(value)) %>% 
        activate(nodes) %>% 
        as_tibble() %>%
        group_by(parent) %>% 
        summarize(child_stat = first(var)),
      by=c("name" = "parent")
    )
}
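As a variation on the dplyr route, the parent of each node can also be read directly from the edge table, since every edge in the tree points from a parent to one of its children. The following is a sketch under that assumption, not the method used above; the column names `id`, `child_mean`, `child_sum`, and `child_var` are chosen here for illustration.

```r
library(tidygraph)
library(dplyr)

set.seed(1)
tree <- create_tree(40, children = 3, directed = TRUE) %>%
  mutate(id = row_number(),
         value = round(runif(40) * 100))

# Pull the node and edge tables out of the graph as plain tibbles
nodes <- tree %>% activate(nodes) %>% as_tibble()
edges <- tree %>% activate(edges) %>% as_tibble()   # columns: from, to

# Attach each child's value to its edge, then summarise per parent (`from`)
child_stats <- edges %>%
  inner_join(nodes, by = c("to" = "id")) %>%
  group_by(from) %>%
  summarize(child_mean = mean(value),
            child_sum  = sum(value),
            child_var  = var(value))

# Join the summaries back onto the parent nodes; leaves get NA
result <- tree %>%
  activate(nodes) %>%
  left_join(child_stats, by = c("id" = "from"))
```

This computes all summaries in one grouped pass and needs no breadth-first search, but it relies on the edges being directed from parent to child.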

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/50840808
