How do I parse a data.frame into a tree?


Here is a simple taxonomy (labels and IDs):

Maybe not the most efficient, but not too hard either:

Create the data:

test_data <- data.frame(
  cat_id = c(661, 197, 228, 650, 126, 912, 949, 428),
  cat_h1 = c(rep("Animals", 5), rep("Plants", 3)),
  cat_h2 = c(rep("Mammals", 3), rep("Birds", 2), c("Wheat", "Grass", "Other")),
  cat_h3 = c("Dogs", "Dogs", "Other", "Hawks", "Other", rep(NA, 3)),
  cat_h4 = c("Big", "Little", rep(NA, 6)))

test_data

I would avoid a list structure and stay with tidy data. Here is one way to reduce the redundancy in the data:

library(dplyr)

# One parent/child table per adjacent pair of levels;
# rows without a child label are dropped.
h1_h2 = 
  test_data %>%
  select(cat_h1, cat_h2) %>%
  distinct() %>%
  filter(!is.na(cat_h2))

h2_h3 =
  test_data %>%
  select(cat_h2, cat_h3) %>%
  distinct() %>%
  filter(!is.na(cat_h3))

h3_h4 = 
  test_data %>%
  select(cat_h3, cat_h4) %>%
  distinct() %>%
  filter(!is.na(cat_h4))

The original can easily be reassembled:

h1_h2 %>%
  left_join(h2_h3) %>%
  left_join(h3_h4)
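
One caveat with tables keyed on the label alone: in the sample data the label "Other" occurs under more than one parent, so a join on the label cannot always tell those nodes apart. The following is a minimal base R sketch, not part of the original answer, that keys each node by its full path from the root instead. The helper name path_edges and the "/" separator are assumptions made for illustration.

```r
test_data <- data.frame(
  cat_id = c(661, 197, 228, 650, 126, 912, 949, 428),
  cat_h1 = c(rep("Animals", 5), rep("Plants", 3)),
  cat_h2 = c(rep("Mammals", 3), rep("Birds", 2), "Wheat", "Grass", "Other"),
  cat_h3 = c("Dogs", "Dogs", "Other", "Hawks", "Other", rep(NA, 3)),
  cat_h4 = c("Big", "Little", rep(NA, 6)),
  stringsAsFactors = FALSE
)

# path_edges() is a hypothetical helper: it identifies every node by its
# full path from the root, so equal labels in different branches stay distinct.
path_edges <- function(levels, sep = "/") {
  edges <- list()
  parent <- as.character(levels[[1]])
  for (i in seq_along(levels)[-1]) {
    label <- as.character(levels[[i]])
    # An edge exists only where both the parent path and the child label exist
    keep <- !is.na(parent) & !is.na(label)
    child <- ifelse(keep, paste(parent, label, sep = sep), NA_character_)
    edges[[length(edges) + 1]] <- unique(data.frame(
      parent = parent[keep],
      child = child[keep],
      stringsAsFactors = FALSE
    ))
    # The child paths become the parents of the next level
    parent <- child
  }
  do.call(rbind, edges)
}

edges <- path_edges(test_data[c("cat_h1", "cat_h2", "cat_h3", "cat_h4")])
```

With path keys, "Animals/Birds/Other" and "Plants/Other" are separate rows, at the cost of longer identifiers than the bare labels.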

Edit: here is a way to automate the whole process:

library(dplyr)
library(lazyeval)

adjacency = function(data) {
  adjacency_table = function(data, larger_name, smaller_name)
    lazy(data %>%
           select(larger_name, smaller_name) %>%
           distinct %>%
           filter(smaller_name %>% is.na %>% `!`) ) %>%
    interp(larger_name = larger_name %>% as.name, 
           smaller_name = smaller_name %>% as.name) %>%
    lazy_eval %>%
    setNames(c("larger", "smaller"))

  # tibble() replaces the deprecated data_frame()
  tibble(smaller_name = data %>% names) %>%
    mutate(larger_name = smaller_name %>% lag) %>%
    slice(-1) %>%
    group_by(larger_name, smaller_name) %>%
    do(adjacency_table(data, .$larger_name, .$smaller_name) )
}

result = 
  test_data %>%
  select(-cat_id) %>%
  adjacency
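
For readers who prefer to avoid the lazyeval dependency, the same idea (one parent/child table per adjacent pair of columns, stacked into a single adjacency list) can be sketched in base R alone. This is an illustrative alternative under stated assumptions, not the answer's own implementation. The helper name adjacency_base is hypothetical, and the output column names simply mirror the ones used above.

```r
test_data <- data.frame(
  cat_id = c(661, 197, 228, 650, 126, 912, 949, 428),
  cat_h1 = c(rep("Animals", 5), rep("Plants", 3)),
  cat_h2 = c(rep("Mammals", 3), rep("Birds", 2), "Wheat", "Grass", "Other"),
  cat_h3 = c("Dogs", "Dogs", "Other", "Hawks", "Other", rep(NA, 3)),
  cat_h4 = c("Big", "Little", rep(NA, 6)),
  stringsAsFactors = FALSE
)

# adjacency_base() is a hypothetical helper name, not part of any package.
# It assumes the columns of `data` are ordered from the root level outward.
adjacency_base <- function(data) {
  cols <- names(data)
  pairs <- lapply(seq_along(cols)[-1], function(i) {
    # Distinct (larger, smaller) label pairs for columns i - 1 and i
    edges <- unique(data.frame(
      larger_name = cols[i - 1],
      smaller_name = cols[i],
      larger = data[[cols[i - 1]]],
      smaller = data[[cols[i]]],
      stringsAsFactors = FALSE
    ))
    # Drop rows where the deeper level is missing
    edges[!is.na(edges$smaller), ]
  })
  out <- do.call(rbind, pairs)
  rownames(out) <- NULL
  out
}

# Drop the id column (first column) before building the edge list
edges <- adjacency_base(test_data[-1])
```

The result has one row per distinct parent/child pair, with larger_name and smaller_name recording which pair of columns each row came from.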

If you are happy with a slight change in ordering, here is a recursive solution that works column by column:

f <- function(x, d=cbind(x,NA)) {
    c( 
       # call f by branch
       if(ncol(d) > 3) local({
         x <- d[!is.na(d[[3]]),] 
         by( x[-2], droplevels(x[2]), f, x=NA, simplify=FALSE) 
       }), 
       # leaf nodes
       setNames(as.list(d[[1]]), d[[2]])[is.na(d[[3]])] 
    )
}

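But this isn't what the OP wanted at all. I can understand "that's not a great approach, this one is better", but this seems to stray quite far from the topic…

@BenBolker This is technically off-topic, but it actually (by coincidence?) anticipated my pressing need to re-express the tree in adjacency-list form (as opposed to the original "column lineage" form)! I can see this being generalized, wrapped in lapply and then piped into bind_rows. It may be only one step away from Reduce. However, and this does not show up in the OP, if two or more nodes have the same label (but different paths from the root), there may be ambiguity or conflicts.

I have edited in a new automated version. Yes, there could be ambiguity. But if it really is the case that two or more nodes can have the same label but different paths, then there is effectively no redundancy in the original table and it can be left as it is.

I feel there must be a solution using Reduce() and split(), but I just can't figure it out.

@time +1 for the pointer to the "data.tree" package. Thanks!

Nice. The closest I got using similar by/split logic was
with(test_data, Map(split, split(cat_id, cat_h1), split(cat_h2, cat_h1)))
before it fell apart.

Order doesn't matter! Recursion is fine. Thank you very much!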
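
The split()-based idea from the comments can be taken one step further by recursing instead of nesting Map calls. The following is a minimal sketch, not one of the posted answers. The helper name split_tree is hypothetical, and it assumes every node is either a leaf or a branch. Within a group that has any deeper labels, rows whose next level is NA would be dropped silently by split().

```r
test_data <- data.frame(
  cat_id = c(661, 197, 228, 650, 126, 912, 949, 428),
  cat_h1 = c(rep("Animals", 5), rep("Plants", 3)),
  cat_h2 = c(rep("Mammals", 3), rep("Birds", 2), "Wheat", "Grass", "Other"),
  cat_h3 = c("Dogs", "Dogs", "Other", "Hawks", "Other", rep(NA, 3)),
  cat_h4 = c("Big", "Little", rep(NA, 6)),
  stringsAsFactors = FALSE
)

# split_tree() is a hypothetical helper.
# ids:    vector of leaf ids
# levels: data.frame of label columns, outermost level first
split_tree <- function(ids, levels) {
  # Group row positions by the label in the current (first) level
  groups <- split(seq_along(ids), levels[[1]])
  lapply(groups, function(rows) {
    rest <- levels[rows, -1, drop = FALSE]
    if (ncol(rest) == 0 || all(is.na(rest[[1]]))) {
      ids[rows]                     # no deeper labels: this is a leaf
    } else {
      split_tree(ids[rows], rest)   # otherwise recurse into the branch
    }
  })
}

tree <- split_tree(test_data$cat_id, test_data[-1])
```

Because split() sorts its groups, siblings come out in alphabetical order rather than in the order of the original rows, which matches the "order doesn't matter" requirement stated in the thread.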
Output of the recursive solution on the sample data:

> str(f(test_data))
List of 2
 $ Animals:List of 2
  ..$ Birds  :List of 2
  .. ..$ Hawks: num 650
  .. ..$ Other: num 126
  ..$ Mammals:List of 2
  .. ..$ Dogs :List of 2
  .. .. ..$ Big   : num 661
  .. .. ..$ Little: num 197
  .. ..$ Other: num 228
 $ Plants :List of 3
  ..$ Wheat: num 912
  ..$ Grass: num 949
  ..$ Other: num 428