Java: find all descendants in a tree structure from flat data


I have flat data that represents a hierarchical relationship, as shown below:

ID  Name    PID
0   A       NULL
1   B       0
2   C       0
4   D       1
5   E       1
6   F       4
3   G       0
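This table is the "data table", where PID refers to the parent element. For example, in the first row A has PID null, while B has PID 0, which means that B's parent is A, because 0 is A's ID. A is the root element because it has no PID. Similarly, C has parent A because C also has PID 0, and 0 is A's ID.

I created a class DataTable to represent the table above. I also implemented the processDataTable method: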

public Map<String, List<String>> processDataTable()
Here is my implementation of DataTable:

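import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;

// Note: the Record class is not shown in the question. From its usage below it
// exposes the fields id, name, parentId, parent and a children collection.
public class DataTable {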

    private List<Record> records = new ArrayList<>();
    private Map<Integer, Integer> indexes = new HashMap<>();
    private static final int PROCESSORS = Runtime.getRuntime().availableProcessors();

    /**
     * Add new record into DataTable.
     * 
     * @param id
     * @param name
     * @param parentId
     */
    public void addRow(Integer id, String name, Integer parentId) {
        if (indexes.get(id) == null) {
            Record rec = new Record(id, name, parentId);
            records.add(rec);
            indexes.put(id, records.size() - 1);
        }
    }

    public List<Record> getRecords() {
       return records;
    }

    /**
     * Process DataTable and return a Map of all keys and its children. The
     * main algorithm here is to divide big record set into multiple parts, compute
     * on multi threads and then merge all result together.
     * 
     * @return
     */
    public Map<String, List<String>> processDataTable() {
       long start = System.currentTimeMillis(); 
       int size = size();

       // Step 1: Link all nodes together
       invokeOnewayTask(new LinkRecordTask(this, 0, size));

       Map<String, List<String>> map = new ConcurrentHashMap<>();

       // Step 2: Get result
       invokeOnewayTask(new BuildChildrenMapTask(this, 0, size, map));

       long elapsedTime = System.currentTimeMillis() - start;

       System.out.println("Total elapsed time: " + elapsedTime + " ms");

       return map;
    }

    /**
     * Invoke given task one way and measure the time to execute.
     * 
     * @param task
     */
    private void invokeOnewayTask(ForkJoinTask<?> task) {
        long start = System.currentTimeMillis();
        ForkJoinPool pool = new ForkJoinPool(PROCESSORS);
        pool.invoke(task);
        long elapsedTime = System.currentTimeMillis() - start;
        System.out.println(task.getClass().getSimpleName() + ":" + elapsedTime + " ms");
    }

    /**
     * Find record by id.
     * 
     * @param id
     * @return
     */
    public Record getRecordById(Integer id) {
        Integer pos = indexes.get(id);
        if (pos != null) {
            return records.get(pos);
        }
        return null;
    }

    /**
     * Find record by row number.
     * 
     * @param rownum
     * @return
     */
    public Record getRecordByRowNumber(Integer rownum) {
       return (rownum < 0 || rownum > records.size() - 1) ? null:records.get(rownum);
    }

    public int size() {
       return records.size();
    }

    /**
     * A task link between nodes
     */
    private static class LinkRecordTask extends RecursiveAction {

        private static final long serialVersionUID = 1L;
        private DataTable dt;
        private int start;
        private int end;
        private int limit = 100;

        public LinkRecordTask(DataTable dt, int start, int end) {
            this.dt = dt;
            this.start = start;
            this.end = end;
        }

        @Override
        protected void compute() {
            if ((end - start) < limit) {
                for (int i = start; i < end; i++) {
                    Record r = dt.records.get(i);
                    Record parent = dt.getRecordById(r.parentId);
                    r.parent = parent;
                    if (parent != null) {
                        parent.children.add(r);
                    }
                }
            } else {
                int mid = (start + end) / 2;
                LinkRecordTask left = new LinkRecordTask(dt, start, mid);
                LinkRecordTask right = new LinkRecordTask(dt, mid, end);
                left.fork();
                right.fork();
                left.join();
                right.join();
            }
        }

    }

    /**
     * Build Map<String, List<String>> result from given DataTable.
     */
    private static class BuildChildrenMapTask extends RecursiveAction {

        private static final long serialVersionUID = 1L;
        private DataTable dt;
        private int start;
        private int end;
        private int limit = 100;
        private Map<String, List<String>> map;

        public BuildChildrenMapTask(DataTable dt, int start, int end, Map<String, List<String>> map) {
            this.dt = dt;
            this.start = start;
            this.end = end;
            this.map = map;
        }

        @Override
        protected void compute() {
            if ((end - start) < limit) {
               computeDirectly();
            } else {
                int mid = (start + end) / 2;
                BuildChildrenMapTask left = new BuildChildrenMapTask(dt, start, mid, map);
                BuildChildrenMapTask right = new BuildChildrenMapTask(dt, mid, end, map);
                left.fork();
                right.fork();
                left.join();
                right.join();
           }
        }

        private void computeDirectly() {  
            for (int i = start; i < end; i++) {
                Record rec = dt.records.get(i);
                List<String> names = new ArrayList<String>();

                loadDeeplyChildNodes(rec, names);

                if(!names.isEmpty()) {
                    map.put(rec.name, names);
                }
            }
        }

        private void loadDeeplyChildNodes(Record r, List<String> names) {
             Collection<Record> children = r.children;
             for(Record rec:children) {
                if(!names.contains(rec.name)) {
                   names.add(rec.name);
                }
                loadDeeplyChildNodes(rec, names);
             }
        }

    }

}
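I don't know what is wrong with this implementation. Can anyone give me some advice? It fails with an OutOfMemory error on a linear hierarchy of 5K records (item 1 is the root and the parent of item 2, item 2 is the parent of item 3, item 3 is the parent of item 4, and so on). Because it calls the recursive method so many times, it runs out of memory.


What is a good algorithm for this problem, or which data structure should I change to make it better?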

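You seem to have given in to the temptation to write more code than you need. Based on your data, we can write a simple tree structure that lets you do ancestor and descendant searches: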

import java.util.HashMap;
import java.util.ArrayList;

class Node {
  // static lookup table, because we *could* try to find nodes by walking
  // the node tree, but the ids are uniquely identifying: this way we can
  // do an instant lookup. Efficiency!
  static HashMap<Long, Node> NodeLUT = new HashMap<Long, Node>();

  // we could use Node.NodeLUT.get(...), but having a Node.getNode(...) is nicer
  public static Node getNode(long id) {
    return Node.NodeLUT.get(id);
  }

  // we don't call the Node constructor directly, we just let this factory
  // take care of that for us instead.
  public static Node create(long _id, String _label) {
    return new Node(_id, _label);
  }

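  public static Node create(long _id, String _label, long _parent) {
    Node parent = Node.NodeLUT.get(_parent), node;
    // the parent must already exist; fail early with a clear message if it does not
    if(parent == null) {
      throw new IllegalArgumentException("unknown parent id: " + _parent);
    }
    node = new Node(_id, _label);
    parent.addChild(node);
    return node;
  }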

  // instance variables and methods

  Node parent;
  long id;
  String label;
  ArrayList<Node> children = new ArrayList<Node>();

  // again: no public constructor. We can only use Node.create if we want
  // to make Node objects.
  private Node(long _id, String _label) {
    parent = null;
    id = _id;
    label = _label;
    Node.NodeLUT.put(id, this);
  }

  // this is taken care of in Node.create, too
  private void addChild(Node child) {
    children.add(child);
    child.setParent(this);
  }

  // as is this.
  private void setParent(Node _parent) {
    parent = _parent;
  }

  /**
   * Find the route from this node, to some descendant node with id [descendentId]
   */
  public ArrayList<Node> getDescendentPathTo(long descendentId) {
    ArrayList<Node> list = new ArrayList<Node>(), temp;
    list.add(this);
    if(id == descendentId) {
      return list;
    }
    for(Node n: children) {
      temp = n.getDescendentPathTo(descendentId);
      if(temp != null) {
        list.addAll(temp);
        return list;
      }
    }
    return null;
  }

  /**
   * Find the route from this node, to some ancestral node with id [ancestorId]
   */
  public ArrayList<Node> getAncestorPathTo(long ancestorId) {
    ArrayList<Node> list = new ArrayList<Node>(), temp;
    list.add(this);
    if(id == ancestorId) {
      return list;
    }
    // the root has no parent: stop here instead of throwing a NullPointerException
    if(parent == null) {
      return null;
    }
    temp = parent.getAncestorPathTo(ancestorId);
    if(temp != null) {
      list.addAll(temp);
      return list;
    }
    return null;
  }

  public String toString() {
    return "{id:"+id+",label:"+label+"}";
  }
}
Output:

From root to F: {id:0,label:A}, {id:1,label:B}, {id:4,label:D}, {id:6,label:F}
From F to root: {id:6,label:F}, {id:4,label:D}, {id:1,label:B}, {id:0,label:A}
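Nice.

So all we need to do is write the part that turns the "flat definition" into Node.create calls, and we are done. Remember: don't overcomplicate things. If your data is a flat tree, then all you need is a tree structure. And all you need to write a tree structure is a Node class.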
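As a rough illustration of that conversion step, the sketch below parses rows in the question's "ID Name PID" layout and links each node to its parent as it goes. FlatTableLoader and SimpleNode are hypothetical names introduced here; SimpleNode is only a stripped-down stand-in for the Node class above so the example compiles on its own. It assumes whitespace-separated columns, the literal NULL for a missing parent, and that every parent row appears before its children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
final class FlatTableLoader {

    // Minimal stand-in for the answer's Node class.
    static final class SimpleNode {
        final long id;
        final String label;
        SimpleNode parent;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(long id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    // Parses rows of the form "ID Name PID" and returns the nodes keyed by id.
    // Assumption: a parent row always comes before the rows of its children.
    static Map<Long, SimpleNode> load(List<String> rows) {
        Map<Long, SimpleNode> byId = new LinkedHashMap<>();
        for (String row : rows) {
            String[] cols = row.trim().split("\\s+");
            long id = Long.parseLong(cols[0]);
            SimpleNode node = new SimpleNode(id, cols[1]);
            if (!"NULL".equalsIgnoreCase(cols[2])) {
                SimpleNode parent = byId.get(Long.parseLong(cols[2]));
                if (parent == null) {
                    throw new IllegalArgumentException("Parent row missing for id " + id);
                }
                node.parent = parent;
                parent.children.add(node);
            }
            byId.put(id, node);
        }
        return byId;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "0 A NULL", "1 B 0", "2 C 0", "4 D 1", "5 E 1", "6 F 4", "3 G 0");
        for (SimpleNode n : load(rows).values()) {
            System.out.println(n.label + " has " + n.children.size() + " direct children");
        }
    }
}
```

With the real Node class, the body of the loop would simply call Node.create with two or three arguments depending on whether the PID column is NULL.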


Comments:

In LinkChildrenTask's compute(), why are you adding a node to the child list of every one of its ancestors? Do you intend to add every child and grandchild of a node to that node's child list, so that you can quickly tell whether it is somewhere in its subtree? That could use a very large amount of memory. In your example, about 12.5 million references.

In my first implementation, I only added the direct children of each node, not all descendants. But in BuildChildrenMapTask it has to recurse many times to find all the nested children, and it failed in that task too. In this test case, every node has so many children :(

That looks significantly different. Is it in loadDeeplyChildNodes() that you get the error? That would make sense, considering your root node will have a stack 5000 calls deep, the root's child will have a stack 4999 calls deep, and so on.

It runs for too long, and after that it throws a GC overhead limit or OutOfMemory error.

With this implementation, it runs very slowly :( and it throws an exception in this case too. The error: Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

The insertion order is not always parent first and then child; it can be random. So do we need another task to link all the parents of each node, like LinkRecordTask above?

If you can, sort the flat data so that a parent always exists before its children are inserted. If that is not possible, you can make the parent binding "late binding" by changing the create factory to check for parent and, if it does not exist, call a (newly defined by you) second private constructor that stores the parent ID as long _parentid or something. You can then write a static resolveParents that does the real linking, and call Node.resolveParents() after all nodes have been inserted.

Without knowing the end-point ID, how can I efficiently load all the descendants of a node in the linear hierarchy case? I would add a variable to the node that references each node's deepest child. Is that better?

You can do that, but you would have to update it on insert (which is of course entirely possible).
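The late-binding suggestion from the comments could look roughly like the sketch below. LateBoundNode is a hypothetical variant of the Node class, and the field and method names other than resolveParents are assumptions made for illustration. Each node only records its parent ID when it is created, so rows may arrive in any order; the actual parent and child links are made in one pass afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical variant of the answer's Node class with deferred parent linking.
final class LateBoundNode {
    private static final Map<Long, LateBoundNode> LOOKUP = new HashMap<>();

    final long id;
    final String label;
    final Long parentId;      // null for a root node
    LateBoundNode parent;     // stays null until resolveParents() has run
    final List<LateBoundNode> children = new ArrayList<>();

    private LateBoundNode(long id, String label, Long parentId) {
        this.id = id;
        this.label = label;
        this.parentId = parentId;
        LOOKUP.put(id, this);
    }

    // Registers a node without requiring its parent to exist yet.
    static LateBoundNode create(long id, String label, Long parentId) {
        return new LateBoundNode(id, label, parentId);
    }

    static LateBoundNode get(long id) {
        return LOOKUP.get(id);
    }

    // Links every node to its parent; call once after all rows are inserted.
    // Nodes that are already linked are skipped, so calling it again is harmless.
    static void resolveParents() {
        for (LateBoundNode node : LOOKUP.values()) {
            if (node.parentId == null || node.parent != null) {
                continue;
            }
            LateBoundNode p = LOOKUP.get(node.parentId);
            if (p == null) {
                throw new IllegalStateException(
                    "Unknown parent id " + node.parentId + " for node " + node.id);
            }
            node.parent = p;
            p.children.add(node);
        }
    }

    public static void main(String[] args) {
        // Children are deliberately inserted before their parents.
        create(6, "F", 4L);
        create(4, "D", 1L);
        create(1, "B", 0L);
        create(0, "A", null);
        resolveParents();
        System.out.println("Parent of F: " + get(6).parent.label);
    }
}
```

The order of each children list depends on the iteration order of the lookup map here; if insertion order matters, a LinkedHashMap would be one option.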
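The stringify helper and main method below are the test driver that prints the output shown earlier. They are written as members of a class, for example Node:
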
  public static String stringify(ArrayList<?> list) {
    String listString = "";
    for (int s=0, l=list.size(); s<l; s++) {
      listString += list.get(s).toString();
      if(s<l-1) { listString += ", "; }
    }
    return listString;
  }

  public static void main(String[] args) {
    // hard coded data based on your question-supplied example data
    Node.create(0, "A");
    Node.create(1, "B", 0);
    Node.create(2, "C", 0);
    Node.create(4, "D", 1);
    Node.create(5, "E", 1);
    Node.create(6, "F", 4);
    Node.create(3, "G", 0);

    // let's see what we get!
    Node root = Node.getNode(0);
    Node f = Node.getNode(6);
    System.out.println("From root to F: " + stringify(root.getDescendentPathTo(6)));
    System.out.println("From F to root: " + stringify(f.getAncestorPathTo(0)));
  }
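On the recursion depth raised in the comments: one way to collect all descendants of a node without deep call stacks is to replace the recursion with an explicit stack. The sketch below is a minimal illustration, not the original answer's method. DescendantFinder, index and descendantsOf are hypothetical names, and the code assumes the data really is a tree (a cycle would make the loop run forever). Note that for a linear chain of 5000 nodes, building the full descendant list for every node still stores about 12.5 million entries in total, so computing lists on demand for the nodes you actually need may be preferable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: iterative (non-recursive) descendant search.
final class DescendantFinder {

    // Builds a parent id -> direct child ids index from parallel arrays.
    static Map<Integer, List<Integer>> index(int[] ids, Integer[] parentIds) {
        Map<Integer, List<Integer>> childrenByParent = new HashMap<>();
        for (int i = 0; i < ids.length; i++) {
            if (parentIds[i] != null) {
                childrenByParent
                    .computeIfAbsent(parentIds[i], k -> new ArrayList<>())
                    .add(ids[i]);
            }
        }
        return childrenByParent;
    }

    // Returns the ids of all descendants of rootId using an explicit stack,
    // so the depth of the hierarchy does not affect the call stack.
    static List<Integer> descendantsOf(int rootId, Map<Integer, List<Integer>> childrenByParent) {
        List<Integer> result = new ArrayList<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(rootId);
        while (!pending.isEmpty()) {
            int current = pending.pop();
            List<Integer> children =
                childrenByParent.getOrDefault(current, Collections.<Integer>emptyList());
            for (int child : children) {
                result.add(child);
                pending.push(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Linear chain as in the question: 0 -> 1 -> 2 -> ... -> 4999
        int n = 5000;
        int[] ids = new int[n];
        Integer[] parents = new Integer[n];
        for (int i = 0; i < n; i++) {
            ids[i] = i;
            parents[i] = (i == 0) ? null : i - 1;
        }
        Map<Integer, List<Integer>> idx = index(ids, parents);
        System.out.println(descendantsOf(0, idx).size()); // prints 4999
    }
}
```

Each node is visited once per query, so a single lookup is linear in the size of the subtree, and no membership check like names.contains(...) is needed because a tree never reaches the same node twice.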