Neo4j Cypher: no cycles, no duplicate paths

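I am currently modeling a database with more than 50,000 nodes, where each node has 2 directed relationships. Given an input node (the root), I am trying to get all nodes connected to it through one relationship, then all the so-called children of those nodes, and so on, until every node directly or indirectly connected to the root has been reached.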

String query =
  "MATCH (m {title:{title},namespaceID:{namespaceID}})-[:categorieLinkTo*..]->(n) " +
  "RETURN DISTINCT n.title AS Title, n.namespaceID " + 
  "ORDER BY n.title";

Result result = db.execute(query, params);
String infos = result.resultAsString();

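I have read that the runtime is likely to be O(n^x), but I could not find any command that excludes cycles or multiple paths to the same node. As a result the query takes more than 2 hours, which is unacceptable for my use case.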

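For simple relationship expressions, Cypher automatically excludes repeated relationships by enforcing the following uniqueness rule:

While pattern matching, Neo4j makes sure to not include matches where the same graph relationship is found multiple times in a single pattern.

The documentation is not entirely clear on whether this also applies to variable-length paths, so let's design a small experiment to confirm it: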

CREATE
  (n1:Node {name: "n1"}),
  (n2:Node {name: "n2"}),
  (n3:Node {name: "n3"}),
  (n4:Node {name: "n4"}),
  (n1)-[:REL]->(n2),
  (n2)-[:REL]->(n3),
  (n3)-[:REL]->(n2),
  (n2)-[:REL]->(n4)

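This results in the following graph (n1 points to n2, n2 and n3 point to each other and form a cycle, and n2 also points to n4).

Query: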

MATCH (n:Node {name:"n1"})-[:REL*..]->(m)
RETURN m

The result is:

╒══════════╕
│m         │
╞══════════╡
│{name: n2}│
├──────────┤
│{name: n3}│
├──────────┤
│{name: n2}│
├──────────┤
│{name: n4}│
├──────────┤
│{name: n4}│
└──────────┘

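As you can see, n4 is included multiple times, because it can be reached both by avoiding the cycle and by going through it (n2 appears twice for the same reason). Check the execution with PROFILE:

[Figure: PROFILE query plan, not preserved]

So we should use DISTINCT to get rid of the duplicates: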

MATCH (n:Node {name:"n1"})-[:REL*..]->(m)
RETURN DISTINCT m

The result is:

╒══════════╕
│m         │
╞══════════╡
│{name: n2}│
├──────────┤
│{name: n3}│
├──────────┤
│{name: n4}│
└──────────┘

Check the execution with PROFILE again:

[Figure: PROFILE query plan for the DISTINCT query, not preserved]
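The experiment above can also be reproduced without a database. The following is a minimal sketch, not Neo4j's actual implementation: it enumerates every path from n1 that uses each relationship at most once, which is the uniqueness rule quoted earlier. The class and method names (RelationshipUniquenessDemo, endNodes) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RelationshipUniquenessDemo {
    // The four relationships from the CREATE statement, as {from, to} pairs.
    static final String[][] EDGES = {
        {"n1", "n2"}, {"n2", "n3"}, {"n3", "n2"}, {"n2", "n4"}
    };

    // Returns the end node of every path of length >= 1 that starts at `start`
    // and uses each relationship at most once (nodes may repeat).
    static List<String> endNodes(String start) {
        List<String> rows = new ArrayList<>();
        expand(start, new boolean[EDGES.length], rows);
        return rows;
    }

    private static void expand(String node, boolean[] used, List<String> rows) {
        for (int i = 0; i < EDGES.length; i++) {
            if (used[i] || !EDGES[i][0].equals(node)) {
                continue; // relationship already on this path, or it starts elsewhere
            }
            used[i] = true;              // put the relationship on the current path
            rows.add(EDGES[i][1]);       // one result row per matched path
            expand(EDGES[i][1], used, rows);
            used[i] = false;             // backtrack
        }
    }

    public static void main(String[] args) {
        List<String> rows = endNodes("n1");
        Set<String> distinct = new LinkedHashSet<>(rows);
        System.out.println(rows);     // prints [n2, n3, n2, n4, n4]
        System.out.println(distinct); // prints [n2, n3, n4]
    }
}
```

The cycle is traversed at most once, so the enumeration terminates, but one row is still produced per path rather than per node. On a graph with many alternative routes the number of paths can grow much faster than the number of nodes, which is consistent with the long runtime described in the question.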

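There are certainly a few things we can do to improve this query.

First, you are not using labels at all. Because of that, the match on the input node cannot take advantage of any schema index that may exist. It has to scan all 50k nodes, accessing and comparing their properties, until it has found every node with the given title and namespace (it does not stop after finding one, because it cannot know whether other nodes also satisfy the condition). You can check how long this takes by matching on the start node only.

To improve this, add a label to your nodes, include that label in the match on the start node, and index the title and namespaceID properties.

That alone should noticeably speed up the query.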
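As a rough sketch of that advice, the statements could be built as Java strings in the same way as the query in the question. The label name Category is an assumption (the question does not name a label), and the class and method names are hypothetical. The Cypher uses the `CREATE INDEX ON :Label(property)` form and the `{param}` placeholder style that match the Neo4j 3.x syntax used in the question; newer Neo4j versions use a different syntax for both.

```java
import java.util.HashMap;
import java.util.Map;

public class LabeledQuerySketch {
    // Hypothetical label; the original data model does not name one.
    static final String LABEL = "Category";

    // One-off statement that gives every existing node the label.
    static String addLabelStatement() {
        return "MATCH (n) SET n:" + LABEL;
    }

    // One-off schema statement; run it once per property to index.
    static String createIndexStatement(String property) {
        return "CREATE INDEX ON :" + LABEL + "(" + property + ")";
    }

    // The query from the question, with the label added to the start node
    // so that the planner can use the index instead of scanning all nodes.
    static String descendantsQuery() {
        return "MATCH (m:" + LABEL + " {title:{title},namespaceID:{namespaceID}})"
             + "-[:categorieLinkTo*]->(n) "
             + "RETURN DISTINCT n.title AS Title, n.namespaceID "
             + "ORDER BY n.title";
    }

    static Map<String, Object> params(String title, int namespaceID) {
        Map<String, Object> params = new HashMap<>();
        params.put("title", title);
        params.put("namespaceID", namespaceID);
        return params;
    }

    public static void main(String[] args) {
        System.out.println(addLabelStatement());
        System.out.println(createIndexStatement("title"));
        System.out.println(createIndexStatement("namespaceID"));
        System.out.println(descendantsQuery());
    }
}
```

The resulting strings would be passed to db.execute(...) exactly as in the question. The index only helps with finding the start node; the cost of the variable-length expansion itself is unchanged.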

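The next question is whether the remaining bottleneck is due to the sorting or to returning a huge result set.

You can check the cost of the sort on its own by limiting the returned results.

After the match, you can use this at the end of the query: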

WITH DISTINCT n
ORDER BY n.title
LIMIT 10
RETURN n.title AS Title, n.namespaceID 
ORDER BY n.title

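Also, whenever you do any kind of performance tuning, you should PROFILE your queries (at least the ones that finish in a reasonable amount of time) and EXPLAIN the ones that take too long, so that you can examine the query plan.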

import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.QueryExecutionException;
import org.neo4j.graphdb.Result;
import org.neo4j.graphdb.Transaction;

// This method belongs inside a class that is assumed to provide a `logger` field.
public static HashSet<Node> breadthFirst(String name, int namespace, GraphDatabaseService db) {

    // Parameters for the first Cypher query: title and namespaceID of the root node
    Map<String, Object> params = new HashMap<>();
    params.put("title", name);
    params.put("namespaceID", namespace);

    /* A simple BFS: a queue of nodes still to be expanded (touched), a set of
     * nodes that have already been seen (finished), and the set that is returned.
     */
    String query = "MATCH (n {title:{title},namespaceID:{namespaceID}})-[:categorieLinkTo]->(m) RETURN m";
    Queue<Node> touched = new LinkedList<>();
    HashSet<Node> finished = new HashSet<>();
    HashSet<Node> returnResult = new HashSet<>();

    // Fetch the direct neighbours of the root and put them into the queue
    try (Transaction tx = db.beginTx()) {
        Result iniResult = db.execute(query, params);
        while (iniResult.hasNext()) {
            Node startNode = (Node) iniResult.next().get("m");
            // add() returns false if the node was already seen
            if (finished.add(startNode)) {
                touched.add(startNode);
            }
        }
        tx.success();
    } catch (QueryExecutionException e) {
        logger.error("Error while executing the query", e);
    }

    // The BFS itself: expand each queued node once, marking every node seen
    while (!touched.isEmpty()) {
        try (Transaction tx = db.beginTx()) {
            Node currNode = touched.poll();
            returnResult.add(currNode);

            Map<String, Object> paramsTemp = new HashMap<>();
            paramsTemp.put("title", currNode.getProperty("title").toString());
            // Hard-coded in the original: nodes below the root are looked up in namespace 14
            paramsTemp.put("namespaceID", 14);
            Result tempResult = db.execute(query, paramsTemp);

            while (tempResult.hasNext()) {
                Node child = (Node) tempResult.next().get("m");
                // Only enqueue nodes that have not been seen before
                if (finished.add(child)) {
                    touched.add(child);
                }
            }
            tx.success();
        } catch (QueryExecutionException f) {
            logger.error("Error while executing the query", f);
        }
    }

    return returnResult;
}
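The visited-set idea behind the method above can be tried without a database. The following is a small in-memory sketch, assuming the graph is given as an adjacency map; the class and method names (BfsSketch, reachable) are made up for this illustration, and the sample data is the four-node graph from the experiment in the first answer.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BfsSketch {
    // Returns every node reachable from `root` by following outgoing edges.
    // Each node is enqueued at most once, so cycles and alternative paths
    // do not cause repeated work.
    static Set<String> reachable(Map<String, List<String>> children, String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String child : children.getOrDefault(current, Collections.emptyList())) {
                if (seen.add(child)) { // false if the node was already seen
                    queue.add(child);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> children = new HashMap<>();
        children.put("n1", Arrays.asList("n2"));
        children.put("n2", Arrays.asList("n3", "n4"));
        children.put("n3", Arrays.asList("n2"));
        System.out.println(reachable(children, "n1")); // prints [n2, n3, n4]
    }
}
```

Because every node is expanded once and every edge is looked at once, the work is proportional to the number of nodes plus the number of relationships, rather than to the number of distinct paths.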
