Neo4j: a fast way to get the unique set of all connected nodes in a large graph

My Neo4j database currently has more than 13 million nodes and about as many edges. The simplified structure is as follows (most edge types omitted):

I want to get the IDs of all connected users, regardless of the length of the path. This way I can

First, I ran the following Cypher query through Neo4j's HTTP API:

MATCH (u:User {uid: '12345'})-[*1..]-(otherUser) 
RETURN DISTINCT otherUser
Variable-length pattern matching, especially unbounded as here, is painfully slow.
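A rough illustration of why: Cypher matches a variable-length pattern by enumerating paths in which no relationship is repeated. The work therefore grows with the number of such paths, not with the number of nodes they reach. The plain-Python sketch below is a toy model under that assumption and does not reproduce Neo4j's planner. The graph and the names `build_graph` and `count_trails` are hypothetical.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected graph as node -> list of (edge_id, neighbour)."""
    adj = defaultdict(list)
    for eid, (a, b) in enumerate(edges):
        adj[a].append((eid, b))
        adj[b].append((eid, a))
    return adj

def count_trails(adj, start):
    """Enumerate every path of length >= 1 from `start` that never reuses a
    relationship (a toy model of -[*1..]-). Return how many paths were
    produced and the set of distinct end nodes they reached."""
    paths = 0
    reached = set()
    stack = [(start, frozenset())]    # (current node, edge ids used so far)
    while stack:
        node, used = stack.pop()
        for eid, nxt in adj[node]:
            if eid in used:
                continue              # relationship already on this path
            paths += 1
            reached.add(nxt)
            stack.append((nxt, used | {eid}))
    return paths, reached

# Hypothetical dense toy graph: the complete graph on 5 nodes (10 edges)
edges = [(a, b) for a in range(5) for b in range(a + 1, 5)]
paths, reached = count_trails(build_graph(edges), 0)
print(paths, "paths to reach", len(reached), "distinct nodes")
```

Every node is reachable within a hop or two. Even so, the traversal enumerates well over a hundred relationship-unique paths to get there, and that count grows much faster than the node count as the graph gets denser.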

So I looked around and found the APOC library and its expandConfig method:

MATCH (u:User {uid: '12345'})
CALL apoc.path.expandConfig(u, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
// Keep only the 'User' nodes on each path and extract their 'uid' property
// (list comprehension; extract() and filter() were removed in Neo4j 4.0)
RETURN [x IN NODES(path) WHERE 'User' IN labels(x) | x.uid] AS uid
This works like a charm and in most cases returns all nodes within a few milliseconds.
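Why the APOC call is usually fast can be sketched with a toy model of `bfs:true` plus `uniqueness:"NODE_GLOBAL"`. Under that setting a node may be visited only once in the whole traversal. Every yielded path therefore ends at a node that no earlier path reached, so the number of paths is bounded by the number of reachable nodes. This is a hypothetical Python model, not APOC's implementation: `expand_node_global` is an illustrative name, and it only yields paths of length one or more.

```python
from collections import deque

def expand_node_global(adj, start):
    """Toy breadth-first expansion with NODE_GLOBAL-style uniqueness:
    each node is reached at most once in the whole traversal.
    Yields every path (as a list of nodes) of length >= 1 from `start`."""
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for nxt in adj[path[-1]]:
            if nxt in visited:
                continue              # some earlier path already ends here
            visited.add(nxt)
            new_path = path + [nxt]
            queue.append(new_path)
            yield new_path

# Hypothetical undirected graph: a cycle a-b-c plus node d linked to b and c
adj = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b", "d"], "d": ["b", "c"]}
for p in expand_node_global(adj, "a"):
    print(p)
# ['a', 'b']
# ['a', 'c']
# ['a', 'b', 'd']
```

The cycle and the second route to `d` are never expanded again, which is what keeps this kind of traversal roughly linear in the size of the reachable subgraph.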

But when I query a user I know to be "very well" connected (24k nodes, 40k edges), it takes almost 30 seconds.

Sample response:

{
  "results": [
    {
      "columns": [
        "uid"
      ],
      "data": [
        {"row": [["9974"]], "meta": [null]},
        {"row": [["9974"]], "meta": [null]},
        {"row": [["9974"]], "meta": [null]},
        {"row": [["9974","14367"] ],"meta": [null,null]},
        {"row": [["9974","11820"] ],"meta": [null,null]},
        {"row": [["9974","11821"] ],"meta": [null,null]},
        {"row": [["9974","11822"] ],"meta": [null,null]},
        {"row": [["9974","11823"] ],"meta": [null,null]},
        {"row": [["9974","9314"] ],"meta": [null,null]},
        {"row": [["9974","9313"] ],"meta": [null,null]},
        {"row": [["9974","9317"] ],"meta": [null,null]},
        {"row": [["9974","14367"] ],"meta": [null,null]},
        {"row": [["9974","11820"] ],"meta": [null,null]},
        {"row": [["9974","11821"] ],"meta": [null,null]},
        {"row": [["9974","11822"] ],"meta": [null,null]},
        {"row": [["9974","11823"] ],"meta": [null,null]},
        {"row": [["9974","9314"] ],"meta": [null,null]},
        {"row": [["9974","9313"] ],"meta": [null,null]},
        {"row": [["9974","9317"] ],"meta": [null,null]},
        {"row": [["9974","11820","3287" ]],"meta": [null,null,null]},
        {"row": [["9974","11820","39584" ]],"meta": [null,null,null]},
        {"row": [["9974","11820","5109" ]],"meta": [null,null,null]},
        {"row": [["9974","11820","3379" ]],"meta": [null,null,null]},
        {"row": [["9974","11820","3288" ]],"meta": [null,null,null]},
        --- Snipp ---
Now I want to get rid of all the duplicates and get a result like this:

{
  "results": [
    {
      "columns": [
        "uid"
      ],
      "data": [
        {"row": [["9974"]], "meta": [null]},
        {"row": [["14367"]], "meta": [null]},
        {"row": [["11820"]], "meta": [null]},
        {"row": [["11821"]],"meta": [null]},
        {"row": [["11822"]],"meta": [null]},
        {"row": [["11823"]],"meta": [null]},
        {"row": [["9314"]],"meta": [null]},
        {"row": [["9313"]],"meta": [null]},
        {"row": [["9317"]],"meta": [null]},
        {"row": [["14367"]],"meta": [null]},
        {"row": [["11820"]],"meta": [null]},
        {"row": [["11821"]],"meta": [null]},
        {"row": [["11822"]],"meta": [null]},
        {"row": [["11823"]],"meta": [null]},
        --- snipp ---
How would I do that?
Nice to have: is there any way to speed this up?

There are a few tweaks that can speed this up.

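First, you are iterating over the nodes of every path (NODES(path)). There will be many duplicate nodes among them, because paths that share a common portion reuse the same nodes.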
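The effect shows up in the uid lists from the sample response above. Each path repeats the uids of the path it extends, so flattening them yields mostly duplicates. The sketch below uses a subset of those rows (the response itself is truncated). It deduplicates on the client with `dict.fromkeys`, which keeps first-seen order; `unique_uids` is a hypothetical helper. This removes the duplicates but still pays for producing every path, which the answer's query avoids.

```python
# A subset of the uid lists from the sample response (one list per path)
rows = [
    ["9974"], ["9974"], ["9974"],
    ["9974", "14367"], ["9974", "11820"], ["9974", "11821"],
    ["9974", "11822"], ["9974", "11823"], ["9974", "9314"],
    ["9974", "9313"], ["9974", "9317"],
    ["9974", "11820", "3287"], ["9974", "11820", "39584"],
    ["9974", "11820", "5109"], ["9974", "11820", "3379"],
    ["9974", "11820", "3288"],
]

def unique_uids(rows):
    """Flatten the per-path uid lists; return the flat list and an
    order-preserving list with repeats removed."""
    flat = [uid for row in rows for uid in row]
    return flat, list(dict.fromkeys(flat))

flat, unique = unique_uids(rows)
print(len(flat), "uids in total,", len(unique), "unique")   # 34 uids in total, 14 unique
```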

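Since you're using NODE_GLOBAL uniqueness, the end nodes of all the paths should together make up the entire subgraph. So we can take just those as rows, filter them down to User nodes (there is dedicated syntax for checking whether a node has a given label), and then get the uids: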
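This reasoning can be checked on a toy example. In a breadth-first NODE_GLOBAL expansion, each yielded path extends an earlier path by exactly one new node. Every node on any path is therefore also the last node of some path, so `LAST(NODES(path))` covers the whole subgraph with one row per node. The node ids, labels, uids and paths below are hypothetical and hard-coded as such an expansion could produce them. Treating the start node as a zero-length path is an assumption made so that the start user shows up too.

```python
# Hypothetical node data: node id -> (label, uid)
NODES = {
    1: ("User", "u1"),
    2: ("Group", None),
    3: ("User", "u3"),
    4: ("User", "u4"),
    5: ("Device", None),
}

# Paths as a NODE_GLOBAL breadth-first expansion from node 1 might yield them
# (assumption: the start node appears as the zero-length path [1])
PATHS = [[1], [1, 2], [1, 3], [1, 2, 4], [1, 2, 5]]

def connected_user_uids(paths, nodes):
    """Mirror of: WITH DISTINCT LAST(NODES(path)) AS user
                  WHERE user:User RETURN COLLECT(user.uid)"""
    ends = list(dict.fromkeys(p[-1] for p in paths))   # DISTINCT last node
    return [nodes[n][1] for n in ends if nodes[n][0] == "User"]

all_nodes = {n for p in PATHS for n in p}   # what iterating NODES(path) visits
end_nodes = {p[-1] for p in PATHS}          # what LAST(NODES(path)) visits
print(all_nodes == end_nodes)                   # True
print(connected_user_uids(PATHS, NODES))        # ['u1', 'u3', 'u4']
```

Both approaches reach the same set of nodes. The difference is that the end-node version touches each node once instead of once per path that passes through it.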

MATCH (u:User {uid: '12345'})
CALL apoc.path.expandConfig(u, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
WITH DISTINCT LAST(NODES(path)) as user
WHERE user:User
RETURN COLLECT(user.uid) as uid

If you don't want the uids in a collection, just return user.uid at the end instead of COLLECT(user.uid).

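Querying the big graph now takes just two seconds, awesome! Thanks a lot, I had to giggle like a little girl :)
Glad to help. It's always a wonderful thing to see execution times plummet!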