Optimization 优化或重组n-n关系Neo4j查询,导致响应速度非常慢

Optimization 优化或重组n-n关系Neo4j查询,导致响应速度非常慢,optimization,neo4j,cypher,database-performance,graph-databases,Optimization,Neo4j,Cypher,Database Performance,Graph Databases,我正在寻找我的一个Neo4j图表的帮助。我的节点和关系如下所示 这是图中一个端到端关系的示例 该问题与具有多个(通知)的节点(存储)相关,并且它们与多个(通知操作类型)和(通知类型)相关 查询如下所示: MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:

我正在寻找我的一个Neo4j图表的帮助。我的节点和关系如下所示

这是图中一个端到端关系的示例

该问题与具有多个(通知)的节点(存储)相关,并且它们与多个(通知操作类型)和(通知类型)相关

查询如下所示:

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), shortestPath((nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)), shortestPath((nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute) WHERE s.uuid={app_id} AND sta.key='name' OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand) OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty) RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
一个查询的响应时间超过8秒。我的应用程序经常需要这些信息。这导致整体响应差和崩溃,这就是为什么在此期间我引入了redis来缓存应用程序所需的一些数据

e、 g.来自原始查询的一个条目的响应是

json如下所示,其中列名是图中的节点

 [
  {
    "nt_id": "002a3ba0-2584-11ea-93de-118eb121a0f8",
    "st_ids": [
      "e5fb2cc0-2246-11ea-a327-c1a6ac2ca4a0"
    ],
    "store_names": [
      "AND"
    ],
    "notification_brands": [],
    "notification_labels": [],
    "notification": [
      {
        "sub_text": "Happy shopping!!",
        "action_url": "https://www.tatacliq.com/and/c-mbh11a00015",
        "image_url": "",
        "notification_match_type": "GENERAL",
        "validity_start": 1,
        "text": "Welcome to {{store_name}}",
        "inventory_request_params": "",
        "isActive": true,
        "validity_end": 1,
        "uuid": "002a3ba0-2584-11ea-93de-118eb121a0f8",
        "active_days": "{\"SUNDAY\":\"1100-2100\",\"MONDAY\":\"1100-2100\",\"TUESDAY\":\"1100-2100\",\"WEDNESDAY\":\"1100-2100\",\"THURSDAY\":\"1100-2100\",\"FRIDAY\":\"1100-2100\",\"SATURDAY\":\"1100-2100\"}"
      }
    ],
    "notification_type": [
      {
        "name": "Deals & Offers",
        "uuid": "2fdc2b20-4faf-11e9-bfff-47192e190163"
      }
    ],
    "notification_action_type": [
      {
        "name": "In Store",
        "uuid": "ce78fc50-4fae-11e9-b974-7995b4e2b93d"
      }
    ]
  }
]
  • neo4j版本:neo4j:3.5.12-enterprise,在AWS m5.xlarge机器上的Docker中运行,配置了12G堆和缓存大小

  • 您使用哪种API/驱动程序:ECS上的Rest API和单独实例上的Node JS

  • [简介或解释]的屏幕截图

另外,附加query.log,解释实时环境上的执行时间表

非常感谢在这方面的任何帮助

谢谢, 阿纳布

嘿,阿纳布

我对你在比赛中用2个最短路径的电话试图达到的目标有点困惑。 您可能希望使用shortestPath函数返回先前匹配的两个节点之间的最短路径。然后它会像这样,例如:

MATCH (a), (b), p = shortestPath((a)-[*]-(b))
RETURN p
必须使用可变路径长度(*)让函数搜索从a到b的所有可能路径

现在,如果我们从查询中调用最短路径函数,并根据您编写查询脚本的方式,您应该会得到相同的结果,但希望更快:

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
您甚至可以尝试使用后续匹配,以使查询比单个匹配更快:

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification)
WHERE s.uuid={app_id}
MATCH (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)
MATCH (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)
MATCH (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
但请注意,根据您的图形模型,这两个查询可能会产生不同的结果。如果您有兴趣了解这两种不同的图形匹配方法之间的差异,您可以在下面的帖子中阅读Adam的答案:

我希望这有帮助! 比尔, 法比恩

嘿,阿纳布

我对你在比赛中用2个最短路径的电话试图达到的目标有点困惑。 您可能希望使用shortestPath函数返回先前匹配的两个节点之间的最短路径。然后它会像这样,例如:

MATCH (a), (b), p = shortestPath((a)-[*]-(b))
RETURN p
必须使用可变路径长度(*)让函数搜索从a到b的所有可能路径

现在,如果我们从查询中调用最短路径函数,并根据您编写查询脚本的方式,您应该会得到相同的结果,但希望更快:

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
您甚至可以尝试使用后续匹配,以使查询比单个匹配更快:

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification)
WHERE s.uuid={app_id}
MATCH (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)
MATCH (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)
MATCH (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
但请注意,根据您的图形模型,这两个查询可能会产生不同的结果。如果您有兴趣了解这两种不同的图形匹配方法之间的差异,您可以在下面的帖子中阅读Adam的答案:

我希望这有帮助! 比尔,
法比恩

谢谢你的回答。我现在已经解决了这个问题。我之所以引入最短路径,是因为我有你早些时候提出的问题。那是

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
如果在我的图表中,每个通知节点都有一个传入关系,即
[r:HAS\u NOTIFICATION\u TYPE]
&
[ra:HAS\u NOTIFICATION\u ACTION\u TYPE]
,并且每个商店节点都有多个通知。当图表增长时,即当我有1500多个存储和~200个通知时,每个都有上述传入关系。查询将无法工作,服务器将挂起并崩溃。将这些更改为最短路径确实阻止了这种情况的发生,但结果时间仍然很长

现在,我正在通知节点的属性中保存Notif_类型&Notif_Action_类型节点的uuid。并按如下方式更新查询:

MATCH (n:business_entity)-[:HAS_APP]->(s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), 
        (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
        WHERE n.uuid={b_id} AND s.uuid={app_id} AND sta.key='name'
        OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
        OPTIONAL MATCH(ntt:notification_type) WHERE ntt.uuid = nt.notification_type
        OPTIONAL MATCH(ntta:notification_action_type) WHERE ntta.uuid = nt.notification_action_type
        RETURN nt.uuid as nt_id,  COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names,
        COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(nt)) as notification, 
        COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type
现在,响应时间在500毫秒以下。谢谢你的详细解释!
干杯

谢谢你的回答。我现在已经解决了这个问题。我之所以引入最短路径,是因为我有你早些时候提出的问题。那是

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id
如果在我的图表中,每个通知节点都有一个传入关系,即
[r:HAS\u NOTIFICATION\u TYPE]
&
[ra:HAS\u NOTIFICATION\u ACTION\u TYPE]
,并且每个商店节点都有多个通知。当图表增长时,即当我有1500多个存储和~200个通知时,每个都有上述传入关系。查询将无法工作,服务器将挂起并崩溃。将这些更改为最短路径确实阻止了这种情况的发生,但结果时间仍然很长

现在,我正在通知节点的属性中保存Notif_类型&Notif_Action_类型节点的uuid。并按如下方式更新查询:

MATCH (n:business_entity)-[:HAS_APP]->(s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), 
        (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
        WHERE n.uuid={b_id} AND s.uuid={app_id} AND sta.key='name'
        OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
        OPTIONAL MATCH(ntt:notification_type) WHERE ntt.uuid = nt.notification_type
        OPTIONAL MATCH(ntta:notification_action_type) WHERE ntta.uuid = nt.notification_action_type
        RETURN nt.uuid as nt_id,  COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names,
        COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(nt)) as notification, 
        COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type
现在,响应时间在500毫秒以下。谢谢你的详细解释!
干杯

我不知道你为什么用最短路径。您是否对“具有通知类型”和“具有通知类型”的关系有权重?根据该图,在通知(nt)和通知类型(ntt)或通知操作类型(ntta)之间只有一个跃点。你在寻找最近的通知吗?我不知道你为什么使用最短路径。您是否对“具有通知类型”和“具有通知类型”的关系有权重?根据该图,在通知(nt)和通知类型(ntt)或通知操作类型(ntta)之间只有一个跃点。您正在查找最近的通知吗?