Query optimization in Graph ArangoDB
I am running the following query in ArangoDB:
LET catalogDatasets = []
LET openDatasets = ( FOR d IN datasets FILTER d.visibility == "open" RETURN d._id )
LET myDatasets = []
LET myPurchasedDatasets = []
LET searchTarget = UNIQUE( UNION( catalogDatasets, openDatasets, myDatasets, myPurchasedDatasets ) )
LET unorderedDatasetsIds = (
FOR dataset IN FULLTEXT(datasets, "word_list", @searchWords)
FILTER dataset._id IN searchTarget RETURN dataset._id
)
LET ordered = (
FOR wl IN wordLinks
FILTER wl._from IN unorderedDatasetsIds
FOR x IN words
FILTER x._id == wl._to
COLLECT did = wl._from INTO score = wl.invFq/(x.numEdges+@epsilon)
SORT score
LIMIT 0, 20
RETURN did
)
RETURN {
dids: ordered,
number_of_items: LENGTH(unorderedDatasetsIds)
}
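To make the ranking stage concrete, here is a pure-Python model of the `ordered` subquery. It is a sketch under assumptions: the in-memory lists stand in for the `wordLinks` edge collection and the `words` collection, and all sample values are made up. One detail the model surfaces is that in AQL, `COLLECT did = wl._from INTO score = <expr>` gathers an array of `<expr>` values per group, so `SORT score` compares arrays rather than single numbers. If one relevance number per dataset is intended, summing the terms (in AQL, `COLLECT did = wl._from AGGREGATE score = SUM(<expr>)`) is one option. The sketch keeps both the per-group lists and the sums.

```python
from collections import defaultdict

# Hypothetical sample data standing in for `wordLinks` edges and `words` documents.
word_links = [
    {"_from": "datasets/1", "_to": "words/banana", "invFq": 2.0},
    {"_from": "datasets/1", "_to": "words/chocolate", "invFq": 1.0},
    {"_from": "datasets/2", "_to": "words/banana", "invFq": 3.0},
    {"_from": "datasets/3", "_to": "words/chocolate", "invFq": 0.5},
]
words = {
    "words/banana": {"numEdges": 4},
    "words/chocolate": {"numEdges": 1},
}


def rank(dataset_ids, epsilon=0.1, limit=20):
    """Model of `ordered`: join edges to words, group by _from, sort ascending, limit."""
    wanted = set(dataset_ids)
    # Like COLLECT ... INTO score = <expr>: one list of terms per group.
    per_dataset = defaultdict(list)
    for wl in word_links:
        if wl["_from"] not in wanted:
            continue
        x = words.get(wl["_to"])
        if x is None:  # inner join: skip edges whose target word is missing
            continue
        per_dataset[wl["_from"]].append(wl["invFq"] / (x["numEdges"] + epsilon))
    # Like AGGREGATE score = SUM(<expr>): one number per group.
    totals = {did: sum(terms) for did, terms in per_dataset.items()}
    # Ascending as in SORT score; ties broken by id only to keep the sketch deterministic.
    ordered = sorted(totals, key=lambda did: (totals[did], did))[:limit]
    return ordered, dict(per_dataset)


ordered, per_dataset = rank(["datasets/1", "datasets/2", "datasets/3"])
print(ordered)  # ['datasets/3', 'datasets/2', 'datasets/1']
```

Note that `SORT score` is ascending by default, so the lowest scores come first; whether that is the desired direction depends on how `invFq` is defined.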
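All my searchWords use prefixes, like this:
pref:banana,|pref:chocollate
Basically, I want to optimize this query, because it takes about 2 seconds to return. One idea I had was to cap the number of fulltext results at 1000, but then the selected datasets would be arbitrary, because they would depend on the order in which ArangoDB returns the matches.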
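The concern about capping the fulltext results can be shown with a small model: if only the first N matches (in whatever order the index yields them) are ranked, the top results can differ from ranking every match and then cutting. The scores and their order below are hypothetical.

```python
def top_k(scores, k):
    """Rank every candidate by score (ascending, like SORT score) and keep the first k."""
    return sorted(scores, key=lambda did: (scores[did], did))[:k]


def top_k_after_cap(scores, cap, k):
    """Cap the candidates first (like a LIMIT inside the fulltext loop), then rank."""
    capped = dict(list(scores.items())[:cap])  # dicts keep insertion order
    return top_k(capped, k)


# Hypothetical scores, listed in the arbitrary order a fulltext index might return them.
scores = {"d1": 0.9, "d2": 0.7, "d3": 0.1, "d4": 0.3}

print(top_k(scores, 2))               # ['d3', 'd4']  (all matches ranked)
print(top_k_after_cap(scores, 2, 2))  # ['d2', 'd1']  (only d1 and d2 were considered)
```

The two results only agree when the cap is at least the number of matches, which is why a cap applied before sorting changes the answer rather than just speeding it up.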
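What kind of optimizations can I apply to this query to make it faster?
P.S.: Here the union is made of empty lists, but sometimes they are not empty. They just happen to be empty for this query.
EDIT
Here is the Explain output of my query: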
Query string:
LET catalogDatasets = []
LET openDatasets = ( FOR d IN datasets FILTER d.visibility == "open" RETURN d._id )
LET myDatasets = []
LET myPurchasedDatasets = []
LET searchTarget = UNIQUE( UNION( catalogDatasets, openDatasets, myDatasets, myPurchasedDatasets ) )
LET unorderedDatasetsIds = (
FOR dataset IN FULLTEXT(datasets, "word_list", @searchWords)
FILTER dataset._id IN searchTarget RETURN dataset._id
)
LET ordered = (
FOR wl IN wordLinks
FILTER wl._from IN unorderedDatasetsIds
FOR x IN words
FILTER x._id == wl._to
COLLECT did = wl._from INTO score = wl.invFq/(x.numEdges+@epsilon)
SORT score
LIMIT 0, 20
RETURN did
)
RETURN {
dids: ordered,
number_of_items: LENGTH(unorderedDatasetsIds)
}
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
9 SubqueryNode 1 - LET openDatasets = ... /* const subquery */
3 SingletonNode 1 * ROOT
4 EnumerateCollectionNode 9752 - FOR d IN datasets /* full collection scan */
5 CalculationNode 9752 - LET #19 = (d.`visibility` == "open") /* simple expression */ /* collections used: d : datasets */
6 FilterNode 9752 - FILTER #19
7 CalculationNode 9752 - LET #21 = d.`_id` /* attribute expression */ /* collections used: d : datasets */
8 ReturnNode 9752 - RETURN #21
41 CalculationNode 1 - LET #39 = SORTED_UNIQUE(UNIQUE(UNION([ ], openDatasets, [ ], [ ]))) /* simple expression */
20 SubqueryNode 1 - LET unorderedDatasetsIds = ... /* subquery */
13 SingletonNode 1 * ROOT
38 IndexNode 9752 - FOR dataset IN datasets /* fulltext index scan */
16 CalculationNode 9752 - LET #25 = (dataset.`_id` in /* sorted */ #39) /* simple expression */ /* collections used: dataset : datasets */
17 FilterNode 9752 - FILTER #25
18 CalculationNode 9752 - LET #27 = dataset.`_id` /* attribute expression */ /* collections used: dataset : datasets */
19 ReturnNode 9752 - RETURN #27
34 SubqueryNode 1 - LET ordered = ... /* subquery */
21 SingletonNode 1 * ROOT
40 IndexNode 410 - FOR wl IN wordLinks /* edge index scan */
28 CalculationNode 410 - LET #33 = wl.`_from` /* attribute expression */ /* collections used: wl : wordLinks */
39 IndexNode 410 - FOR x IN words /* primary index scan */
37 SortNode 410 - SORT #33 ASC
29 CalculationNode 410 - LET #35 = (wl.`invFq` / (x.`numEdges` + 0.1)) /* simple expression */ /* collections used: wl : wordLinks, x : words */
30 CollectNode 328 - COLLECT did = #33 INTO score = #35 /* sorted */
31 SortNode 328 - SORT score ASC
32 LimitNode 20 - LIMIT 0, 20
33 ReturnNode 20 - RETURN did
35 CalculationNode 1 - LET #37 = { "dids" : ordered, "number_of_items" : LENGTH(unorderedDatasetsIds) } /* simple expression */
36 ReturnNode 1 - RETURN #37
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
38 fulltext datasets false true n/a [ `word_list` ] FULLTEXT(datasets /* all collection documents */, "word_list", "'prefix:トウ,|prefix:とう'")
40 edge wordLinks false false 3.05 % [ `_from`, `_to` ] (wl.`_from` in unorderedDatasetsIds)
39 primary words true false 100.00 % [ `_key` ] (x.`_id` == wl.`_to`)
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 move-filters-up
3 remove-redundant-calculations
4 remove-unnecessary-calculations
5 move-calculations-up-2
6 move-filters-up-2
7 fulltext-index-optimizer
8 use-indexes
9 remove-filter-covered-by-index
10 sort-in-values
11 remove-unnecessary-calculations-2
12 move-calculations-down
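Reading the plan: node 4 is a full collection scan of `datasets` (estimated 9752 documents) just to build `openDatasets`, and the `sort-in-values` rule turned the `IN` list into `SORTED_UNIQUE(...)` (node 41) so that the membership test in node 16 can use a binary search instead of a linear scan. A minimal Python model of that mechanism, with made-up documents:

```python
from bisect import bisect_left


def sorted_unique(values):
    """Counterpart of SORTED_UNIQUE(...) in the plan: deduplicate and sort once."""
    return sorted(set(values))


def in_sorted(value, sorted_values):
    """Binary-search membership test: O(log n) per lookup instead of a linear scan."""
    i = bisect_left(sorted_values, value)
    return i < len(sorted_values) and sorted_values[i] == value


# Hypothetical documents; building open_ids touches every one of them (a full scan).
datasets = [
    {"_id": "datasets/1", "visibility": "open"},
    {"_id": "datasets/2", "visibility": "private"},
    {"_id": "datasets/3", "visibility": "open"},
]
open_ids = sorted_unique(d["_id"] for d in datasets if d.get("visibility") == "open")

fulltext_hits = ["datasets/3", "datasets/2", "datasets/1"]
print([h for h in fulltext_hits if in_sorted(h, open_ids)])  # ['datasets/3', 'datasets/1']
```

The binary search makes each lookup cheap, but the full scan that builds the list still costs one pass over the whole collection on every query.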
Hmm. This is a tough one; the query is fairly expensive. But I tried something:
LET catalogDatasets = []
LET myDatasets = []
LET myPurchasedDatasets = []
LET searchTarget = UNIQUE( UNION( catalogDatasets, myDatasets, myPurchasedDatasets ) )
LET unorderedDatasetsIds = (
FOR dataset IN FULLTEXT(datasets, "word_list", @searchWords)
FILTER dataset._id IN searchTarget || dataset.visibility == "open" RETURN dataset._id
)
LET ordered = (
FOR wl IN wordLinks
FILTER wl._from IN unorderedDatasetsIds
FOR x IN words
FILTER x._id == wl._to
COLLECT did = wl._from INTO score = wl.invFq/(x.numEdges+@epsilon)
SORT score
LIMIT 0, 20
RETURN did
)
RETURN {
dids: ordered,
number_of_items: LENGTH(unorderedDatasetsIds)
}
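The idea behind this rewrite is to drop the separate `openDatasets` subquery and test visibility only on the documents the fulltext index returns, instead of scanning the whole collection first. A Python sketch of that logic follows; the documents are hypothetical, and `fulltext_prefix` is a crude stand-in for ArangoDB's `FULLTEXT` prefix search (whose real tokenization differs), treating the prefixes as alternatives, as `|` does in the search string.

```python
# Hypothetical documents; "word_list" stands in for the fulltext-indexed attribute.
datasets = {
    "datasets/1": {"visibility": "open", "word_list": "banana bread"},
    "datasets/2": {"visibility": "private", "word_list": "banana split"},
    "datasets/3": {"visibility": "private", "word_list": "chocolate cake"},
    "datasets/4": {"visibility": "open", "word_list": "apple pie"},
}


def fulltext_prefix(docs, prefixes):
    """Crude stand-in for a prefix fulltext search: any word starts with any prefix."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if any(w.startswith(p) for w in doc["word_list"].split() for p in prefixes)
    ]


def search(docs, prefixes, search_target):
    """Model of the rewritten subquery: per fulltext hit, check membership OR visibility."""
    target = set(search_target)
    return [
        doc_id
        for doc_id in fulltext_prefix(docs, prefixes)
        if doc_id in target or docs[doc_id]["visibility"] == "open"
    ]


print(search(datasets, ["ban", "choc"], []))              # ['datasets/1']
print(search(datasets, ["ban", "choc"], ["datasets/3"]))  # ['datasets/1', 'datasets/3']
```

Here the visibility check runs only on the three fulltext hits, not on all four documents; with a selective search and a large collection, that difference is where the savings would come from.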
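Nothing obvious stands out here. However, if there are not many "open" datasets, the openDatasets query should clearly matter a lot. Could you share the output of Explain from the query editor in the UI?
Done. I edited my question.
How does performance change if you add a sparse, non-unique index on datasets.visibility?
When you want to return paginated results, the best-performing approach is to run the full query: apply the filters first, then the sort, and then use LIMIT with values derived from the page parameters passed to the query (note that LIMIT accepts two arguments: an offset and a count).
Also, is your searchWords format pref:banana,|pref:chocollate or prefix:banana,|prefix:chocollate?
Does the following give you any advantage?
In the end we changed the way we query, but this approach actually brought the query time down from 2.5 to 1.5 seconds. Thanks a lot.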
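The pagination advice can be modelled as: sort the complete filtered result once, then take a window of `count` items starting at `offset = page * page_size`, which is what AQL's two-argument `LIMIT offset, count` expresses. A Python sketch, assuming hypothetical scores and a zero-based page parameter:

```python
def paginate(scores, page, page_size):
    """Sort the complete (already filtered) result, then slice like `LIMIT offset, count`."""
    # SORT score ascending; ties broken by id only to keep the sketch deterministic.
    ordered = sorted(scores, key=lambda did: (scores[did], did))
    offset = page * page_size  # zero-based page number (hypothetical parameter)
    return ordered[offset:offset + page_size]


# Hypothetical scores for the filtered datasets.
scores = {"d1": 0.9, "d2": 0.7, "d3": 0.1, "d4": 0.3, "d5": 0.5}

print(paginate(scores, 0, 2))  # ['d3', 'd4']
print(paginate(scores, 1, 2))  # ['d5', 'd2']
print(paginate(scores, 2, 2))  # ['d1']
```

Because the sort happens before the slice, every page is cut from the same global ordering; the `number_of_items` value in the original query can serve as the total for computing the page count.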