Speeding up a SPARQL query on GraphDB

Tags: sparql, rdf, semantic-web, graphdb

I am trying to speed up and optimize this query:

select distinct ?root where { 
    ?root a :Root ;
          :hasnode* ?node ;
          :hasnode* ?node2 .

    ?node a :Node ;
           :hasAnnotation ?ann .
    ?ann :hasReference ?ref .
    ?ref a :ReferenceType1 .

    ?node2 a :Node ;
            :hasAnnotation ?ann2 .
    ?ann2 :hasReference ?ref2 .
    ?ref2 a :ReferenceType2 .

}

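Basically, I am analyzing some trees, and I want to get all the trees (i.e. the roots of the trees) that have at least two underlying nodes matching the following pattern: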

?node_x a :Node ;
       :hasAnnotation ?ann_x .
?ann_x :hasReference ?ref_x .
?ref_x a :ReferenceTypex .

one with x=1 and the other with x=2.

Since a node in my graph can have at most one :hasAnnotation predicate, I do not have to specify that these nodes must be different.

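The problem

The query above describes what I need, but it performs very badly. After several minutes of execution it is still running.

My (ugly) solution: split it in two

I noticed that if I look for one node pattern at a time, I get the results within a few seconds (!).

Sadly, my current approach is to run the following type of query twice: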

select distinct ?root where { 
    ?root a :Root ;
          :hasnode* ?node .

    ?node a :Node ;
           :hasAnnotation ?ann_x .
    ?ann_x :hasReference ?ref_x .
    ?ref_x a :ReferenceTypex .
}

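one with x=1 and the other with x=2, saving the partial results (i.e. the ?roots) in two sets, say R1 and R2, and finally computing the intersection of these result sets.

Is there a way to speed up my initial approach so that I get the results using SPARQL alone?

PS: I am working with GraphDB.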

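OK, putting together the auto-hint :) and Stanislav's suggestion, I came up with a solution.

Solution 1: nested query

Nesting the query in the following way, I get the results in 15s: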

select distinct ?root where { 
    ?root a :Root ;
          :hasnode* ?node .
    ?node a :Node ;
          :hasAnnotation ?ann .
    ?ann :hasReference ?ref .
    ?ref a :ReferenceType1 .
    {
        select distinct ?root where { 
            ?root a :Root ;
                  :hasnode* ?node2 .
            ?node2 a :Node ;
                   :hasAnnotation ?ann2 .
            ?ann2 :hasReference ?ref2 .
            ?ref2 a :ReferenceType2 .
        }
    }
}

Solution 2: grouping into {}

Grouping the parts into {}, as suggested by Stanislav, takes 60s:

select distinct ?root where { 
    {
        ?root a :Root ;
              :hasnode* ?node .

        ?node a :Node ;
              :hasAnnotation ?ann .
        ?ann :hasReference ?ref .
        ?ref a :ReferenceType1 .
    }
    {
        ?root a :Root ;
              :hasnode* ?node2 .

        ?node2 a :Node ;
               :hasAnnotation ?ann2 .
        ?ann2 :hasReference ?ref2 .
        ?ref2 a :ReferenceType2 .
    }
}

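In the first case, GraphDB's optimizer probably manages to build a more efficient query plan for my data (explanations are welcome).

I used to think of SPARQL in a "declarative" way, but it seems that performance varies a lot depending on how the SPARQL is written. Coming from SQL, it seems to me that this performance variability is much greater than what happens in the relational world.

However, after reading this post, it seems I do not yet know enough about the dynamics of SPARQL optimizers.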
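
Without knowing your specific dataset, I can only give you some general guidance on how to optimize the query:

Avoid using DISTINCT on large datasets

The GraphDB query optimizer does not automatically rewrite the query to use EXISTS for all the patterns that do not participate in the projection. The intended semantics is "find out whether at least one such pattern exists", not "give me all the bindings and then eliminate the duplicate results".

Materialize the property paths

GraphDB has a very efficient forward-chaining reasoner and a comparatively less optimized property path expansion. If you are not concerned about write/data update performance, I suggest declaring :hasnode as a transitive property (owl:TransitiveProperty), which eliminates the property path wildcard. This will make the query many times faster.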
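
As a rough sketch of that materialization step, the declaration could be loaded with a SPARQL update like the one below. The namespace bound to the `:` prefix is a placeholder (the question never shows it), and the sketch assumes the repository uses a ruleset that implements owl:TransitiveProperty, such as the OWL-Horst ruleset mentioned in the comments.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
# Placeholder namespace (assumption): replace with the one your data uses.
PREFIX : <http://example.org/trees#>

# Declare :hasnode as transitive so that the reasoner can infer
# links from each root to all of its descendants.
INSERT DATA {
    :hasnode a owl:TransitiveProperty .
}
```

Note that a plain `:hasnode` pattern, unlike `:hasnode*`, does not match the zero-length path, so the root itself is no longer considered as a candidate node.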

Your final query should look like this:
select ?root where { 
    ?root a :Root ;
          :hasnode ?node ;
          :hasnode ?node2 .

    FILTER (?node != ?node2)

    FILTER EXISTS {
        ?node a :Node ;
               :hasAnnotation ?ann .
        ?ann :hasReference ?ref .
        ?ref a :ReferenceType1 .
    }

    FILTER EXISTS {
        ?node2 a :Node ;
                :hasAnnotation ?ann2 .
        ?ann2 :hasReference ?ref2 .
        ?ref2 a :ReferenceType2 .
    }
}
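
If one row per matching root is needed without DISTINCT, one possible variation (not part of the original answer) keeps only ?root in the main pattern and moves the node lookups entirely into FILTER EXISTS. This is a sketch under a few assumptions: :hasnode has already been materialized as transitive, each root has a single `a :Root` statement, the two matched nodes are necessarily different (as the question states), and the `:` namespace is a placeholder.

```sparql
# Placeholder namespace (assumption): replace with the one your data uses.
PREFIX : <http://example.org/trees#>

SELECT ?root WHERE {
    # Only ?root is bound in the main pattern, so each root yields one row.
    ?root a :Root .

    # At least one descendant annotated with a ReferenceType1 reference.
    FILTER EXISTS {
        ?root :hasnode ?node .
        ?node a :Node ;
              :hasAnnotation ?ann .
        ?ann :hasReference ?ref .
        ?ref a :ReferenceType1 .
    }

    # At least one descendant annotated with a ReferenceType2 reference.
    FILTER EXISTS {
        ?root :hasnode ?node2 .
        ?node2 a :Node ;
               :hasAnnotation ?ann2 .
        ?ann2 :hasReference ?ref2 .
        ?ref2 a :ReferenceType2 .
    }
}
```

Whether this is faster on a given dataset would have to be measured; one of the comments reports that the query seemed faster without FILTER EXISTS.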


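Auto-hint: nested query. You could also try enclosing these two groups in {}. PS. Does FactForge contain some data similar to yours?

About the PS: unfortunately not, or at least I am not aware of any similarity. Anyway, they are a kind of annotated genealogical tree.

FYI: :hasnode a owl:TransitiveProperty flattens my tree structure at every node level and avoids the property path. It also works with OWL-Horst optimized. Great! But what if I want one entry per matching tree root, yet I have to avoid DISTINCT?

Another question: why use FILTER EXISTS instead of writing these triples as a single main pattern? Note: without FILTER EXISTS the query seems faster.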