SPARQL: select the most frequently occurring value per group


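I have RDF data about hospital patients, including their dates of birth. There are often multiple triples about a patient's date of birth, and some of them may be wrong. My team has decided on this rule: whichever date occurs most frequently will provisionally be treated as the correct one. It is clear how to do this in any programming language of our choice outside SPARQL.
Is it possible to aggregate over an aggregate in SPARQL?
I have read similar questions, but I have not worked it out yet.

Given these triples: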

    @prefix turbo: <http://example.org/ontologies/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    <http://example.org/ontologies/b6be95364ec943af2ef4ab161c11c855>
        a <http://example.org/ontologies/StudyPartWithBBDonation> ;
        turbo:hasBirthDateO turbo:3950b2b6-f575-4074-b0e8-f9fa3378f3be,
                            turbo:4250aafa-4b0c-4f73-92b6-7639f427b61d,
                            turbo:a3e6676e-a214-4af4-b8ef-34a8e20170bf .

    turbo:3950b2b6-f575-4074-b0e8-f9fa3378f3be turbo:hasDateValue "1971-12-30"^^xsd:date .
    turbo:4250aafa-4b0c-4f73-92b6-7639f427b61d turbo:hasDateValue "1971-12-30"^^xsd:date .
    turbo:a3e6676e-a214-4af4-b8ef-34a8e20170bf turbo:hasDateValue "1971-12-30"^^xsd:date .

    turbo:6e200ca0d5150282787464a2bda55814
        a turbo:StudyPartWithBBDonation ;
        turbo:hasBirthDateO turbo:b09519f5-b123-40d5-bb4a-737ec9f8b9a8,
                            turbo:06c56881-a6c7-4d1d-993b-add8862dffd7,
                            turbo:12ef184d-c8d6-4d93-a558-a3ba47bb56ca .

    turbo:b09519f5-b123-40d5-bb4a-737ec9f8b9a8 turbo:hasDateValue "2000-04-04"^^xsd:date .
    turbo:06c56881-a6c7-4d1d-993b-add8862dffd7 turbo:hasDateValue "2000-04-04"^^xsd:date .
    turbo:12ef184d-c8d6-4d93-a558-a3ba47bb56ca turbo:hasDateValue "2000-04-05"^^xsd:date .

I only want to see the date with the highest count for each study participant:

    +----------------------------------------+------------------------+------------------+
    | part                                   | xsddate                | datecount        |
    +----------------------------------------+------------------------+------------------+
    | turbo:b6be95364ec943af2ef4ab161c11c855 | "1971-12-30"^^xsd:date | "3"^^xsd:integer |
    | turbo:6e200ca0d5150282787464a2bda55814 | "2000-04-04"^^xsd:date | "2"^^xsd:integer |
    +----------------------------------------+------------------------+------------------+

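For comparison, the rule itself is short outside SPARQL. The following is only a minimal Python sketch, assuming the (participant, date) pairs have already been extracted from the triples above; the names `observations` and `most_frequent_dates` are made up for this illustration and are not part of the original data or any library.

```python
from collections import Counter, defaultdict

# (participant, birth date) pairs transcribed by hand from the sample triples.
P1 = "b6be95364ec943af2ef4ab161c11c855"
P2 = "6e200ca0d5150282787464a2bda55814"
observations = [
    (P1, "1971-12-30"),
    (P1, "1971-12-30"),
    (P1, "1971-12-30"),
    (P2, "2000-04-04"),
    (P2, "2000-04-04"),
    (P2, "2000-04-05"),
]


def most_frequent_dates(pairs):
    """Return {participant: [(date, count), ...]} with every date tied for the top count."""
    counts = defaultdict(Counter)
    for part, date in pairs:
        counts[part][date] += 1  # inner aggregate: count per (participant, date)

    result = {}
    for part, counter in counts.items():
        top = max(counter.values())  # outer aggregate: max count per participant
        result[part] = sorted((d, n) for d, n in counter.items() if n == top)
    return result


winners = most_frequent_dates(observations)
for part, dates in winners.items():
    print(part, dates)
```

Note that the sketch keeps all dates that tie for the highest count, so a participant with two equally frequent dates gets two entries; the team would still need a tie-breaking rule.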
I think I'm getting close. Now I need to get the count and the max count on the same row:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX turbo: <http://example.org/ontologies/>
    SELECT ?part ?xsddate ?datecount ?countmax
    WHERE {
      {
        SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount)
        WHERE {
          ?part rdf:type turbo:StudyPartWithBBDonation ;
                turbo:hasBirthDateO ?dob .
          ?dob turbo:hasDateValue ?xsddate
        }
        GROUP BY ?part ?xsddate
      }
      UNION
      {
        SELECT ?part (MAX(?datecount) AS ?countmax)
        WHERE {
          SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount)
          WHERE {
            ?part rdf:type turbo:StudyPartWithBBDonation ;
                  turbo:hasBirthDateO ?dob .
            ?dob turbo:hasDateValue ?xsddate
          }
          GROUP BY ?part ?xsddate
        }
        GROUP BY ?part
      }
    }

Essentially, you just need to replace UNION with . in your query (or simply remove the UNION, as @AKSW points out in the comments below). The UNION version from the question returns the counts and the maxima on separate rows:

    +----------------------------------------+------------------------+------------------+------------------+
    | part                                   | xsddate                | datecount        | countmax         |
    +----------------------------------------+------------------------+------------------+------------------+
    | turbo:6e200ca0d5150282787464a2bda55814 | "2000-04-05"^^xsd:date | "1"^^xsd:integer |                  |
    | turbo:b6be95364ec943af2ef4ab161c11c855 | "1971-12-30"^^xsd:date | "3"^^xsd:integer |                  |
    | turbo:6e200ca0d5150282787464a2bda55814 | "2000-04-04"^^xsd:date | "2"^^xsd:integer |                  |
    | turbo:6e200ca0d5150282787464a2bda55814 |                        |                  | "2"^^xsd:integer |
    | turbo:b6be95364ec943af2ef4ab161c11c855 |                        |                  | "3"^^xsd:integer |
    +----------------------------------------+------------------------+------------------+------------------+

However, in GraphDB the joined query then fails with an error saying that the variable ?datecount has already been used in a previous projection, and that bindings have not been propagated through projections since Sesame 2.8, so this may lead to logic errors in the query. To avoid it, rename the variable in one of the subqueries:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX turbo: <http://example.org/ontologies/>
    SELECT ?part ?xsddate ?datecount_ ?countmax
    WHERE {
      # Count per participant and date (renamed to ?datecount_)
      {
        SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount_)
        WHERE {
          ?part rdf:type turbo:StudyPartWithBBDonation ;
                turbo:hasBirthDateO ?dob .
          ?dob turbo:hasDateValue ?xsddate
        }
        GROUP BY ?part ?xsddate
      } .
      # Highest count per participant
      {
        SELECT ?part (MAX(?datecount) AS ?countmax)
        WHERE {
          SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount)
          WHERE {
            ?part rdf:type turbo:StudyPartWithBBDonation ;
                  turbo:hasBirthDateO ?dob .
            ?dob turbo:hasDateValue ?xsddate
          }
          GROUP BY ?part ?xsddate
        }
        GROUP BY ?part
      }
    }

If you are using Blazegraph, you can use named subqueries, so the counting subquery is written only once:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX turbo: <http://example.org/ontologies/>
    SELECT ?part ?xsddate ?datecount ?countmax
    WITH {
      SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount)
      WHERE {
        ?part rdf:type turbo:StudyPartWithBBDonation ;
              turbo:hasBirthDateO ?dob .
        ?dob turbo:hasDateValue ?xsddate
      }
      GROUP BY ?part ?xsddate
    } AS %sub
    WHERE {
      {
        SELECT ?part (MAX(?datecount) AS ?countmax)
        WHERE { INCLUDE %sub }
        GROUP BY ?part
      }
      INCLUDE %sub
    }

Comments:

Thanks! I created an answer of my own to show how to pass the conclusions back into a named graph in the triplestore. We may end up doing this outside SPARQL, but I'm really glad we now have this option.

Why is the dot needed here? As far as I know it is redundant and not required, since the default operation between two GroupGraphPatterns is a join. Cheers.

@AKSW, you're right, the dot is redundant. But the informal "join" semantics of . looked relevant here, which is why I made the mistake. I have updated my answer.

My elaboration on Stanislav's excellent answer: it renames ?datecount in one of the {} patterns, adds a FILTER so that only the rows with the highest count remain, and inserts the consensus DOBs into a named graph in the triplestore.

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX turbo: <http://example.org/ontologies/>
    INSERT {
      GRAPH turbo:DOB_conclusions {
        ?part turbo:hasBirthDateO ?DOBconc .
        ?DOBconc turbo:hasDateValue ?xsddate .
        ?DOBconc turbo:conclusionated true .
        ?DOBconc rdf:type <http://www.ebi.ac.uk/efo/EFO_0004950> .
      }
    }
    WHERE {
      {
        SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount)
        WHERE {
          ?part rdf:type turbo:StudyPartWithBBDonation ;
                turbo:hasBirthDateO ?dob .
          ?dob turbo:hasDateValue ?xsddate
        }
        GROUP BY ?part ?xsddate
      } .
      {
        SELECT ?part (MAX(?datecount2) AS ?countmax)
        WHERE {
          SELECT ?part ?xsddate (COUNT(?xsddate) AS ?datecount2)
          WHERE {
            ?part rdf:type turbo:StudyPartWithBBDonation ;
                  turbo:hasBirthDateO ?dob .
            ?dob turbo:hasDateValue ?xsddate
          }
          GROUP BY ?part ?xsddate
        }
        GROUP BY ?part
      }
      # Keep only the date(s) whose count equals the participant's maximum
      FILTER ( ?datecount = ?countmax )
      # Mint a fresh IRI for each concluded date of birth
      BIND(uri(concat("http://transformunify.org/ontologies/", struuid())) AS ?DOBconc)
    }
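
To see why UNION leaves the counts and the maxima on separate rows while a join puts them on one row, it can help to model result rows as dictionaries. This is only an illustrative Python sketch of the two operations, not how any triplestore implements them; the functions `union` and `join` and the hand-written row lists are assumptions made for the example.

```python
P1 = "b6be95364ec943af2ef4ab161c11c855"
P2 = "6e200ca0d5150282787464a2bda55814"

# Rows the two subqueries would yield for the sample data (written by hand).
counts = [
    {"part": P1, "xsddate": "1971-12-30", "datecount": 3},
    {"part": P2, "xsddate": "2000-04-04", "datecount": 2},
    {"part": P2, "xsddate": "2000-04-05", "datecount": 1},
]
maxima = [
    {"part": P1, "countmax": 3},
    {"part": P2, "countmax": 2},
]


def union(left, right):
    """UNION keeps every row from both sides; missing variables stay unbound."""
    return left + right


def join(left, right):
    """Join merges each pair of rows that agree on all variables they share."""
    merged = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[v] == r_row[v] for v in shared):
                merged.append({**l_row, **r_row})
    return merged


union_rows = union(counts, maxima)   # 5 rows, none has both datecount and countmax
joined_rows = join(counts, maxima)   # 3 rows, each has both values
# Equivalent of FILTER(?datecount = ?countmax)
best_rows = [row for row in joined_rows if row["datecount"] == row["countmax"]]

for row in best_rows:
    print(row["part"], row["xsddate"], row["datecount"])
```

Because the filter is an equality test against the maximum, two dates that tie for the highest count would both survive for that participant.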