Grouping and counting in XQuery

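This is the XML. I am trying to get the number of titles published by each author within the date range 15 February 2012 to 24 February 2012, ordered from the highest to the lowest number of titles.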


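<entries>
    <entry>
        <id>1</id>
        <published>23/02/2012</published>
        <title>Title 1</title>
        <content type="html">This is title one</content>
        <author>
            <name>Pankaj</name>
        </author>
    </entry>
    <entry>
        <id>2</id>
        <published>22/02/2012</published>
        <title>Title 2</title>
        <content type="html">This is title two</content>
        <author>
            <name>Pankaj</name>
        </author>
    </entry>
    <entry>
        <id>3</id>
        <published>21/02/2012</published>
        <title>Title 3</title>
        <content type="html">This is title three</content>
        <author>
            <name>Rob</name>
        </author>
    </entry>
    <entry>
        <id>4</id>
        <published>20/02/2012</published>
        <title>Title 4</title>
        <content type="html">This is title four</content>
        <author>
            <name>Bob</name>
        </author>
    </entry>
    <entry>
        <id>5</id>
        <published>19/02/2012</published>
        <title>Title 1</title>
        <content type="html">This is title five</content>
        <author>
            <name>Pankaj</name>
        </author>
    </entry>
</entries>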

I am trying to get the following output from XQuery:

<?xml version="1.0" encoding="UTF-8"?>
<results>
<result>
    <author>
        <name>Pankaj</name>
    </author>
    <numberOfTitles>3</numberOfTitles>
</result>
<result>
    <author>
        <name>Rob</name>
    </author>
    <numberOfTitles>1</numberOfTitles>
</result>
<result>
    <author>
        <name>Bob</name>
    </author>
    <numberOfTitles>1</numberOfTitles>
</result>
</results>


Please help.

The following should work with most processors. There may be more efficient queries in MarkLogic, but this will get you started:

let $doc := <entries>
<entry>
    <id>1</id>
    <published>23/02/2012</published>
    <title>Title 1</title>
    <content type="html">This is title one</content>
    <author>
        <name>Pankaj</name>
    </author>
</entry>
<entry>
    <id>2</id>
    <published>22/02/2012</published>
    <title>Title 2</title>
    <content type="html">This is title two</content>
    <author>
        <name>Pankaj</name>
    </author>
</entry>
<entry>
    <id>3</id>
    <published>21/02/2012</published>
    <title>Title 3</title>
    <content type="html">This is title three</content>
    <author>
        <name>Rob</name>
    </author>
</entry>
<entry>
    <id>4</id>
    <published>20/02/2012</published>
    <title>Title 4</title>
    <content type="html">This is title four</content>
    <author>
        <name>Bob</name>
    </author>
</entry>
<entry>
    <id>5</id>
    <published>19/02/2012</published>
    <title>Title 1</title>
    <content type="html">This is title five</content>
    <author>
        <name>Pankaj</name>
    </author>
</entry>
</entries>

return
 <results>
    {
        for $author in distinct-values($doc/entry/author/name/string())
        return
        <result>
            <author>
                <name>{$author}</name>
            </author>
            <numberOfTitles>{count($doc/entry[author/name/string() eq $author])}</numberOfTitles>
        </result>
    }
 </results>

Here is my solution:

<results>{
  for $entry in //entry
  let $date := xs:date(string-join(reverse(tokenize($entry/published, '/')), '-')),
      $author := $entry/author/name/string()
  where xs:date('2012-02-15') le $date and $date le xs:date('2012-02-24')
  group by $author
  order by count($entry) descending
  return <result>{
    <author>
      <name>{$author}</name>
    </author>,
    <numberOfTitles>{count($entry)}</numberOfTitles>
  }</result>
}</results>

It produces the correct result.

It uses the XQuery 3.0 group by clause; otherwise the query would be more complicated. I don't know whether MarkLogic supports this.
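
As a brief illustration of why count($entry) works after grouping: in XQuery 3.0, once group by has run, each non-grouping variable is rebound to the sequence of all values it had within that group. The sketch below uses made-up numbers rather than the question's data, and it assumes a processor with XQuery 3.0 support.

```xquery
xquery version "3.0";

(: Illustrative data only: group the integers 1 to 6 by parity. :)
for $n in 1 to 6
let $parity := if ($n mod 2 eq 0) then 'even' else 'odd'
group by $parity
order by $parity
(: After grouping, $n holds every member of the current group,
   so count($n) is the group size, just like count($entry) in the query. :)
return <group key="{$parity}" size="{count($n)}">{$n}</group>
```

Each returned group element should report a size of 3, since there are three even and three odd numbers in the range.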

This XQuery 1.0 solution can be executed by any compliant XQuery 1.0 processor.

Note: neither group by nor distinct-values() is used.

<results> 
 {
 let $entries := 
    /*/entry
           [for $d in 
                    xs:date(string-join(reverse(tokenize(published, '/')), '-'))
                return
                   xs:date('2012-02-15') le $d and $d le xs:date('2012-02-24')
             ],

  $vals := $entries/author/name
      return
         for $a in  $vals[index-of($vals, .)[1]],
                $cnt in count(index-of($vals, $a)) 
           order by $cnt descending
             return
              <result>
                <author>
                  {$a}
                 </author>
                 <numberOfTitles>
                   {count(index-of($vals, $a))}
                 </numberOfTitles>
              </result>
    }
</results>

when applied to the provided XML document:

<entries>
    <entry>
        <id>1</id>
        <published>23/02/2012</published>
        <title>Title 1</title>
        <content type="html">This is title one</content>
        <author>
            <name>Pankaj</name>
        </author>
    </entry>
    <entry>
        <id>2</id>
        <published>22/02/2012</published>
        <title>Title 2</title>
        <content type="html">This is title two</content>
        <author>
            <name>Pankaj</name>
        </author>
    </entry>
    <entry>
        <id>3</id>
        <published>21/02/2012</published>
        <title>Title 3</title>
        <content type="html">This is title three</content>
        <author>
            <name>Rob</name>
        </author>
    </entry>
    <entry>
        <id>4</id>
        <published>20/02/2012</published>
        <title>Title 4</title>
        <content type="html">This is title four</content>
        <author>
            <name>Bob</name>
        </author>
    </entry>
    <entry>
        <id>5</id>
        <published>19/02/2012</published>
        <title>Title 1</title>
        <content type="html">This is title five</content>
        <author>
            <name>Pankaj</name>
        </author>
    </entry>
</entries>

produces the wanted, correct result:

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <result>
      <author>
         <name>Pankaj</name>
      </author>
      <numberOfTitles>3</numberOfTitles>
   </result>
   <result>
      <author>
         <name>Rob</name>
      </author>
      <numberOfTitles>1</numberOfTitles>
   </result>
   <result>
      <author>
         <name>Bob</name>
      </author>
      <numberOfTitles>1</numberOfTitles>
   </result>
</results>


Here is another solution, similar to Leo Wörteler's:

declare function local:FormatDate($origDate as xs:string) as xs:date 
  {
      xs:date(string-join(reverse(tokenize($origDate, '/')), '-'))
  };

<results>
  {
  for $author in distinct-values(/entries/entry/author/name)
  let $startDate := xs:date('2012-02-15')
  let $endDate := xs:date('2012-02-24')
  order by count(/entries/entry[author/name=$author][$startDate <= local:FormatDate(published) and local:FormatDate(published) <= $endDate]) descending
  return
    <result>
      <author>
        <name>{$author}</name>
      </author>
      <numberOfTitles>{count(/entries/entry[author/name=$author][$startDate <= local:FormatDate(published) and local:FormatDate(published) <= $endDate])}</numberOfTitles>
    </result>
  }
</results>

Here is a MarkLogic-specific solution that uses maps to implement grouping efficiently. The input XML has been declared as $input, but you can replace that with a call to doc() or any other accessor.
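
The code that accompanied this answer is not shown in the thread as preserved. As a rough sketch of what map-based grouping can look like in MarkLogic, assuming the 1.0-ml dialect and the built-in map:map(), map:put(), map:get() and map:keys() functions, with sample data bound to $input and variable names chosen purely for illustration:

```xquery
xquery version "1.0-ml";

(: Sample data bound to $input for illustration; swap in doc() or another accessor. :)
let $input :=
  <entries>
    <entry><published>23/02/2012</published><author><name>Pankaj</name></author></entry>
    <entry><published>22/02/2012</published><author><name>Pankaj</name></author></entry>
    <entry><published>21/02/2012</published><author><name>Rob</name></author></entry>
    <entry><published>20/02/2012</published><author><name>Bob</name></author></entry>
    <entry><published>19/02/2012</published><author><name>Pankaj</name></author></entry>
  </entries>
let $start := xs:date('2012-02-15')
let $end := xs:date('2012-02-24')
(: Mutable map from author name to number of titles. :)
let $counts := map:map()
(: Single pass: bump the per-author counter for each entry inside the date range. :)
let $_ :=
  for $entry in $input/entry
  (: dd/mm/yyyy becomes yyyy-mm-dd, which xs:date can parse. :)
  let $date := xs:date(string-join(reverse(tokenize($entry/published, '/')), '-'))
  let $name := $entry/author/name/string()
  where $date ge $start and $date le $end
  (: A missing key yields the empty sequence, so fall back to 0. :)
  return map:put($counts, $name, 1 + (map:get($counts, $name), 0)[1])
return
  <results>{
    for $name in map:keys($counts)
    let $count := map:get($counts, $name)
    order by $count descending
    return
      <result>
        <author><name>{$name}</name></author>
        <numberOfTitles>{$count}</numberOfTitles>
      </result>
  }</results>
```

Each entry is visited once, so the work grows roughly linearly with the number of entries instead of rescanning every entry for each author.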

I also explored this topic in a blog post last year.


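+1 on the map-based solution. The other solutions have count(/entry/author[$name=xx]) clauses or other XPath expressions nested inside the FLWOR, which is effectively a nested loop. Nested loops lead to O(N^2) performance, which can look fine in testing and then degrade as the data size grows.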

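This may depend on the XQuery version you are using. Which XQuery processor or database is supposed to run the query?

I'm using oXygen (Saxon PE XQuery 9.2.0.6). In the end I have to run this query through the XCC API on MarkLogic.

You can add the date constraint to the predicate on the entries, for example $doc/entry[author/name/string() eq $author and XXXX]; replace XXXX with logic that parses the date format and performs the necessary comparison.

This doesn't filter by date and doesn't sort, does it?

No, I was lazy, but I would do something similar to your answer. Add another bit to the predicate to filter on the date range, and then add order by count($doc/entry[author/name/string() eq $author]) to sort.

+1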
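
The comments above suggest extending the first answer with a date predicate and an order by clause. A minimal sketch of that suggestion in plain XQuery 1.0 follows; the helper name local:to-date, the variable names, and the trimmed sample data are assumptions made for illustration, not part of any answer.

```xquery
xquery version "1.0";

(: Hypothetical helper: converts a dd/mm/yyyy string into an xs:date. :)
declare function local:to-date($s as xs:string) as xs:date {
  xs:date(string-join(reverse(tokenize($s, '/')), '-'))
};

(: Trimmed copy of the question's data: only the elements the query reads. :)
let $doc :=
  <entries>
    <entry><published>23/02/2012</published><author><name>Pankaj</name></author></entry>
    <entry><published>22/02/2012</published><author><name>Pankaj</name></author></entry>
    <entry><published>21/02/2012</published><author><name>Rob</name></author></entry>
    <entry><published>20/02/2012</published><author><name>Bob</name></author></entry>
    <entry><published>19/02/2012</published><author><name>Pankaj</name></author></entry>
  </entries>
let $start := xs:date('2012-02-15')
let $end := xs:date('2012-02-24')
(: Apply the date filter once, so it is not repeated for every author. :)
let $in-range :=
  $doc/entry[local:to-date(published) ge $start and local:to-date(published) le $end]
return
  <results>{
    for $author in distinct-values($in-range/author/name)
    let $count := count($in-range[author/name eq $author])
    order by $count descending
    return
      <result>
        <author><name>{$author}</name></author>
        <numberOfTitles>{$count}</numberOfTitles>
      </result>
  }</results>
```

All five sample entries fall inside the range, so Pankaj is listed first with 3 titles; the relative order of the two authors with 1 title each is left to the processor. The per-author count still rescans the filtered entries, so this keeps the nested-loop cost discussed in the comment on the map-based answer.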