PHP: Elasticsearch scroll API search "from"


php xml


I have a URL-based index, http://example.com/sitemap.index.xml, where index is a number > 0 that determines which results each chunk should contain:

$chunk = 10000;
$counter = 0;

$scroll = $es->search(array(
    "index" => "index",
    "type" => "type",
    "scroll" => "1m",
    "search_type" => "scan",
    "size" => 10,
    "from" => $chunk * ($index - 1)
));
$sid = $scroll['_scroll_id'];

while($counter < $chunk){
    $docs = $es->scroll(array(
        "scroll_id" => $sid,
        "scroll" => "1m"
    ));
    $sid = $docs['_scroll_id'];
    $counter += count($docs['hits']['hits']);
}

// ...


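Now every time I visit http://example.com/sitemap.1.xml or http://example.com/sitemap.2.xml, the results returned from ES are exactly the same. It returns results (10 per shard), but it does not seem to take from=0 or from=10000 into account.

I'm using elasticsearch-php as the ES library.

Any ideas?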

In Java, it can be done as follows:

QueryBuilder query = QueryBuilders.matchAllQuery();
long total = 0; // running count of hits fetched so far
SearchResponse scrollResp = Constants.client.prepareSearch(index)
        .setTypes(type).setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(600000)).setQuery(query)
        .setSize(500).execute().actionGet();
while (true) {
    scrollResp = Constants.client
            .prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(600000)).execute().actionGet();
    System.out.println("Record count :"
            + scrollResp.getHits().getHits().length);
    total = total + scrollResp.getHits().getHits().length;
    System.out.println("Total record count: " + total);
    for (SearchHit hit : scrollResp.getHits()) {
    //handle the hit
    }
    // Break condition: No hits are returned
    if (scrollResp.getHits().getHits().length == 0) {
        System.out.println("All records are fetched");
        break;
    }
}

Hope this helps.


Do you mean the results are the same on every rerun?

@Shastry, yes. No matter which from=? is passed to the initial search() request, the results are exactly the same.

I have used scan and scroll in Java, but I did not run into this situation. May I provide the Java code?

@Shastry, sure... you can make a gist and send me the link, and I'll take a look.

I have provided an answer. Have a look at it.

Thanks for the reply, but I don't see setFrom(?) in your search query. Your example works well for returning all records of a specific _type, but not a specific chunk of them. As my original question says, I'm building a sitemap, so of course I have limits on document size and URL count, and therefore I need to split the whole sitemap into smaller files: sitemap.1.xml, sitemap.2.xml. So every request to url/sitemap.1.xml should return the first chunk of hits, and a request to url/sitemap.2.xml should return the hits in the next chunk.
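For what it's worth, the scroll API does not apply from, so one possible workaround is to skip the earlier chunks on the client side while scrolling. Below is a minimal sketch of that idea only, not a tested solution for this setup. The class ScrollChunk, the method chunk, and the sample URLs are all hypothetical, and the pages iterator merely stands in for successive scroll responses. It also assumes the scroll returns hits in the same order on every request, which may not hold for scan unless a stable sort is used.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ScrollChunk {

    /**
     * Collects the hits that belong to sitemap number index (1-based) by
     * walking the scroll pages in order and skipping the earlier chunks.
     */
    public static <T> List<T> chunk(Iterator<List<T>> pages, int chunkSize, int index) {
        long skip = (long) chunkSize * (index - 1); // hits owned by earlier sitemaps
        long seen = 0;
        List<T> result = new ArrayList<>();
        while (pages.hasNext() && result.size() < chunkSize) {
            List<T> page = pages.next();
            if (page.isEmpty()) {
                break; // an empty page means the scroll is exhausted
            }
            for (T hit : page) {
                if (seen++ < skip) {
                    continue; // still inside an earlier chunk
                }
                if (result.size() == chunkSize) {
                    break; // this sitemap is full
                }
                result.add(hit);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for successive scroll responses (hypothetical data).
        List<List<String>> pages = List.of(
                List.of("/a", "/b", "/c"),
                List.of("/d", "/e", "/f"),
                List.of("/g"));
        System.out.println(chunk(pages.iterator(), 2, 1)); // [/a, /b]
        System.out.println(chunk(pages.iterator(), 2, 2)); // [/c, /d]
        System.out.println(chunk(pages.iterator(), 2, 4)); // [/g]
    }
}
```

Note that the cost grows with index, since every earlier hit still has to be fetched and discarded. For high sitemap numbers it may be cheaper to build all the sitemap files in a single scroll pass.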