PHP: Asynchronous HTML parser with Goutte (php, symfony, guzzle, goutte)


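I am trying to write an HTML parser with the help of Goutte. It works very well, but Goutte uses blocking requests. That is fine when you are dealing with a single service, but it becomes a problem when I want to query many services that are independent of each other. Goutte uses Guzzle and Symfony's BrowserKit. I tried to change the doRequest function, but it failed with: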

Argument 1 passed to Symfony\Component\BrowserKit\CookieJar::updateFromResponse() must be an instance of Symfony\Component\BrowserKit\Response

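How can I change Goutte\Client.php so that it performs requests asynchronously? If that is not possible, how can I run my scrapers, which target different endpoints, at the same time? Thanks.

Goutte is essentially a bridge between Guzzle and Symfony's BrowserKit and DomCrawler.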

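The biggest drawback of using Goutte is that all requests are made synchronously.

To do things asynchronously, you have to forgo Goutte and use Guzzle and DomCrawler directly.

For example: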

// Assumes Composer's autoloader is loaded and $uri is an array of at least seven URLs.
$requests = [
    new GuzzleHttp\Psr7\Request('GET', $uri[0]),
    new GuzzleHttp\Psr7\Request('GET', $uri[1]),
    new GuzzleHttp\Psr7\Request('GET', $uri[2]),
    new GuzzleHttp\Psr7\Request('GET', $uri[3]),
    new GuzzleHttp\Psr7\Request('GET', $uri[4]),
    new GuzzleHttp\Psr7\Request('GET', $uri[5]),
    new GuzzleHttp\Psr7\Request('GET', $uri[6]),
];

$client = new GuzzleHttp\Client();

$pool = new GuzzleHttp\Pool($client, $requests, [
    'concurrency' => 5, // how many concurrent requests we want active at any given time
    // $uri must be imported with use(), otherwise it is undefined inside the closure
    'fulfilled' => function ($response, $index) use ($uri) {
        $crawler = new Symfony\Component\DomCrawler\Crawler(null, $uri[$index]);
        $crawler->addContent(
            (string) $response->getBody(),
            $response->getHeaderLine('Content-Type')
        );
        // query $crawler here, e.g. with filterXPath()
    },
    'rejected' => function ($reason, $index) {
        // $reason describes why the request failed (usually an exception).
    },
]);

// Start the transfers and block until every request has completed.
$promise = $pool->promise();
$promise->wait();
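
The request list does not have to be written out by hand. As a minimal sketch, assuming Guzzle 6 or later and symfony/dom-crawler are installed through Composer, the requests can be yielded from a generator and the results collected by index. The helper name `scrapeTitles` and the placeholder URLs are hypothetical and only serve as illustration; XPath is used so that the optional css-selector component is not required.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use Psr\Http\Message\ResponseInterface;
use Symfony\Component\DomCrawler\Crawler;

/**
 * Fetch every URL concurrently and return the page titles, keyed by the
 * position of each URL in $uris. (scrapeTitles is a hypothetical helper.)
 */
function scrapeTitles(Client $client, array $uris, int $concurrency = 5): array
{
    // Yield requests lazily; the generator key becomes the $index in the callbacks.
    $requests = function () use ($uris) {
        foreach ($uris as $index => $uri) {
            yield $index => new Request('GET', $uri);
        }
    };

    $titles = [];

    $pool = new Pool($client, $requests(), [
        'concurrency' => $concurrency,
        'fulfilled' => function (ResponseInterface $response, $index) use ($uris, &$titles) {
            $crawler = new Crawler(null, $uris[$index]);
            $crawler->addContent(
                (string) $response->getBody(),
                $response->getHeaderLine('Content-Type')
            );
            $title = $crawler->filterXPath('//title');
            $titles[$index] = $title->count() ? $title->text() : null;
        },
        'rejected' => function ($reason, $index) use (&$titles) {
            // $reason is usually an exception; this sketch just records null.
            $titles[$index] = null;
        },
    ]);

    $pool->promise()->wait();
    ksort($titles); // responses may complete out of order

    return $titles;
}

// Example call (placeholder URLs):
// $titles = scrapeTitles(new Client(['timeout' => 10]), ['https://example.com/', 'https://example.org/']);
```

Passing the client in as a parameter keeps the function easy to exercise with Guzzle's MockHandler instead of real network traffic.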

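Thanks for your answer. However, I am trying to run separate scrapers, not just simple GET requests. Each scraper has its own class that performs a series of requests, DOM parsing, and so on. That is why I want to know how to call all of these scrapers asynchronously. If my scrapers are localhost/site1.php and localhost/site2.php, would it be a good idea to call site1.php and site2.php from cron.php using the code above? What would you suggest?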
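
One possible way to approach this follow-up without calling the scrapers over HTTP is to have each scraper return a Guzzle promise that chains its own requests, and then wait on all of them together. The sketch below is only an illustration under stated assumptions: Guzzle 6 or later with guzzlehttp/promises 1.4 or later, and symfony/dom-crawler. The function names `toCrawler`, `scrapeFirstLinkTitle`, and `runScrapers`, and the two-step "follow the first link" logic, are hypothetical stand-ins for whatever each scraper class actually does.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Promise\PromiseInterface;
use GuzzleHttp\Promise\Utils;
use Psr\Http\Message\ResponseInterface;
use Symfony\Component\DomCrawler\Crawler;

// Parse a PSR-7 response into a Crawler anchored at $uri.
function toCrawler(ResponseInterface $response, string $uri): Crawler
{
    $crawler = new Crawler(null, $uri);
    $crawler->addContent((string) $response->getBody(), $response->getHeaderLine('Content-Type'));

    return $crawler;
}

// Hypothetical two-step scraper: load a start page, follow its first link,
// and resolve to the <title> of the linked page (null if it has none).
function scrapeFirstLinkTitle(Client $client, string $startUrl): PromiseInterface
{
    return $client->getAsync($startUrl)->then(
        function (ResponseInterface $response) use ($client, $startUrl) {
            $links = toCrawler($response, $startUrl)->filterXPath('//a[@href]');
            if ($links->count() === 0) {
                throw new RuntimeException("No link found on $startUrl");
            }
            $next = $links->link()->getUri(); // first link as an absolute URI

            // Returning a promise from then() chains the second request.
            return $client->getAsync($next)->then(
                function (ResponseInterface $detail) use ($next) {
                    $title = toCrawler($detail, $next)->filterXPath('//title');

                    return $title->count() ? $title->text() : null;
                }
            );
        }
    );
}

// Start one promise chain per scraper and wait until all have settled.
// Each entry is ['state' => 'fulfilled', 'value' => ...] or
// ['state' => 'rejected', 'reason' => ...], keyed like $startUrls.
function runScrapers(Client $client, array $startUrls): array
{
    $promises = [];
    foreach ($startUrls as $name => $url) {
        $promises[$name] = scrapeFirstLinkTitle($client, $url);
    }

    return Utils::settle($promises)->wait();
}

// Example call from cron.php (placeholder URLs):
// $results = runScrapers(new Client(['timeout' => 10]), [
//     'site1' => 'https://site1.example/',
//     'site2' => 'https://site2.example/',
// ]);
```

Using `Utils::settle()` rather than `Utils::all()` means one failing scraper does not abort the others. Whether the transfers really overlap depends on the handler in use; with Guzzle's default cURL-based handler they typically do.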
