
PHP: asynchronous HTML parser with Goutte


I'm trying to write an HTML parser with the help of Goutte. It works well, but it uses blocking requests. That approach is fine when you're dealing with a single service, but it becomes a problem when I want to query many services that are independent of each other. Goutte uses Guzzle and BrowserKit under the hood. I tried to change the doRequest function, but failed with:

Argument 1 passed to Symfony\Component\BrowserKit\CookieJar::updateFromResponse() must be an instance of Symfony\Component\BrowserKit\Response


How can I change Goutte\Client.php so that it performs requests asynchronously? If that isn't possible, how can I run scrapers that target different endpoints at the same time? Thanks.

Goutte is actually a bridge between Guzzle and Symfony's BrowserKit and DomCrawler components.

The biggest downside of using Goutte is that all requests are made synchronously.

To do things asynchronously, you have to drop Goutte and use Guzzle and DomCrawler directly.

For example:

$requests = [
    new GuzzleHttp\Psr7\Request('GET', $uri[0]),
    new GuzzleHttp\Psr7\Request('GET', $uri[1]),
    new GuzzleHttp\Psr7\Request('GET', $uri[2]),
    new GuzzleHttp\Psr7\Request('GET', $uri[3]),
    new GuzzleHttp\Psr7\Request('GET', $uri[4]),
    new GuzzleHttp\Psr7\Request('GET', $uri[5]),
    new GuzzleHttp\Psr7\Request('GET', $uri[6]),
];

$client = new GuzzleHttp\Client();

$pool = new GuzzleHttp\Pool($client, $requests, [
    'concurrency' => 5, // how many concurrent requests we want active at any given time
    'fulfilled' => function ($response, $index) use ($uri) {
        $crawler = new Symfony\Component\DomCrawler\Crawler(null, $uri[$index]);
        $crawler->addContent(
            $response->getBody()->__toString(),
            $response->getHeader('Content-Type')[0]
        );
    },
    'rejected' => function ($reason, $index) {
        // do something if the request failed
    },
]);

$promise = $pool->promise();
$promise->wait();

Thanks for your answer. However, I'm trying to run separate scrapers, not just simple GET requests. Each scraper has its own class that performs a series of requests, DOM parsing, and so on. That's why I'd like to know how to invoke all of those scrapers asynchronously. If my scrapers live at localhost/site1.php and localhost/site2.php, would it be a good idea to use the code above to call site1.php and site2.php from a cron.php? What would you suggest?
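For the follow-up question, one option is to skip the HTTP round-trip through localhost entirely: give each scraper class the shared Guzzle client and have it return a promise built with sendAsync(), then wait on all of them together. The sketch below is a minimal illustration of that idea; the Site1Scraper class name and the example.com URL are hypothetical placeholders, not part of any real project.

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Promise\PromiseInterface;
use GuzzleHttp\Promise\Utils;
use GuzzleHttp\Psr7\Request;
use Symfony\Component\DomCrawler\Crawler;

// Hypothetical example: one class per target site. Each scraper chains its
// DOM parsing onto the promise returned by sendAsync(), so nothing blocks
// until we call wait() at the end.
class Site1Scraper
{
    public function run(Client $client): PromiseInterface
    {
        return $client
            ->sendAsync(new Request('GET', 'https://example.com/'))
            ->then(function ($response) {
                $crawler = new Crawler($response->getBody()->__toString());
                // Any number of follow-up requests / parsing steps can be
                // chained here with further ->then() calls.
                return $crawler->filter('title')->text();
            });
    }
}

$client = new Client();

$promises = [
    'site1' => (new Site1Scraper())->run($client),
    // 'site2' => (new Site2Scraper())->run($client), ...
];

// Resolve all scrapers concurrently; $results is keyed by scraper name.
$results = Utils::all($promises)->wait();
```

Calling the scrapers over HTTP from a cron.php would work too, but keeping them in-process avoids the extra web-server hop and lets each scraper chain an arbitrary sequence of requests on its own promise.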