PHP cURL to fetch dynamic content

php curl

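I'm trying to scrape proxies from a web page using cURL and PHP. However, when I use cURL, all I get in `$content` is the CSS. The page loads its content dynamically using WordPress, but I haven't found anything that helps me download the dynamic content. When I use wget on Linux, the page downloads fine.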

```php
<?php
//$source1 = file_get_contents('http://www.new-fresh-proxies.blogspot.com/');
$source1 = get_data("http://www.new-fresh-proxies.blogspot.com/");

$array = array();
$source1 = preg_grep('/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}\b/', $array);

//download webpage
function get_data($url) {
    $options = array(
        CURLOPT_RETURNTRANSFER => 1,     // return web page
        CURLOPT_HEADER         => true,  // include response headers in the output
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_ENCODING       => "",    // handle all encodings
        CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13", // who am i
        CURLOPT_AUTOREFERER    => true,  // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,   // timeout on connect
        CURLOPT_TIMEOUT        => 120,   // timeout on response
        CURLOPT_MAXREDIRS      => 50,    // stop after 50 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
```

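cURL can't fetch it directly, because it doesn't execute JavaScript. However, if the content comes from an AJAX request, you can make a request to that endpoint directly.

Use the dev tools / Firebug to see what is happening.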
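A minimal sketch of requesting such an endpoint directly with cURL, once you have found it in the dev tools' Network tab. The endpoint URL and the helper names `ajax_options` / `fetch_ajax` are hypothetical, and sending `X-Requested-With: XMLHttpRequest` is an assumption: jQuery-style AJAX calls usually send it, but not every endpoint checks it.

```php
<?php
// Hypothetical endpoint: replace with the request URL seen in the
// browser's dev tools / Firebug. It is NOT a known URL for this site.
const AJAX_ENDPOINT = 'http://www.example.com/hypothetical-ajax-endpoint';

// cURL options that imitate a browser XHR call (assumed headers).
function ajax_options(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 30,   // give up after 30 seconds
        CURLOPT_HTTPHEADER     => [
            'X-Requested-With: XMLHttpRequest',
            'Accept: application/json, text/javascript, */*',
        ],
    ];
}

// Fetch the raw response body of the endpoint, or null on failure.
function fetch_ajax(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, ajax_options());
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? null : $body;
}

// Usage (network access required):
// $body = fetch_ajax(AJAX_ENDPOINT);
// $data = json_decode($body, true); // only if the endpoint returns JSON
```

Because the endpoint returns the data the page's JavaScript would otherwise insert, no JavaScript engine is needed on the PHP side.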

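Two things:

Where is your "output" coming from? I don't see anything in your code that displays it.

I also think your `preg_grep` statement is incorrect. You are searching an empty array and saving the result into the same variable you just pulled the data into. Try:

```php
$array = preg_grep('/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\:\d{1,5}\b/', $source1);
```

When I run your code and dump `$source1['content']` directly after the `get_data` call, I get lots of IP addresses.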
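Note that `preg_grep()` filters the elements of an array and returns whole matching elements, so applied to the array returned by `get_data()` it would hand back the entire `content` string rather than the individual addresses. A minimal sketch of pulling each `ip:port` pair out of the page text instead, using `preg_match_all()` with the question's pattern (the helper name `extract_proxies` is hypothetical):

```php
<?php
// Extract unique "ip:port" pairs from a page body.
// Uses the question's pattern, so a port is required.
function extract_proxies(string $html): array
{
    $pattern = '/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}\b/';
    preg_match_all($pattern, $html, $matches);
    // $matches[0] holds every full match; drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Usage with the question's get_data(), which returns an array:
// $source1 = get_data("http://www.new-fresh-proxies.blogspot.com/");
// $proxies = extract_proxies($source1['content']);
// print_r($proxies);
```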
It looks to me like either a timeout or a problem with the regex. Why not stick with `file_get_contents`, as you tried at first?

```php
$content = file_get_contents('http://www.new-fresh-proxies.blogspot.com.au');

preg_match_all('/(\d+\.\d+\.\d+\.\d+(:\d+)?)/', $content, $matches);

print_r($matches[1]);
```

This prints out a list of the IPs:

```
Array
(
    [0] => 1.204.168.15:6673
    [1] => 1.234.45.130:80
    [2] => 1.34.163.101:8080
    [3] => 1.34.29.89:8080
    [4] => 1.34.8.221:3128
    ....
```

Hope this helps.
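Since a timeout is one of the suspects, here is a minimal sketch of the same `file_get_contents()` call with an explicit read timeout and User-Agent passed through an HTTP stream context. The 20-second default, the User-Agent string and the helper name `fetch_with_timeout` are assumptions, not values from the thread; fetching `http://` URLs this way also requires `allow_url_fopen` to be enabled.

```php
<?php
// Fetch a URL with file_get_contents(), setting an explicit read timeout
// and User-Agent via an HTTP stream context. Returns null on failure.
function fetch_with_timeout(string $url, float $timeout = 20.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeout, // read timeout in seconds (assumed value)
            'user_agent' => 'Mozilla/5.0 (compatible; proxy-list-sketch)', // assumed UA
        ],
    ]);
    // file_get_contents() returns false (and emits a warning) on failure.
    $body = file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Usage, combined with the regex from the answer:
// $content = fetch_with_timeout('http://www.new-fresh-proxies.blogspot.com.au');
// if ($content !== null) {
//     preg_match_all('/(\d+\.\d+\.\d+\.\d+(:\d+)?)/', $content, $matches);
//     print_r($matches[1]);
// }
```

The `http` context options only apply to `http://` and `https://` URLs; for local files they are ignored.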
You have `$content = curl_exec($ch);`, but inside the function you assign `$header['content'] = $content` and return `$header`... so the content should be in `$source1['content']`... Anyway, glad you figured it out...
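Because `get_data()` sets `CURLOPT_HEADER => true`, `$source1['content']` holds the response headers followed by the page body. A minimal sketch of cutting off the headers, using the `header_size` entry that `curl_getinfo()` places in the returned array (the helper name `body_from_result` is hypothetical):

```php
<?php
// $result is the array returned by the question's get_data():
// curl_getinfo() output plus 'content' (the raw curl_exec() result).
// 'header_size' is the total size of all headers received, including
// those of redirects followed via CURLOPT_FOLLOWLOCATION, so everything
// after that offset is the final page body.
function body_from_result(array $result): string
{
    if (!is_string($result['content'])) {
        return ''; // curl_exec() failed and returned false
    }
    return (string) substr($result['content'], $result['header_size']);
}

// Usage:
// $source1 = get_data("http://www.new-fresh-proxies.blogspot.com/");
// $html    = body_from_result($source1);
```

Alternatively, setting `CURLOPT_HEADER` to `false` in `get_data()` leaves only the body in `content`.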