PHP preg_replace skips some tags


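I want to use cURL to log in to a website on a remote domain, then navigate to different pages and run various data queries.

The problem is that some of the links on that site are relative. This makes my code treat those pages as local (which of course they are not).

After some digging, I realized that I need to use preg_match to find and distinguish the relative links, and preg_replace to turn them into absolute URLs for the .js and .css files that actually exist on that server.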

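When I run this code, it rewrites every link, with a few exceptions. All of the links should be converted, but the remaining relative links are left as they are, and I don't understand why. The .css link isn't even the first one that should be replaced.

This is the PHP script I use to try to access the remote site: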

```php
<?php
$username = 'myuser';
$password = 'mypass';
$loginUrl = 'http://www.example.com/index.php/';

//init curl
$ch = curl_init();

//Set the URL to work with
curl_setopt($ch, CURLOPT_URL, $loginUrl);

// ENABLE HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);

//Set the post parameters
curl_setopt($ch, CURLOPT_POSTFIELDS, 'uName='.$username.'&uPw='.$password.'&Submit=OK');

//Handle cookies for the login
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');

//Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL
//not to print out the results of its query.
//Instead, it will return the results as a string return value
//from curl_exec() instead of the usual true/false.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

//execute the request (the login)
$store = curl_exec($ch);

//the login is now done and you can continue to get the
//protected content.

//set the URL to the protected file
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/ask_for_info.php');

//execute the request
$result = curl_exec($ch);

curl_close($ch);

if (!preg_match('/src="http?:\/\/"/', $result)) {
    $result = preg_replace('/src="(http:\/\/([^\/]+)\/)?([^"]+)"/', "src=\"http://www.example.com/\\3\"", $result);
    echo 'THIS';
}
if (!preg_match('/href="http?:\/\/"/', $result)) {
    $result = preg_replace('/href="(http:\/\/([^\/]+)\/)?([^"]+)"/', "href=\"http://www.example.com/\\3\"", $result);
    echo 'THAT';
}
print_r($result);
?>
```

When I check the Google Chrome console while running the code, I get results like these:

```
Resource interpreted as Stylesheet but transferred with MIME type text/html: "http://example.com/example.css". login4.php:6
Resource interpreted as Script but transferred with MIME type text/html: "http://example.com/js/prototype.js". login4.php:7
Uncaught SyntaxError: Unexpected token < prototype.js:1
Resource interpreted as Script but transferred with MIME type text/html: "http://example.com/js/popcalendar3_ajax.js?ver=2". login4.php:9
Uncaught SyntaxError: Unexpected token <
```

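Any ideas? Thanks for any help you can provide.

An example using DOMDocument and XPath:

```php
$scheme = 'http';
$host = 'example.com';
$path = '/';

$dom = new DOMDocument();
@$dom->loadHTML($result);
$xpath = new DOMXPath($dom);

// Select every attribute that may hold a URL
$xquery = '//a/@href | //img/@src | //script/@src | //link/@href';
$urlAttrNodes = $xpath->query($xquery);

// Match only values that do not already start with a scheme, "www.", "//" or the host
$pattern = '~^(?!https?:// | www\. | // | ' . preg_quote($host) . '(?=/|$) ) (\.?/)?~xi';

foreach($urlAttrNodes as $urlAttrNode) {
    $absoluteUrl = preg_replace($pattern, "$scheme://www.$host$path", $urlAttrNode->nodeValue);
    $urlAttrNode->ownerElement->setAttribute($urlAttrNode->name, $absoluteUrl);
}

$result = $dom->saveHTML();
```

Note that the pattern only skips the current host; you can easily add other domains if needed.

Comments:

It is easier to do this with DOMDocument, which is the right tool for the job. As a side note, the `if (!preg_match…` test is useless: if you only need to replace relative URLs, the check for whether a link starts with http:// or the hostname has to happen in the preg_replace pattern itself. The code after curl_close() is logically unclear.

These are already valuable comments for me! @casimirithippolyte DOMDocument would force me to use a browser. What if I later want to run this code in a cron job? I would probably have to do it the hard way. Second, I do want to check for links that start with http://, so I am doing that wrong. But it does check and replace every link except 3. I am looking into it. Any further suggestions or examples? Thanks!

"DOMDocument would force me to use a browser": absolutely not! It runs server-side in PHP. Remember that the pattern in preg_replace is already a check, so use it.

That basically answers the question. It works! Thanks! What I need to figure out now is why it adds window.location.reload() and keeps refreshing the page. That might be the site's fault, though. I figured it out. Really a big help!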
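The advice in the comments, to let the preg_replace pattern do the check instead of a separate preg_match guard, could be sketched roughly as follows. This is only an illustration, not the code from the thread: the helper name `absolutizeLinks` and the sample HTML are assumptions. It also assumes double-quoted attributes and treats every relative path as relative to the site root, as the answer above does with `$path = '/'`. For real pages, the DOMDocument approach is likely more robust.

```php
<?php
// Hypothetical helper (not from the original thread): prefixes relative
// src/href attribute values with a base URL using a single regex pass.
function absolutizeLinks(string $html, string $base): string
{
    $base = rtrim($base, '/') . '/';

    // The negative lookahead (?!...) rejects values that already start with
    // a scheme ("http:", "mailto:", ...), with "//" (protocol-relative), or
    // with "#" (fragment). A leading "./" or "/" is consumed and dropped.
    $pattern = '~\b(src|href)="(?![a-z][a-z0-9+.-]*:|//|#)(?:\./|/)?([^"]*)"~i';

    // A callback avoids having to escape "$" or "\" inside the base URL.
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($base): string {
            return $m[1] . '="' . $base . $m[2] . '"';
        },
        $html
    );
}

// Usage with made-up markup: only the first two attributes are relative.
$sample = '<link href="example.css">'
        . '<script src="/js/prototype.js"></script>'
        . '<a href="http://other.org/x">x</a>';
echo absolutizeLinks($sample, 'http://www.example.com'), "\n";
```

Because the lookahead sits inside the replacement pattern, links that are already absolute never match, so no separate guard is needed and links to other hosts are not rewritten.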