PHP: Parse the domain from a URL
I need to build a function that parses the domain from a URL.
So, with
http://google.com/dhasjkdas/sadsdds/sdda/sdads.html
or
http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html
it should return google.com, and with
http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html
it should return google.co.uk.
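Check out parse_url().
parse_url doesn't handle badly mangled URLs well, but it is fine if you generally expect decent URLs.
For some odd reason, parse_url returns the host (e.g. example.com) as the path when no scheme is provided in the input URL, so I wrote a quick function to get the real host.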
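The scheme-less behaviour described above can be reproduced in a few lines. What follows is only a sketch: `host_from_loose_url` is a hypothetical helper name, and prefixing `//` to scheme-less input is one possible workaround, not necessarily the approach the quoted author took. It assumes PHP 7.1+ for the nullable return type.

```php
<?php
// Hypothetical helper: returns the lower-cased host, or null if none is found.
function host_from_loose_url(string $url): ?string
{
    $url = trim($url);
    // No "scheme://" prefix: add "//" so the first segment parses as the host,
    // not as the path.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = '//' . ltrim($url, '/');
    }
    $host = parse_url($url, PHP_URL_HOST);
    return (is_string($host) && $host !== '') ? strtolower($host) : null;
}

$withScheme    = parse_url('http://example.com/page'); // contains a 'host' key
$withoutScheme = parse_url('example.com/page');        // contains only a 'path' key
$recovered     = host_from_loose_url('example.com/page');
```

The helper leaves subdomains such as www in place; it only makes sure that a host is found at all.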
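This returns google.com in both cases.
Here is the code I wrote. It is meant to find only the domain name 100% of the time, since it takes Mozilla's sub-TLD list into account. The one thing you need to sort out is how to cache that file, so you don't query Mozilla on every call.
For some strange reason, domains like co.uk are not in the list, so you have to hack around that and add them manually. It is not the cleanest solution, but I hope it helps someone.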
//=====================================================
static function domain($url)
{
    $slds = "";
    $url = strtolower($url);
    $address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
    if(!$subtlds = @kohana::cache('subtlds', null, 60))
    {
        $content = file($address);
        foreach($content as $num => $line)
        {
            $line = trim($line);
            if($line == '') continue;
            if(@substr($line[0], 0, 2) == '/') continue;
            $line = @preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
            if($line == '') continue; //$line = '.'.$line;
            if(@$line[0] == '.') $line = substr($line, 1);
            if(!strstr($line, '.')) continue;
            $subtlds[] = $line;
            //echo "{$num}: '{$line}'"; echo "<br>";
        }
        $subtlds = array_merge(Array(
            'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
            'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
            'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au',
        ),$subtlds);
        $subtlds = array_unique($subtlds);
        //echo var_dump($subtlds);
        @kohana::cache('subtlds', $subtlds);
    }
    preg_match('/^(http:[\/]{2,})?([^\/]+)/i', $url, $matches);
    //preg_match("/^(http:\/\/|https:\/\/|)[a-zA-Z-]([^\/]+)/i", $url, $matches);
    $host = @$matches[2];
    //echo var_dump($matches);
    preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    foreach($subtlds as $sub)
    {
        if (preg_match("/{$sub}$/", $host, $xyz))
            preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    }
    return @$matches[0];
}
//=====================================================
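On the caching point raised above: one simple option is to keep a local copy of the list and refresh it only once it is older than some limit. This is a minimal sketch, assuming a writable cache path. `cached_lines`, its parameters and the injected `$fetch` callable are hypothetical names for illustration; they are not part of Kohana or of the answer's code.

```php
<?php
// Hypothetical helper: return the cached list as an array of lines, calling
// $fetch() only when the cache file is missing or older than $ttlSeconds.
function cached_lines(string $cacheFile, int $ttlSeconds, callable $fetch): array
{
    clearstatcache(true, $cacheFile); // avoid a stale mtime from PHP's stat cache
    $fresh = is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds;
    if (!$fresh) {
        $body = $fetch();
        if (is_string($body) && $body !== '') {
            file_put_contents($cacheFile, $body, LOCK_EX);
        } elseif (!is_file($cacheFile)) {
            return []; // nothing cached and the fetch failed
        }
        // otherwise fall through and reuse the stale copy
    }
    $lines = file($cacheFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? [] : $lines;
}

// Usage with a stand-in fetcher; a real one would download the list.
$cacheFile = sys_get_temp_dir() . '/suffix_list_' . uniqid() . '.dat';
$calls = 0;
$fetch = function () use (&$calls) {
    $calls++;
    return "com\nco.uk\n";
};
$first  = cached_lines($cacheFile, 3600, $fetch); // fetches and writes the cache
$second = cached_lines($cacheFile, 3600, $fetch); // served from the cache file
unlink($cacheFile);
```

In real use, `$fetch` could wrap `file_get_contents()` on whichever list URL you rely on; passing it in keeps the caching logic usable without network access.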
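The code that was meant to work 100% didn't cut it for me. I patched the example a little, but found code that wasn't helping and had problems of its own, so I split it into a couple of functions (which saves asking Mozilla for the list every time, and drops the cache system). This has been tested against a set of 1000 URLs and seemed to work.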
function domain($url)
{
    // Expects $subtlds to have been filled once, e.g. $subtlds = get_tlds();
    global $subtlds;
    $slds = "";
    $url = strtolower($url);
    // The scheme is prepended here, so pass the URL without "http://"
    $host = parse_url('http://'.$url,PHP_URL_HOST);
    preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    foreach($subtlds as $sub){
        if (preg_match('/\.'.preg_quote($sub, '/').'$/', $host, $xyz)){
            preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
        }
    }
    return @$matches[0];
}
function get_tlds() {
    $subtlds = array();
    $address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
    $content = file($address);
    foreach ($content as $num => $line) {
        $line = trim($line);
        if($line == '') continue;
        if(@substr($line[0], 0, 2) == '/') continue;
        $line = @preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
        if($line == '') continue; //$line = '.'.$line;
        if(@$line[0] == '.') $line = substr($line, 1);
        if(!strstr($line, '.')) continue;
        $subtlds[] = $line;
        //echo "{$num}: '{$line}'"; echo "<br>";
    }
    $subtlds = array_merge(array(
        'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
        'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
        'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
    ), $subtlds);
    $subtlds = array_unique($subtlds);
    return $subtlds;
}
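I know I should have turned this into a class, but I didn't have the time.
parse_url didn't work for me; it only returned the path. Switching to basics, using PHP 5.3+: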
$url = str_replace('http://', '', strtolower( $s->website));
if (strpos($url, '/')) $url = strstr($url, '/', true);
This generally works very well if the input URL is not total junk. It removes the subdomain.
$host = parse_url( $Row->url, PHP_URL_HOST );
$parts = explode( '.', $host );
$parts = array_reverse( $parts );
$domain = $parts[1].'.'.$parts[0];
Example
Input: http://www2.website.com:8080/some/file/structure?some=parameters
Output:
website.com
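One caveat with taking the last two labels is that it assumes a single-label suffix such as .com. The sketch below wraps the same idea in a hypothetical `last_two_labels` function (the name and the null handling are assumptions, not from the answer) to show where it holds and where it falls short.

```php
<?php
// Hypothetical wrapper around the "last two labels" idea shown above.
function last_two_labels(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component, e.g. scheme-less or malformed input
    }
    $parts = explode('.', $host);
    if (count($parts) < 2) {
        return $host; // single-label host such as "localhost"
    }
    return implode('.', array_slice($parts, -2));
}

$ok     = last_two_labels('http://www2.website.com:8080/some/file/structure?some=parameters');
$tooFew = last_two_labels('http://www.google.co.uk/');
// $ok is 'website.com'; $tooFew is 'co.uk', which is the suffix rather than the domain.
```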
You can pass PHP_URL_HOST as the second parameter to the parse_url function:
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$host = parse_url($url, PHP_URL_HOST);
print $host; // prints 'google.com'
I edited it for you:
function getHost($Address) {
    $parseUrl = parse_url(trim($Address));
    // Without a scheme, parse_url() puts the host at the start of 'path'
    if (!empty($parseUrl['host'])) {
        $host = $parseUrl['host'];
    } else {
        $pathParts = explode('/', isset($parseUrl['path']) ? $parseUrl['path'] : '', 2);
        $host = $pathParts[0];
    }
    $parts = explode('.', trim($host));
    // Drop a leading "www" label and keep every other label
    if ($parts[0] == "www") {
        array_shift($parts);
    }
    return implode('.', $parts);
}
All kinds of URLs (www.domain.ltd, sub1.subn.domain.ltd) will result in: domain.ltd.
Just use the following:
$domain = parse_url($url, PHP_URL_HOST);
echo implode('.', array_slice(explode('.', $domain), -2, 2));
<?php
echo $_SERVER['SERVER_NAME'];
?>
Combining the answers of worldofjr and Alix Axel into one small function that should handle most use cases:
function get_url_hostname($url) {
$parse = parse_url($url);
return str_ireplace('www.', '', $parse['host']);
}
get_url_hostname('http://www.google.com/example/path/file.html'); // google.com
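A small caveat on the function above: `str_ireplace('www.', '', ...)` removes that substring wherever it occurs in the host, not only at the front. A possible variant, sketched here under the hypothetical name `strip_leading_www`, anchors the match to the start of the host instead.

```php
<?php
// Hypothetical variant: only a leading "www." label is removed.
function strip_leading_www(string $host): string
{
    return preg_replace('/^www\./i', '', $host);
}

$plain    = strip_leading_www('www.google.com');          // leading label removed
$embedded = strip_leading_www('awww.example.com');        // left untouched
$naive    = str_ireplace('www.', '', 'awww.example.com'); // substring removed mid-host
```

Whether the difference matters depends on the hosts you expect; for input that only ever starts with www. the two behave the same.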
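If you want to extract the host from the string
http://google.com/dhasjkdas/sadsdds/sdda/sdads.html
then parse_url() is an acceptable solution for you.
But if you want to extract the domain or its parts, you need a dedicated package. Yes, you can use string functions around parse_url(), but they will sometimes produce incorrect results.
For domain parsing I recommend TLDExtract; here is sample code that shows the difference: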
$extract = new LayerShifter\TLDExtract\Extract();
# For 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
parse_url($url, PHP_URL_HOST); // will return google.com
$result = $extract->parse($url);
$result->getFullHost(); // will return 'google.com'
$result->getRegistrableDomain(); // will return 'google.com'
$result->getSuffix(); // will return 'com'
# For 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html'
$url = 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
parse_url($url, PHP_URL_HOST); // will return 'search.google.com'
$result = $extract->parse($url);
$result->getFullHost(); // will return 'search.google.com'
$result->getRegistrableDomain(); // will return 'google.com'
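I'm adding this answer late, since this is the answer that pops up most on Google.
You can use PHP to
$url = "http://www.google.co.uk";
$host = parse_url($url, PHP_URL_HOST);
// $host == "www.google.co.uk"
get the host, but not the private domain the host refers to. (For example, www.google.co.uk is the host, but google.co.uk is the private domain.)
To get the private domain, you need to know the list of public suffixes under which private domains can be registered.
Once an array of public suffixes has been built, the following call works
$domain = get_private_domain("www.google.co.uk");
with the rest of the code: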
// find some way to parse the above list of public suffixes,
// then add them to a PHP array keyed by suffix (is_public_suffix() uses isset())
$suffix = [... all valid public suffix ...];
function get_public_suffix($host) {
    // explode() is used because split() was removed in PHP 7
    $parts = explode(".", $host);
    while (count($parts) > 0) {
        if (is_public_suffix(join(".", $parts)))
            return join(".", $parts);
        array_shift($parts);
    }
    return false;
}
function is_public_suffix($host) {
    global $suffix;
    return isset($suffix[$host]);
}
function get_private_domain($host) {
    $public = get_public_suffix($host);
    if ($public === false)
        return false; // no known public suffix in $host
    $public_parts = explode(".", $public);
    $all_parts = explode(".", $host);
    $private = [];
    for ($x = 0; $x < count($public_parts); ++$x)
        $private[] = array_pop($all_parts);
    if (count($all_parts) > 0)
        $private[] = array_pop($all_parts);
    return join(".", array_reverse($private));
}
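For the "find some way to parse the list" step, here is one possible sketch. It assumes the list is plain text with one rule per line and `//` comments, which is also what the parsing code earlier in this thread assumes. `parse_suffix_lines` and `registrable_domain` are hypothetical names, the sample lines are made up for the example, and wildcard (`*.`) and exception (`!`) rules are deliberately skipped, so this is a simplification rather than a complete implementation.

```php
<?php
// Hypothetical helper: turn list lines into a set keyed by suffix.
function parse_suffix_lines(array $lines): array
{
    $set = [];
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip blank lines and "//" comments
        if ($line === '' || strpos($line, '//') === 0) {
            continue;
        }
        // Simplification: ignore wildcard and exception rules
        if ($line[0] === '*' || $line[0] === '!') {
            continue;
        }
        $set[strtolower($line)] = true;
    }
    return $set;
}

// Hypothetical helper: longest matching suffix plus one more label,
// or null when the host has no known suffix or is itself a suffix.
function registrable_domain(string $host, array $suffixSet): ?string
{
    $labels = explode('.', strtolower(trim($host, '.')));
    for ($i = 0; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($suffixSet[$candidate])) {
            return $i === 0 ? null : $labels[$i - 1] . '.' . $candidate;
        }
    }
    return null;
}

$sample = ['// sample rules', '', 'com', 'uk', 'co.uk', '*.ck', '!www.ck'];
$set = parse_suffix_lines($sample);
$a = registrable_domain('www.google.co.uk', $set);
$b = registrable_domain('search.google.com', $set);
```

Keying the array by suffix keeps each lookup a single `isset()` call, which matters when the real list has thousands of entries.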
function getHost($url) {
    $parseUrl = parse_url(trim($url));
    if(isset($parseUrl['host']))
    {
        $host = $parseUrl['host'];
    }
    else
    {
        // No scheme: parse_url() puts everything into 'path'
        $path = explode('/', $parseUrl['path']);
        $host = $path[0];
    }
    return trim($host);
}
echo getHost("http://example.com/anything.html"); // example.com
echo getHost("http://www.example.net/directory/post.php"); // www.example.net
echo getHost("https://example.co.uk"); // example.co.uk
echo getHost("www.example.net"); // www.example.net
echo getHost("subdomain.example.net/anything"); // subdomain.example.net
echo getHost("example.net"); // example.net
$url = 'http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'www.google.com'
echo parse_url('https://subdomain.example.com/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.com
echo parse_url('https://subdomain.example.co.uk/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.co.uk
function getDomain($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if(filter_var($host,FILTER_VALIDATE_IP)) {
        // IP address returned as domain
        return $host; //* or replace with null if you don't want an IP back
    }
    $domain_array = explode(".", str_replace('www.', '', $host));
    $count = count($domain_array);
    if( $count>=3 && strlen($domain_array[$count-2])==2 ) {
        // SLD (example.co.uk)
        return implode('.', array_splice($domain_array, $count-3,3));
    } else if( $count>=2 ) {
        // TLD (example.com)
        return implode('.', array_splice($domain_array, $count-2,2));
    }
}
// Your domains
echo getDomain('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
echo getDomain('http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
echo getDomain('http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html'); // google.co.uk
// TLD
echo getDomain('https://shop.example.com'); // example.com
echo getDomain('https://foo.bar.example.com'); // example.com
echo getDomain('https://www.example.com'); // example.com
echo getDomain('https://example.com'); // example.com
// SLD
echo getDomain('https://more.news.bbc.co.uk'); // bbc.co.uk
echo getDomain('https://www.bbc.co.uk'); // bbc.co.uk
echo getDomain('https://bbc.co.uk'); // bbc.co.uk
// IP
echo getDomain('https://1.2.3.45'); // 1.2.3.45
function getTrimmedUrl($link)
{
    // Strip "www." and the scheme, then keep everything before the first "/"
    $str = str_replace(["www.","https://","http://"], '', $link);
    $link = explode("/",$str);
    return strtolower($link[0]);
}