PHP: get the domain name from a subdomain

php dns subdomain

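I need to write a function that parses a variable containing a domain name. It is easiest to explain with an example. The variable could contain any of the following: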

here.example.com
example.com
example.org
here.example.org

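But when passed through my function, all of these must return either example.com or example.co.uk, that is, the root domain name. I'm sure I have done this before, but I have been searching Google for about 20 minutes and cannot find anything. Any help would be appreciated.

EDIT: Ignore .co.uk and assume that every domain passed to this function has a three-letter TLD.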

Archived from a Stack Overflow question. Answers:

print get_domain("http://somedomain.co.uk"); // outputs 'somedomain.co.uk'

function get_domain($url)
{
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}

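A regular expression can help here. Try something like this (the dots are escaped so that they only match literal dots):

([^.]+(\.com|\.co\.uk))$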
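As a rough sketch of how that pattern might be applied in PHP: the function name `root_domain_by_suffix` is made up for illustration, and only the two suffixes from the pattern are recognised.

```php
<?php
// Hypothetical helper: returns the last label plus a known suffix, or null.
function root_domain_by_suffix(string $host): ?string
{
    // Only .com and .co.uk are recognised; extend the alternation as needed.
    if (preg_match('/([^.]+(\.com|\.co\.uk))$/i', $host, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

var_dump(root_domain_by_suffix('here.example.com'));   // string(11) "example.com"
var_dump(root_domain_by_suffix('here.example.co.uk')); // string(13) "example.co.uk"
var_dump(root_domain_by_suffix('example.org'));        // NULL
```

Anything with a suffix outside the alternation, such as example.org, simply does not match, so the list has to grow with every suffix you want to support.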

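I think your problem is that you have not clearly defined what you want the function to do. From your examples, you certainly do not want it to blindly return the last two, or the last three, components of the name, but knowing what it should not do is not enough.

Here is my guess at what you really want: there are certain second-level domain names, like co.uk., that you would like to treat as a single TLD (top-level domain) for the purposes of this function. In that case I suggest enumerating all such cases and putting them as keys into an associative array with dummy values, along with all the normal top-level domains like com., net., info., and so on. Then, whenever you get a new domain name, extract the last two components and see whether the resulting string exists as a key in the array. If not, extract just the last component and make sure that it is in the array. (If even that is not, it is not a valid domain name.) Either way, whichever key you find in the array, take it plus one more component from the end of the domain name, and you have your base domain.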
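The lookup described above could be sketched roughly as follows. The table contents and the name `base_domain` are assumptions for illustration; a real table would need every suffix you care about.

```php
<?php
// Illustrative suffix table: keys are suffixes, values are dummies.
$suffixes = array(
    'com'   => true,
    'net'   => true,
    'org'   => true,
    'info'  => true,
    'co.uk' => true,
);

// Hypothetical helper implementing the two-step lookup.
function base_domain(string $host, array $suffixes)
{
    $parts = explode('.', strtolower($host));
    $n = count($parts);

    // Try the last two components first, then only the last one.
    foreach (array(2, 1) as $len) {
        if ($n > $len) {
            $suffix = implode('.', array_slice($parts, -$len));
            if (isset($suffixes[$suffix])) {
                // Suffix found: keep one more component in front of it.
                return implode('.', array_slice($parts, -($len + 1)));
            }
        }
    }
    return false; // suffix not in the table, or nothing in front of it
}

var_dump(base_domain('here.example.co.uk', $suffixes)); // string(13) "example.co.uk"
var_dump(base_domain('here.example.com', $suffixes));   // string(11) "example.com"
var_dump(base_domain('example.invalid', $suffixes));    // bool(false)
```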

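Alternatively, you could make things a bit simpler by writing a function, instead of using an associative array, that decides whether the last two components should be treated as a single "effective TLD". That function would probably look at the next-to-last component and, if it is shorter than 3 characters, decide to treat it as part of the TLD.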
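A minimal sketch of that heuristic, with made-up function names. It is deliberately naive: a suffix such as com.au slips through because "com" has three characters, and a short registered name under a subdomain (www.t.co) is misread as part of the suffix.

```php
<?php
// Hypothetical heuristic: the suffix spans two labels when the
// next-to-last label is shorter than 3 characters, otherwise one.
function suffix_label_count(array $parts): int
{
    $n = count($parts);
    return ($n >= 3 && strlen($parts[$n - 2]) < 3) ? 2 : 1;
}

function base_domain_heuristic(string $host): string
{
    $parts = explode('.', strtolower($host));
    // Keep the suffix labels plus one label in front of them.
    $keep = suffix_label_count($parts) + 1;
    return implode('.', array_slice($parts, -$keep));
}

echo base_domain_heuristic('here.example.co.uk'), "\n"; // example.co.uk
echo base_domain_heuristic('here.example.com'), "\n";   // example.com
```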

I would do something like the following:

// hierarchical array of top level domains
$tlds = array(
    'com' => true,
    'uk' => array(
        'co' => true,
        // …
    ),
    // …
);
$domain = 'here.example.co.uk';
// split domain
$parts = explode('.', $domain);
$tmp = $tlds;
// traverse the tree in reverse order, from right to left
foreach (array_reverse($parts) as $key => $part) {
    if (isset($tmp[$part])) {
        $tmp = $tmp[$part];
    } else {
        break;
    }
}
// build the result
var_dump(implode('.', array_slice($parts, - $key - 1)));

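To do this properly, you need a list of second-level and top-level domains, and you need to build an appropriate regular expression from that list. Detailed lists of second-level domains are available. Besides the aforementioned CentralNic .uk.com variants, another test case is the Vatican, whose website address is a hard one to match.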
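One way such a pattern might be assembled from a list is sketched below. The six suffixes and the name `build_domain_regex` are illustrative assumptions, not a complete or authoritative list.

```php
<?php
// Illustrative suffix list; a real one would be far longer.
$suffixes = array('com', 'org', 'net', 'co.uk', 'org.uk', 'uk.com');

// Hypothetical helper: compile the list into one anchored pattern.
function build_domain_regex(array $suffixes): string
{
    // Quote each suffix so its dots match literally.
    $alternatives = implode('|', array_map(function ($s) {
        return preg_quote($s, '/');
    }, $suffixes));
    // One label, a dot, then any listed suffix at the end of the string.
    return '/(?:^|\.)([a-z0-9][a-z0-9\-]*\.(?:' . $alternatives . '))$/i';
}

$pattern = build_domain_regex($suffixes);
if (preg_match($pattern, 'www.example.uk.com', $m)) {
    echo $m[1], "\n"; // example.uk.com
}
```

Because the regex engine reports the leftmost match, the longer suffix uk.com wins over com here without any special ordering of the list.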

Ah, if you only want to handle three-character top-level domains, then the following code works:

<?php 
// let's test the code works: these should all return
// example.com , example.net or example.org
$domains=Array('here.example.com',
            'example.com',
            'example.org',
        'here.example.org',
        'example.com/ignorethis',
        'example.net/',
        'http://here.example.org/longtest?string=here');
foreach ($domains as $domain) {
 testdomain($domain);
}

function testdomain($url) {
 if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{3})(\/.*)?$/',$url,$matches)) {
    print 'Domain is: '.$matches[3].'.'.$matches[4].'<br>'."\n";
 } else {
    print 'Domain not found in '.$url.'<br>'."\n";
 }
}
?>

Or, to make it cope with other suffixes such as .co.uk:

if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.(co\.uk|me\.uk|org\.uk|com|org|net|int|eu)(\/.*)?$/',$url,$matches)) {

and so on.
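For completeness, a sketch of that condition wrapped in a function that returns the domain instead of printing it. The name `extract_domain` is hypothetical; the pattern is the one shown above, split across two lines only for readability.

```php
<?php
// Hypothetical wrapper around the extended pattern; returns the domain or false.
function extract_domain(string $url)
{
    $pattern = '/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.'
             . '(co\.uk|me\.uk|org\.uk|com|org|net|int|eu)(\/.*)?$/';
    if (preg_match($pattern, $url, $matches)) {
        // $matches[3] is the registered label, $matches[4] the suffix.
        return $matches[3] . '.' . $matches[4];
    }
    return false;
}

$tests = array(
    'here.example.co.uk',
    'example.com/ignorethis',
    'http://here.example.org/longtest?string=here',
);
foreach ($tests as $url) {
    // Prints example.co.uk, example.com and example.org in turn.
    echo extract_domain($url), "\n";
}
```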

Based on Jonathan's answer:

function main_domain($domain) {
  if (preg_match('/([a-z0-9][a-z0-9\-]{1,63})\.([a-z]{3}|[a-z]{2}\.[a-z]{2})$/i', $domain, $regs)) {
    return $regs;
  }

  return false;
}

His expression might be a bit better, but this interface looks more like what you are describing.
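Note that main_domain() returns the whole match array rather than a string. A short usage sketch (the function is copied from above only so the snippet runs on its own):

```php
<?php
// Copy of main_domain() from the answer above.
function main_domain($domain) {
  if (preg_match('/([a-z0-9][a-z0-9\-]{1,63})\.([a-z]{3}|[a-z]{2}\.[a-z]{2})$/i', $domain, $regs)) {
    return $regs;
  }

  return false;
}

$regs = main_domain('here.example.co.uk');
// Index 0 is the full match, 1 the name, 2 the suffix.
echo $regs[0], "\n"; // example.co.uk
echo $regs[1], "\n"; // example
echo $regs[2], "\n"; // co.uk

// Suffixes that are neither three letters nor of the xx.yy form do not match.
var_dump(main_domain('here.example.info')); // bool(false)
```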

Almost certainly, what you are looking for is this:

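It is a PHP library that uses the (as nearly as practical) full list of TLDs collected at publicsuffix.org/list/, and wraps it up in a neat little function.

Once the library is included, it is as easy as:

$registeredDomain = getRegisteredDomain($domain);

Here is a way to strip the subdomains from any URL. I wrote the code for my own site, where it is in use as a working solution.

$host is the URL that has to be parsed. Compared with everything else I have seen, this code is simple and reliable, and it has worked for every URL I have tried.

See the code below: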
$host = filter_var($_GET['dns']);
$host = $host . '/'; // needed if URL does not have trailing slash

// Strip www, http, https header ;

$host = str_replace( 'http://www.' , '' , $host );
$host = str_replace( 'https://www.' , '' , $host );

$host = str_replace( 'http://' , '' , $host );
$host = str_replace( 'https://' , '' , $host );
$pos = strpos($host, '/'); // find any sub directories
$host = substr( $host, 0, $pos );  //strip directories

$hostArray = explode (".", $host); // count parts of TLD
$size = count ($hostArray) -1; // really only need to know if not a single level TLD
$tld = $hostArray[$size]; // do we need to parse the TLD any further - 
                          // remove subdomains?

if ($size > 1) {
    if ($tld == "aero" or $tld == "asia" or $tld == "biz" or $tld == "cat" or
        $tld == "com" or $tld == "coop" or $tld == "edu" or $tld == "gov" or
        $tld == "info" or $tld == "int" or $tld == "jobs" or $tld == "me" or
        $tld == "mil" or $tld == "mobi" or $tld == "museum" or $tld == "name" or
        $tld == "net" or $tld == "org" or $tld == "pro" or $tld == "tel" or
        $tld == "travel" or $tld == "tv" or $tld == "ws" or $tld == "xxx") {

        $host = $hostArray[$size -1].".".$hostArray[$size]; // parse to 2 level TLD
    } else {
         // parse to 3 level TLD
        $host = $hostArray[$size -2].".".$hostArray[$size -1].".".$hostArray[$size] ;
    }
}
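The same idea could be packaged as a function, roughly as sketched here. The name `host_to_domain` is made up, and the prefix stripping is anchored with regular expressions rather than str_replace, which is a small deviation from the script above. It inherits the script's limitation: any TLD not in the list is assumed to have a three-level registrable domain, so shop.example.de stays as it is.

```php
<?php
// Hypothetical function form of the script above.
function host_to_domain(string $url): string
{
    // Drop the scheme and a leading "www.", then cut at the first slash.
    $host = preg_replace('#^https?://#i', '', trim($url));
    $host = preg_replace('#^www\.#i', '', $host);
    $host = strtolower(explode('/', $host, 2)[0]);

    // TLDs under which names are registered directly (list copied from above).
    $singleLevel = array(
        'aero', 'asia', 'biz', 'cat', 'com', 'coop', 'edu', 'gov',
        'info', 'int', 'jobs', 'me', 'mil', 'mobi', 'museum', 'name',
        'net', 'org', 'pro', 'tel', 'travel', 'tv', 'ws', 'xxx',
    );

    $parts = explode('.', $host);
    // Keep two labels for a listed TLD, otherwise three.
    $keep = in_array(end($parts), $singleLevel, true) ? 2 : 3;
    return implode('.', array_slice($parts, -$keep));
}

echo host_to_domain('http://www.here.example.co.uk/path'), "\n"; // example.co.uk
echo host_to_domain('https://sub.example.com'), "\n";            // example.com
```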

A variant based on Jonathan Sampson's answer:
function get_domain($url)   {   
    if ( !preg_match("/^http/", $url) )
        $url = 'http://' . $url;
    if ( $url[strlen($url)-1] != '/' )
        $url .= '/';
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : ''; 
    if ( preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs) ) { 
        $res = preg_replace('/^www\./', '', $regs['domain'] );
        return $res;
    }   
    return false;
}

Here is what I am using. It works well without needing any array of TLDs:
$split = array_reverse(explode(".", $_SERVER['HTTP_HOST']));
$domain = $split[1].".".$split[0];

if(function_exists('gethostbyname'))
{
    if(gethostbyname($domain) != $_SERVER['SERVER_ADDR'] && isset($split[2]))
    {   
        $domain = $split[2].".".$split[1].".".$split[0];
    }
}

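It is not possible without a TLD list to compare against, because many multi-part suffixes exist. Even with a list, you will not succeed in every case.

If you need a complete list, you can use:

Feel free to use my function. It does not use regular expressions, and it is fast:

This script generates a Perl file containing a single function, get_domain, from the ETLD file. Say you have hostnames like img1, img2, img3, ... under photobucket.com. For each of those, get_domain $host returns photobucket.com. Note that this is not the fastest function on earth, so in my main log parser that uses it I keep a hash of host-to-domain mappings and only run this function for hosts that are not yet in the hash.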
#!/bin/bash

cat << 'EOT' > suffixes.pl
#!/bin/perl

sub get_domain {
  $_ = shift;
EOT

wget -O - http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1 \
  | iconv -c -f UTF-8 -t ASCII//TRANSLIT \
  | egrep -v '/|^$' \
  | sed -e 's/^\!//' -e "s/\"/'/g" \
  | awk '{ print length($0),$0 | "sort -rn"}' | cut -d" " -f2- \
  | while read SUFF; do
      STAR=`echo $SUFF | cut -b1`
      if [ "$STAR" = '*' ]; then
        SUFF=`echo $SUFF | cut -b3-`
        echo "  return \"\$1\.\$2\.$SUFF\" if /([a-zA-Z0-9\-]+)\.([a-zA-Z0-9\-]+)\.$SUFF\$/;"
      else
        echo "  return \"\$1\.$SUFF\" if /([a-zA-Z0-9\-]+)\.$SUFF\$/;"
      fi
    done >> suffixes.pl

cat << 'EOT' >> suffixes.pl
}

1;
EOT
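The host-to-domain hash mentioned in that answer could look something like this in PHP. This is only a sketch of the caching idea: `slow_get_domain` is a stand-in that naively keeps the last two labels, not the generated Perl function.

```php
<?php
// Stand-in for the real (slow) lookup; assumed here to keep the last two labels.
function slow_get_domain(string $host): string
{
    return implode('.', array_slice(explode('.', $host), -2));
}

// Cache of host => domain, filled the first time each host is seen.
function cached_get_domain(string $host): string
{
    static $cache = array();
    if (!isset($cache[$host])) {
        $cache[$host] = slow_get_domain($host);
    }
    return $cache[$host];
}

$hosts = array('img1.photobucket.com', 'img2.photobucket.com', 'img1.photobucket.com');
foreach ($hosts as $host) {
    echo cached_get_domain($host), "\n"; // photobucket.com each time
}
```

The third call is served from the cache, which is where the saving comes from when a log contains the same hosts over and over.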


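// Presumably the contents of fetch_mozilla_tlds.php, which getTLDs() below loads.
// The flags drop trailing newlines and blank lines so that in_array() can match entries exactly.
$mozillaTlds = file('http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);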

function isTopLevelDomain($domain)
{
    $domainParts = explode('.', $domain);
    if (count($domainParts) == 1) {
        return false;
    }

    $previousDomainParts = $domainParts;
    array_shift($previousDomainParts);

    $tld = implode('.', $previousDomainParts);

    return isDomainExtension($tld);
}

function isDomainExtension($domain)
{
    $tlds = getTLDs();

    /**
     * direct hit
     */
    if (in_array($domain, $tlds)) {
        return true;
    }

    if (in_array('!'. $domain, $tlds)) {
        return false;
    }

    $domainParts = explode('.', $domain);

    if (count($domainParts) == 1) {
        return false;
    }

    $previousDomainParts = $domainParts;

    array_shift($previousDomainParts);
    array_unshift($previousDomainParts, '*');

    $wildcardDomain = implode('.', $previousDomainParts);

    return in_array($wildcardDomain, $tlds);
}

function getTLDs()
{
    static $mozillaTlds = array();

    if (empty($mozillaTlds)) {
        require 'fetch_mozilla_tlds.php';
        /* @var $mozillaTlds array */
    }

    return $mozillaTlds;
}
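These functions only answer whether something is a suffix. Going from there to the registrable domain might look like the sketch below, which uses a tiny in-memory rule list instead of the downloaded file. `$rules`, `is_public_suffix` and `registrable_domain` are hypothetical names, and the rules are a hand-picked subset in the public suffix list's notation (plain, wildcard `*.` and exception `!` entries).

```php
<?php
// Tiny illustrative subset of public-suffix-style rules (assumed data).
$rules = array('com', 'org', 'uk', 'co.uk', '*.ck', '!www.ck');

// Direct or wildcard match against the rule list.
function is_public_suffix(string $name, array $rules): bool
{
    if (in_array($name, $rules, true)) {
        return true; // direct hit
    }
    $parts = explode('.', $name);
    if (count($parts) < 2) {
        return false;
    }
    $parts[0] = '*'; // e.g. bar.ck is covered by *.ck
    return in_array(implode('.', $parts), $rules, true);
}

// Registrable domain = longest matching suffix plus one label to its left.
function registrable_domain(string $host, array $rules)
{
    $parts = explode('.', strtolower($host));
    for ($i = 0; $i < count($parts); $i++) {
        $name = implode('.', array_slice($parts, $i));
        if (in_array('!' . $name, $rules, true)) {
            return $name; // exception rule: this name is itself registrable
        }
        if (is_public_suffix($name, $rules)) {
            // A host that is itself a suffix has no registrable domain.
            return $i > 0 ? implode('.', array_slice($parts, $i - 1)) : false;
        }
    }
    return false; // no rule matched
}

echo registrable_domain('here.example.co.uk', $rules), "\n"; // example.co.uk
echo registrable_domain('here.example.com', $rules), "\n";   // example.com
echo registrable_domain('a.www.ck', $rules), "\n";           // www.ck
```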

$host = $_SERVER['HTTP_HOST'];
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('here.example.com');
$result->getSubdomain(); // will return (string) 'here'
$result->getHostname(); // will return (string) 'example'
$result->getSuffix(); // will return (string) 'com'

function get_domaininfo($url) {
    // regex can be replaced with parse_url
    preg_match("/^(https|http|ftp):\/\/(.*?)\//", "$url/" , $matches);
    $parts = explode(".", $matches[2]);
    $tld = array_pop($parts);
    $host = array_pop($parts);
    if ( strlen($tld) == 2 && strlen($host) <= 3 ) {
        $tld = "$host.$tld";
        $host = array_pop($parts);
    }

    return array(
        'protocol' => $matches[1],
        'subdomain' => implode(".", $parts),
        'domain' => "$host.$tld",
        'host'=>$host,'tld'=>$tld
    );
}

print_r(get_domaininfo('http://mysubdomain.domain.co.uk/index.php'));

Array
(
    [protocol] => http
    [subdomain] => mysubdomain
    [domain] => domain.co.uk
    [host] => domain
    [tld] => co.uk
)

$justDomain = $_SERVER['SERVER_NAME'];
switch(substr_count($justDomain, '.')) {
    case 1:
        // 2 parts. Must not be a subdomain. Do nothing.
        break;

    case 2:
        // 3 parts. Either a subdomain or a 2-part suffix
        // If the 2nd part is over 3 characters long, assume it is the main domain part, which means we have a subdomain.
        // This isn't foolproof, but should be ok for most domains.
        // Something like domainname.parliament.nz would cause problems, though. As would www.abc.com
        $parts = explode('.', $justDomain);
        if(strlen($parts[1]) > 3) {
            unset($parts[0]);
            $justDomain = implode('.', $parts);
        }
        break;

    default:
        // 4+ parts. Must be a subdomain.
        $parts = explode('.', $justDomain, 2);
        $justDomain = $parts[1];
        break;
}

// $justDomain should now exclude any subdomain part.

function get_domain($host){
  $myhost = strtolower(trim($host));
  $count = substr_count($myhost, '.');
  if($count === 2){
    if(strlen(explode('.', $myhost)[1]) > 3) $myhost = explode('.', $myhost, 2)[1];
  } else if($count > 2){
    $myhost = get_domain(explode('.', $myhost, 2)[1]);
  }
  return $myhost;
}

echo  parse_url($your_url)['host'];

// For short domains like t.co (Twitter), the function should be:

function get_domain($url)
{
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{0,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}