JavaScript: extracting the hostname from a string

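I want to match just the root of a URL, not the whole URL, from a text string. Given:

http://www.youtube.com/watch?v=ClkQA2Lb_iE
http://youtu.be/ClkQA2Lb_iE
http://www.example.com/12xy45
http://example.com/random

I want the last two instances to resolve to the www.example.com or example.com domain:

www.example.com
example.com

I have heard that regular expressions are slow, and this would be my second regex on the page, so if there is a way to do this without regex, please let me know.

I am looking for a JS/jQuery version of this solution.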

Try this:

var matches = url.match(/^https?\:\/\/([^\/?#]+)(?:[\/?#]|$)/i);
var domain = matches && matches[1];  // domain will be null if no match is found

If you want to exclude the port from the result, use this expression instead:

/^https?\:\/\/([^\/:?#]+)(?:[\/:?#]|$)/i

Edit: To prevent specific domains from matching, use a negative lookahead such as (?!youtube\.com).
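
As a rough illustration of the negative-lookahead idea, the sketch below combines it with the port-excluding expression. The function name hostUnlessBlocked and the choice to block both youtube.com and www.youtube.com are assumptions made for this example, not part of the original answer.

```javascript
// Hypothetical helper: returns the host of an http(s) URL, or null when the
// URL does not match or points at the blocked domain.
function hostUnlessBlocked(url) {
  // The lookahead rejects "youtube.com" and "www.youtube.com" only when the
  // name ends there (followed by /, :, ?, # or end of string), so a host such
  // as "youtube.community.example" is still accepted.
  var re = /^https?:\/\/(?!(?:www\.)?youtube\.com(?:[\/:?#]|$))([^\/:?#]+)(?:[\/:?#]|$)/i;
  var m = url.match(re);
  return m ? m[1] : null;
}

console.log(hostUnlessBlocked('http://www.youtube.com/watch?v=ClkQA2Lb_iE')); // null
console.log(hostUnlessBlocked('http://www.example.com/12xy45'));              // "www.example.com"
```

Anchoring the lookahead with the same terminator set as the main pattern is a design choice here: a bare (?!youtube\.com) would also reject any host that merely starts with those characters.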


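Parsing a URL can be tricky because you may have port numbers and special characters. For that reason, I recommend using an existing URL parser rather than writing your own. I doubt performance will be an issue unless you are parsing hundreds of URLs.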

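A neat trick that avoids regular expressions:

var tmp  = document.createElement('a');
tmp.href = "http://www.example.com/12xy45";

// tmp.hostname will now contain 'www.example.com'
// tmp.host will now contain the hostname plus the port when a non-default port is given,
// e.g. 'www.example.com:8080' for "http://www.example.com:8080/12xy45"

Wrap the above in a function such as the one below, and you have a superb way of getting the domain part out of a URI: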

function url_domain(data) {
  var    a      = document.createElement('a');
         a.href = data;
  return a.hostname;
}

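I tried the given solutions; the chosen one was overkill for my purpose, and the "create an element" approach broke things for me.

It does not handle a port in the URL yet. I hope someone finds it useful.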

function parseURL(url){
    var parsed_url = {};

    if ( url == null || url.length == 0 )
        return parsed_url;

    var protocol_i = url.indexOf('://');
    parsed_url.protocol = url.substr(0, protocol_i);

    var remaining_url = url.substr(protocol_i + 3);
    var domain_i = remaining_url.indexOf('/');
    // A path exists only if there is a slash with something after it
    var has_path = domain_i != -1 && domain_i + 1 < remaining_url.length;
    // Without a slash, the whole remainder is the domain
    domain_i = domain_i == -1 ? remaining_url.length : domain_i;
    parsed_url.domain = remaining_url.substr(0, domain_i);
    parsed_url.path = has_path ? remaining_url.substr(domain_i + 1) : null;

    var domain_parts = parsed_url.domain.split('.');
    switch ( domain_parts.length ){
        case 2:
          parsed_url.subdomain = null;
          parsed_url.host = domain_parts[0];
          parsed_url.tld = domain_parts[1];
          break;
        case 3:
          parsed_url.subdomain = domain_parts[0];
          parsed_url.host = domain_parts[1];
          parsed_url.tld = domain_parts[2];
          break;
        case 4:
          parsed_url.subdomain = domain_parts[0];
          parsed_url.host = domain_parts[1];
          parsed_url.tld = domain_parts[2] + '.' + domain_parts[3];
          break;
    }

    parsed_url.parent_domain = parsed_url.host + '.' + parsed_url.tld;

    return parsed_url;
}

Running this:

parseURL('https://www.facebook.com/100003379429021_356001651189146');

Result:

Object {
    domain : "www.facebook.com",
    host : "facebook",
    parent_domain : "facebook.com",
    path : "100003379429021_356001651189146",
    protocol : "https",
    subdomain : "www",
    tld : "com"
}

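I recommend using the npm package psl (Public Suffix List). The "Public Suffix List" is a list of all valid domain suffixes and rules, covering not only country code top-level domains but also Unicode suffixes and multi-level suffixes that are treated as the root domain (e.g. b.c.kobe.jp). Read more about it in the Public Suffix List documentation.

Install the package, then run it together with my "extractHostname" implementation: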

let psl = require('psl');
let url = 'http://www.youtube.com/watch?v=ClkQA2Lb_iE';
psl.get(extractHostname(url)); // returns youtube.com
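
To make the idea behind a public suffix lookup more concrete, here is a minimal sketch that does not use psl at all. The suffix set SAMPLE_SUFFIXES and the function registrableDomain are hypothetical and hand-written for this illustration; the real list has thousands of entries plus wildcard and exception rules that this sketch ignores.

```javascript
// Tiny hand-written suffix set, for illustration only (NOT the real list).
const SAMPLE_SUFFIXES = new Set(['com', 'uk', 'co.uk', 'me.uk', 'io', 'github.io']);

// Hypothetical helper: returns the suffix plus one label to its left,
// or null if the hostname is itself a suffix or has no known suffix.
function registrableDomain(hostname) {
  const labels = hostname.toLowerCase().split('.');
  // Scanning from the left tries the longest candidate suffix first.
  for (let i = 0; i < labels.length; i++) {
    const suffix = labels.slice(i).join('.');
    if (SAMPLE_SUFFIXES.has(suffix)) {
      return i === 0 ? null : labels.slice(i - 1).join('.');
    }
  }
  return null;
}

console.log(registrableDomain('www.blog.classroom.me.uk')); // "classroom.me.uk"
console.log(registrableDomain('www.youtube.com'));          // "youtube.com"
console.log(registrableDomain('facebook.github.io'));       // "facebook.github.io"
```

The point of the lookup is that "how many labels make up the root domain" depends on the suffix, which is why counting dots alone is unreliable.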
I can't use an npm package here, so the snippet below only tests extractHostname.

function extractHostname(url) {
    var hostname;
    // find & remove protocol (http, ftp, etc.) and get hostname
    if (url.indexOf("//") > -1) {
        hostname = url.split('/')[2];
    }
    else {
        hostname = url.split('/')[0];
    }
    // find & remove port number
    hostname = hostname.split(':')[0];
    // find & remove "?"
    hostname = hostname.split('?')[0];
    return hostname;
}

// test the code
console.log("== Testing extractHostname: ==");
console.log(extractHostname("http://www.blog.classroom.me.uk/index.php"));
console.log(extractHostname("http://www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("https://www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("ftps://ftp.websitename.com/dir/file.txt"));
console.log(extractHostname("websitename.com:1234/dir/file.txt"));
console.log(extractHostname("ftps://websitename.com:1234/dir/file.txt"));
console.log(extractHostname("example.com?param=value"));
console.log(extractHostname("https://facebook.github.io/jest/"));
console.log(extractHostname("//youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("http://localhost:4200/watch?v=ClkQA2Lb_iE"));

// Warning: you can use this function to extract the "root" domain, but it will not be as accurate as using the psl package.
function extractRootDomain(url) {
    var domain = extractHostname(url),
        splitArr = domain.split('.'),
        arrLen = splitArr.length;

    // extract the root domain here
    // if there is a subdomain
    if (arrLen > 2) {
        domain = splitArr[arrLen - 2] + '.' + splitArr[arrLen - 1];
        // check whether it uses a country code top-level domain (ccTLD) (i.e. ".me.uk")
        if (splitArr[arrLen - 2].length == 2 && splitArr[arrLen - 1].length == 2) {
            // this is using a ccTLD
            domain = splitArr[arrLen - 3] + '.' + domain;
        }
    }
    return domain;
}

// test extractRootDomain
console.log("== Testing extractRootDomain: ==");
console.log(extractRootDomain("http://www.blog.classroom.me.uk/index.php"));
console.log(extractRootDomain("http://www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractRootDomain("https://www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractRootDomain("www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractRootDomain("ftps://ftp.websitename.com/dir/file.txt"));
console.log(extractRootDomain("websitename.co.uk:1234/dir/file.txt"));
console.log(extractRootDomain("ftps://websitename.com:1234/dir/file.txt"));
console.log(extractRootDomain("example.com?param=value"));
console.log(extractRootDomain("https://facebook.github.io/jest/"));
console.log(extractRootDomain("//youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractRootDomain("http://localhost:4200/watch?v=ClkQA2Lb_iE"));

String.prototype.trim = function () { return this.replace(/^\s+|\s+$/g, ""); };
function getHost(url) {
    if ("undefined" == typeof(url) || null == url) return "";
    url = url.trim(); if ("" == url) return "";
    var _host, _arr;
    if (-1 ...

The above code successfully parses the hostnames of the example URLs, giving:

first.com

mail.google.com

mail.google.com

somewhere.com

another.eu


Original credit goes to:

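If you ended up on this page looking for the best regex for URLs, try this one:

^(?:https?:)?(?:\/\/)?([^\/\?]+)

It works for URLs without http://, with http, with https, and with just //, and it does not capture the path or the query string.


Good luck.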
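
A short sketch of how that expression behaves on the kinds of input listed above. The names HOST_RE and hostOf are made up for this example.

```javascript
// The expression from the answer above, wrapped in a hypothetical helper.
const HOST_RE = /^(?:https?:)?(?:\/\/)?([^\/\?]+)/;

function hostOf(input) {
  const m = HOST_RE.exec(input);
  return m ? m[1] : null; // group 1 is everything before the first "/" or "?"
}

console.log(hostOf('http://www.example.com/12xy45'));       // "www.example.com"
console.log(hostOf('//youtube.com/watch?v=ClkQA2Lb_iE'));   // "youtube.com"
console.log(hostOf('example.com?param=value'));             // "example.com"
console.log(hostOf('https://example.com:8080/random'));     // "example.com:8080"
```

Note that the capture group stops only at "/" or "?", so a port (and a "#fragment" with no preceding slash) stays in the result.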

There is no need to parse the string; just pass your URL as an argument to the URL constructor:

const url = 'http://www.youtube.com/watch?v=ClkQA2Lb_iE';
const { hostname } = new URL(url);

console.assert(hostname === 'www.youtube.com');

Please try the code below, which uses a regex to get the exact domain name:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "...";  // the URL string to match
String pattern3 = "([\\w\\W]\\.)+(.*)?(\\.[\\w]+)";

Pattern r = Pattern.compile(pattern3);
Matcher m = r.matcher(line);
if (m.find()) {
    System.out.println("Found value: " + m.group(2));
} else {
    System.out.println("NO MATCH");
}

In short, you can do it like this:

var url = "http://www.someurl.com/support/feature"

function getDomain(url){
  var domain = url.split("//")[1];
  return domain.split("/")[0];
}

e.g.:
  getDomain("http://www.example.com/page/1")

  output:
   "www.example.com"

Use the above function to get the domain name.

Code:

var regex = /\w+\.(com|co\.kr|be)/ig;
var urls = ['http://www.youtube.com/watch?v=ClkQA2Lb_iE',
            'http://youtu.be/ClkQA2Lb_iE',
            'http://www.example.com/12xy45',
            'http://example.com/random'];

$.each(urls, function(index, url) {
    var convertedUrl = url.match(regex);
    console.log(convertedUrl);
});

Result:

youtube.com
youtu.be
example.com
example.com

Okay, I know this is an old question, but I made a super-efficient URL parser, so I thought I would share it.

function getDomain(url) {
    var dom = "", v, step = 0;
    for(var i=0,l=url.length; i<l; i++) {
        v = url[i]; if(step == 0) {
            //First, skip 0 to 5 characters ending in ':' (ex: 'https://')
            if(i > 5) { i=-1; step=1; } else if(v == ':') { i+=2; step=1; }
        } else if(step == 1) {
            //Skip 0 or 4 characters 'www.'
            //(Note: Doesn't work with www.com, but that domain isn't claimed anyway.)
            if(v == 'w' && url[i+1] == 'w' && url[i+2] == 'w' && url[i+3] == '.') i+=4;
            dom+=url[i]; step=2;
        } else if(step == 2) {
            //Stop at subpages, queries, and hashes.
            if(v == '/' || v == '?' || v == '#') break; dom += v;
        }
    }
    return dom;
}

As you can see, the structure of the function is very odd, but...

hostname = "http://www.example.com:1234";
hostname.split("//").slice(-1)[0].split(":")[0].split('.').slice(-2).join('.'); // gives "example.com"

"http://example.com".split("//").slice(-1)[0].split(":")[0].split('.').slice(-2).join('.');
"http://example.com:1234".split("//").slice(-1)[0].split(":")[0].split('.').slice(-2).join('.');
"http://www.example.com:1234".split("//").slice(-1)[0].split(":")[0].split('.').slice(-2).join('.');
"http://foo.www.example.com:1234".split("//").slice(-1)[0].split(":")[0].split('.').slice(-2).join('.');

$('<a>').attr('href', url).prop('hostname');

function myFunction() {
    var str = "https://www.123rf.com/photo_10965738_lots-oop.html";
    // Splitting on "/" gives ["https:", "", "www.123rf.com", ...], so index 2 is the host
    var matches = str.split('/');
    return matches[2];
}

getUrlParts("https://news.google.com/news/headlines/technology.html?ned=us&hl=en")
{
  "origin": "https://news.google.com",
  "domain": "news.google.com",
  "subdomain": "news",
  "domainroot": "google.com",
  "domainpath": "news.google.com/news/headlines",
  "tld": ".com",
  "path": "news/headlines/technology.html",
  "query": "ned=us&hl=en",
  "protocol": "https",
  "port": 443,
  "parts": [
    "news",
    "google",
    "com"
  ],
  "segments": [
    "news",
    "headlines",
    "technology.html"
  ],
  "params": [
    {
      "key": "ned",
      "val": "us"
    },
    {
      "key": "hl",
      "val": "en"
    }
  ]
}
function getUrlParts(fullyQualifiedUrl) {
    var url = {},
        tempProtocol
    var a = document.createElement('a')
    // if doesn't start with something like https:// it's not a url, but try to work around that
    if (fullyQualifiedUrl.indexOf('://') == -1) {
        tempProtocol = 'https://'
        a.href = tempProtocol + fullyQualifiedUrl
    } else
        a.href = fullyQualifiedUrl
    var parts = a.hostname.split('.')
    url.origin = tempProtocol ? "" : a.origin
    url.domain = a.hostname
    url.subdomain = parts[0]
    url.domainroot = ''
    url.domainpath = ''
    url.tld = '.' + parts[parts.length - 1]
    url.path = a.pathname.substring(1)
    url.query = a.search.substr(1)
    url.protocol = tempProtocol ? "" : a.protocol.substr(0, a.protocol.length - 1)
    url.port = tempProtocol ? "" : a.port ? a.port : a.protocol === 'http:' ? 80 : a.protocol === 'https:' ? 443 : a.port
    url.parts = parts
    url.segments = a.pathname === '/' ? [] : a.pathname.split('/').slice(1)
    url.params = url.query === '' ? [] : url.query.split('&')
    for (var j = 0; j < url.params.length; j++) {
        var param = url.params[j];
        var keyval = param.split('=')
        url.params[j] = {
            'key': keyval[0],
            'val': keyval[1]
        }
    }
    // domainroot
    if (parts.length > 2) {
        url.domainroot = parts[parts.length - 2] + '.' + parts[parts.length - 1];
        // check for country code top level domain
        if (parts[parts.length - 2].length == 2 && parts[parts.length - 1].length == 2)
            url.domainroot = parts[parts.length - 3] + '.' + url.domainroot;
    }
    // domainpath (domain+path without filenames) 
    if (url.segments.length > 0) {
        var lastSegment = url.segments[url.segments.length - 1]
        var endsWithFile = lastSegment.indexOf('.') != -1
        if (endsWithFile) {
            var fileSegment = url.path.indexOf(lastSegment)
            var pathNoFile = url.path.substr(0, fileSegment - 1)
            url.domainpath = url.domain
            if (pathNoFile)
                url.domainpath = url.domainpath + '/' + pathNoFile
        } else
            url.domainpath = url.domain + '/' + url.path
    } else
        url.domainpath = url.domain
    return url
}

function getHostname() {
    // Let the browser resolve a root-relative link against the current page
    var secretDiv = document.createElement('div');
    secretDiv.innerHTML = "<a href='/'>x</a>";
    secretDiv = secretDiv.firstChild.href;
    var HasHTTPS = secretDiv.match(/https?:\/\//)[0];
    secretDiv = secretDiv.substr(HasHTTPS.length);
    secretDiv = secretDiv.substr(0, secretDiv.length - 1);
    return secretDiv;
}

getHostname();
$('<a>').attr('href', document.location.href).prop('hostname');

var mainUrl = "http://www.mywebsite.com/mypath/to/folder";
var urlParts = /^(?:\w+\:\/\/)?([^\/]+)(.*)$/.exec(mainUrl);
var host = urlParts[1]; // www.mywebsite.com

new URL(url).host
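
One thing to keep in mind with the URL constructor is that it throws a TypeError for input it cannot parse, such as a string without a scheme. The sketch below wraps it so that such input yields null; the name safeHostname is an assumption for this example, and it relies only on the standard URL class available in modern browsers and Node.js.

```javascript
// Hypothetical wrapper around the standard URL constructor.
function safeHostname(input) {
  try {
    return new URL(input).hostname; // hostname never includes the port
  } catch (err) {
    return null; // e.g. "www.youtube.com/watch" has no scheme and is rejected
  }
}

console.log(safeHostname('http://www.youtube.com/watch?v=ClkQA2Lb_iE')); // "www.youtube.com"
console.log(safeHostname('http://localhost:4200/watch'));                 // "localhost"
console.log(safeHostname('www.youtube.com/watch'));                       // null

// host keeps a non-default port and drops the default one for the scheme.
console.log(new URL('http://www.example.com:8080/').host); // "www.example.com:8080"
console.log(new URL('http://www.example.com:80/').host);   // "www.example.com"
```

Returning null instead of letting the exception propagate is a design choice for this sketch; callers that need to distinguish error cases may prefer to catch the TypeError themselves.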
const { fromUrl, parseDomain } = require("parse-domain");
parseDomain(fromUrl("http://www.example.com/12xy45"))
{ type: 'LISTED',
  hostname: 'www.example.com',
  labels: [ 'www', 'example', 'com' ],
  icann:
   { subDomains: [ 'www' ],
     domain: 'example',
     topLevelDomains: [ 'com' ] },
  subDomains: [ 'www' ],
  domain: 'example',
  topLevelDomains: [ 'com' ] }
parseDomain(fromUrl("http://subsub.sub.test.ExAmPlE.coM/12xy45"))
{ type: 'LISTED',
  hostname: 'subsub.sub.test.example.com',
  labels: [ 'subsub', 'sub', 'test', 'example', 'com' ],
  icann:
   { subDomains: [ 'subsub', 'sub', 'test' ],
     domain: 'example',
     topLevelDomains: [ 'com' ] },
  subDomains: [ 'subsub', 'sub', 'test' ],
  domain: 'example',
  topLevelDomains: [ 'com' ] }
import URL from 'url';

const pathname = URL.parse(url).path;
console.log(url.replace(pathname, ''));