C#: Get a specific subdomain from a URL like foo.bar.car.com


c# url


Given a URL like the following:

foo.bar.car.com.au

I need to extract

foo.bar

I came across the following code:

private static string GetSubDomain(Uri url)
{
    if (url.HostNameType == UriHostNameType.Dns)
    {
        string host = url.Host;
        if (host.Split('.').Length > 2)
        {
            int lastIndex = host.LastIndexOf(".");
            int index = host.LastIndexOf(".", lastIndex - 1);
            return host.Substring(0, index);
        }
    }         
    return null;     
}

This gives me

foo.bar.car

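instead of foo.bar, which is what I want. Should I just split on '.' and take elements 0 and 1?

But there could also be a www prefix.

Is there a simple way to do this?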

I'd suggest using a regular expression. The following snippet should extract what you're looking for:

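using System.Text.RegularExpressions;

string input = "foo.bar.car.com.au";
// Take the first two labels from the start of the host name.
var match = Regex.Match(input, @"^\w+\.\w+");
var output = match.Value; // "foo.bar"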
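A pattern anchored at the start of the host always counts labels from the left, so a host such as www.foo.bar.car.com.au would give a result starting with "www". A minimal sketch of a variant, assuming you simply want to skip an optional leading "www." (the class name `FirstTwoLabels` and method `Extract` are hypothetical). It also uses `[^.]+` instead of `\w` so that hyphenated labels are matched:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the first two labels of a host name,
// skipping an optional leading "www." prefix.
public static class FirstTwoLabels
{
    // [^.]+ matches a whole label, including hyphens, which \w would not.
    private static readonly Regex Pattern = new Regex(
        @"^(?:www\.)?([^.]+\.[^.]+)", RegexOptions.IgnoreCase);

    public static string Extract(string host)
    {
        var match = Pattern.Match(host);
        // Group 1 holds the two labels that follow any "www." prefix.
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

This still hard-codes "two labels", so it only fits the case where you know the shape of your host names in advance.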

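OK, first things first. Are you specifically looking within "com.au", or are these general Internet domain names? If it's the latter, there is simply no automatic way to determine how much of the domain is the "site" or "zone" (or whatever you call it) and how much is an individual "host" or other record within that zone.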

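If you need to be able to work this out for arbitrary domain names, then you'll need to get the TLD list from the Mozilla Public Suffix project and use their algorithm to find the TLD in your domain name. You can then assume that the part you want ends with the last label before the TLD.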
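The idea can be sketched as follows. This is a simplified illustration, not the full Public Suffix List algorithm: it uses a tiny hard-coded suffix set (an assumption; in practice you would load the complete list) and ignores the list's wildcard (`*`) and exception (`!`) rules. It finds the longest suffix of the host that appears in the set, treats that suffix plus one more label as the registered domain, and returns whatever is left of it as the subdomain. The names `SuffixSplitter` and `KnownSuffixes` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Simplified public-suffix lookup. KnownSuffixes is a tiny hypothetical sample;
// the real list is far larger and also has wildcard and exception rules,
// which this sketch does not handle.
public static class SuffixSplitter
{
    private static readonly HashSet<string> KnownSuffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "com", "au", "com.au", "uk", "co.uk" };

    // Returns the labels to the left of the registered domain, or null if there are none.
    public static string GetSubdomain(string host)
    {
        var labels = host.TrimEnd('.').Split('.');

        // Scan from the longest candidate suffix to the shortest,
        // so the first hit is the longest matching suffix.
        for (int start = 1; start < labels.Length; start++)
        {
            var suffix = string.Join(".", labels, start, labels.Length - start);
            if (!KnownSuffixes.Contains(suffix))
                continue;

            // The registered domain is the suffix plus the one label before it,
            // so the subdomain is everything before that label.
            int registeredStart = start - 1;
            return registeredStart == 0
                ? null
                : string.Join(".", labels, 0, registeredStart);
        }
        return null; // no known suffix found
    }
}
```

With this sample set, "foo.bar.car.com.au" splits into suffix "com.au", registered domain "car.com.au" and subdomain "foo.bar".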

Given your requirements (you want the first two levels, not including "www."), I'd approach it like this:

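using System;

private static string GetSubDomain(Uri url)
{
    if (url.HostNameType == UriHostNameType.Dns)
    {
        string host = url.Host;

        var nodes = host.Split('.');
        // Skip a leading "www" label, if present.
        int startNode = 0;
        if (nodes[0] == "www") startNode = 1;

        // Only return a result if there are at least two labels after the optional "www".
        if (nodes.Length >= startNode + 2)
            return string.Format("{0}.{1}", nodes[startNode], nodes[startNode + 1]);
    }

    return null;
}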

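I ran into a similar problem and wrote this extension method based on the previous answers. Most importantly, it takes a parameter that defines the "root" domain, i.e. whatever the caller of the method considers to be the root. In the OP's case, the call would be: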

var uri = new Uri("http://foo.bar.car.com.au");
uri.DnsSafeHost.GetSubdomain("car.com.au"); // returns foo.bar
uri.DnsSafeHost.GetSubdomain(); // returns foo.bar.car

Here is the extension method:

using System;

// Extension methods must be declared inside a non-nested static class.
/// <summary>Gets the subdomain portion of a url, given a known "root" domain</summary>
public static string GetSubdomain(this string url, string domain = null)
{
  if (url == null)
    return null;

  if (domain == null)
  {
    // Since we were not provided with a known domain, assume that the second-to-last period divides the subdomain from the domain.
    var nodes = url.Split('.');
    if (nodes.Length < 2)
      return null; // a single label such as "localhost" has no subdomain
    domain = nodes[nodes.Length - 2] + "." + nodes[nodes.Length - 1];
  }

  // The host is exactly the root domain: the request went directly to it, so there is no subdomain.
  if (string.Equals(url, domain, StringComparison.OrdinalIgnoreCase))
    return null;

  // Verify that what we think is the domain is truly the ending of the hostname, on a label
  // boundary (so "xcar.com.au" does not match "car.com.au")... otherwise we're hooped.
  if (!url.EndsWith("." + domain, StringComparison.OrdinalIgnoreCase))
    throw new ArgumentException("Site was not loaded from the expected domain");

  // Cut the domain and its leading dot off the end only (string.Replace would also remove
  // any earlier occurrence of the same text), then quash any remaining trailing periods.
  var subdomain = url.Substring(0, url.Length - domain.Length - 1).TrimEnd('.');
  return string.IsNullOrWhiteSpace(subdomain) ? null : subdomain;
}


You can use the following NuGet package. It uses Mozilla's public suffix data to split the domain:

PM> Install-Package Nager.PublicSuffix

Example:

 var domainParser = new DomainParser();
 var data = await domainParser.LoadDataAsync();
 var tldRules = domainParser.ParseRules(data);
 domainParser.AddRules(tldRules);

 var domainName = domainParser.Get("sub.test.co.uk");
 //domainName.Domain = "test";
 //domainName.Hostname = "sub.test.co.uk";
 //domainName.RegistrableDomain = "test.co.uk";
 //domainName.SubDomain = "sub";
 //domainName.TLD = "co.uk";

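In addition to the NuGet package specified in another answer, there is also another NuGet package which, by its own description, is a .NET Core library that parses public suffixes and is based on an earlier library, with the following changes:

- Ported to a .NET Core library
- Fixed the library so that it passes all of the comprehensive tests
- Refactored the classes to split functionality into smaller, focused classes
- Made the classes immutable, so `DomainParser` can be used as a singleton and is thread-safe
- Added `WebTldRuleProvider` and `FileTldRuleProvider`
- Added the ability to tell whether a rule is an ICANN rule or a private domain rule
- Uses the asynchronous programming model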

The page also states that many of the above changes have been submitted back to the original version.

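First, let's keep the language clean. But second, forget the code: how do you know that foo.bar is the subdomain in foo.bar.car.com.au?

Sorry, that's a language thing. Must be an American thing.

"I know because I'm developing it." The question isn't how you know, @DarthVader; the question is what the code can assume.

See my answer below.

From a "robust code" standpoint, this is far more correct than my answer. If you know you always want just the first two levels, mine will work (as a starting point), but in general this is better.

If you're going to the trouble of writing an extension method, why not extend System.Uri instead of string? Then it has a proper URL-related context.

@TrueBlueAussie Indeed. I extended string simply because that was the MVP for my codebase. As you suggest, a future refactoring may well extend System.Uri to preserve that context.

It's pretty absurd that the only way to break a domain/hostname into its parts is to keep a complete list of all TLDs and then split off the rest.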
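Following the suggestion to extend System.Uri rather than string, here is a minimal sketch of that variant. The class name `UriExtensions` is hypothetical, and unlike the string version this sketch requires the caller to pass the root domain. It returns null when the host is the root domain itself or is not a DNS name (for example an IP address), and throws if the host is not under the root domain on a label boundary.

```csharp
using System;

public static class UriExtensions
{
    // Returns the part of the host to the left of rootDomain, e.g. "foo.bar" for
    // http://foo.bar.car.com.au with rootDomain "car.com.au". Returns null when
    // there is no subdomain or the host is not a DNS name (e.g. an IP address).
    public static string GetSubdomain(this Uri uri, string rootDomain)
    {
        if (uri == null) throw new ArgumentNullException(nameof(uri));
        if (uri.HostNameType != UriHostNameType.Dns) return null;

        string host = uri.DnsSafeHost;
        if (string.Equals(host, rootDomain, StringComparison.OrdinalIgnoreCase))
            return null; // request went straight to the root domain

        // Require a label boundary so "xcar.com.au" does not match "car.com.au".
        string suffix = "." + rootDomain;
        if (!host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Host is not under the expected root domain.", nameof(uri));

        return host.Substring(0, host.Length - suffix.Length);
    }
}
```

Keeping the method on `Uri` means the host-name checks (`HostNameType`, `DnsSafeHost`) come from the parsed URL rather than from an arbitrary string.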
