Using Excel to trim URLs to the root domain / subdomain

I need to trim URLs in Microsoft Excel down to the root domain and to the subdomain.

A1 contains:
https://blog.example.com/page/

B1 should return:
example.com

C1 should return:
blog.example.com

I need two formulas that strip http, https, www. and the path. The first one (B1) should also strip the subdomain.

The only formula I have so far:

=MID(SUBSTITUTE(A2;"www.";"");SEARCH(":";A2)+3;SEARCH("/";SUBSTITUTE(A2;"www.";"");9)-SEARCH(":";A2)-3)

https://example.com/page/page
returns
example.com

http://www.example.com/page/page
returns
example.com

http://blog.example.com/page/
returns
blog.example.com

example.com/page
returns
#VALUE!

www.example.com/page
returns
#VALUE!

As the examples above show, the results are good, but the formula does not work without http or https. This version also keeps the subdomain.
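
To see exactly where the formula breaks, here is a minimal Python sketch that mirrors it step by step. The helpers `excel_search` and `excel_mid` are hypothetical stand-ins for Excel's SEARCH and MID (1-based positions, SEARCH case-insensitive, wildcards not modelled), and an Excel error is represented by the string "#VALUE!".

```python
class ExcelValueError(Exception):
    """Hypothetical stand-in for Excel's #VALUE! error."""


def excel_search(find_text: str, within_text: str, start_num: int = 1) -> int:
    # Like SEARCH: case-insensitive, 1-based result; wildcards are not modelled.
    pos = within_text.lower().find(find_text.lower(), start_num - 1)
    if pos < 0:
        raise ExcelValueError("#VALUE!")
    return pos + 1


def excel_mid(text: str, start_num: int, num_chars: int) -> str:
    # Like MID: 1-based start; a start below 1 or a negative length is an error.
    if start_num < 1 or num_chars < 0:
        raise ExcelValueError("#VALUE!")
    return text[start_num - 1:start_num - 1 + num_chars]


def question_formula(a2: str) -> str:
    """Mirror of =MID(SUBSTITUTE(A2;"www.";"");SEARCH(":";A2)+3;
    SEARCH("/";SUBSTITUTE(A2;"www.";"");9)-SEARCH(":";A2)-3)."""
    try:
        no_www = a2.replace("www.", "")       # SUBSTITUTE(A2;"www.";"")
        colon = excel_search(":", a2)         # needs the ":" of "http:" or "https:"
        slash = excel_search("/", no_www, 9)  # first "/" at or after position 9, past the scheme's "//"
        return excel_mid(no_www, colon + 3, slash - colon - 3)
    except ExcelValueError:
        return "#VALUE!"


for url in ["https://example.com/page/page",
            "http://www.example.com/page/page",
            "http://blog.example.com/page/",
            "example.com/page",
            "www.example.com/page"]:
    print(url, "->", question_formula(url))
```

The last two inputs come back as "#VALUE!" because SEARCH(":";A2) has nothing to find when the scheme is missing. The same sketch shows that a URL with no "/" after the host, such as https://example.com, fails too, since the second SEARCH finds no "/" from position 9 onward.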

Try this in B1:

=SUBSTITUTE(TRIM(RIGHT(SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), CHAR(46), REPT(CHAR(32), LEN(A1))), LEN(A1)*2)), CHAR(32), CHAR(46))
...and this in C1:

=SUBSTITUTE(REPLACE(REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/", FIND("/", REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), TEXT(,))&"/"), LEN(A1), TEXT(,)), "www.", TEXT(,))
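
Both formulas start the same way: drop everything up to "//" (if present), append "/" so a separator always exists, then delete LEN(A1) characters from the first "/" so only the host remains. B1 then replaces every "." with LEN(A1) spaces, keeps the rightmost 2*LEN(A1) characters and applies TRIM, which leaves exactly the last two labels; C1 only removes "www.". A minimal Python sketch of that logic, assuming Excel's TRIM semantics (strip the ends, collapse inner runs of spaces); the function names are hypothetical:

```python
# Hypothetical helper names; a sketch mirroring the Excel formulas, not Excel itself.

def host_part(a1: str) -> str:
    # REPLACE(A1, 1, IFERROR(FIND("//", A1)+1, 0), ""): drop "scheme://" if present
    i = a1.find("//")
    rest = (a1[i + 2:] if i >= 0 else a1) + "/"  # ...&"/" guarantees a "/" exists
    p = rest.find("/")
    # REPLACE(rest, FIND("/", rest), LEN(A1), ""): delete LEN(A1) chars from the first "/"
    return rest[:p] + rest[p + len(a1):]


def excel_trim(s: str) -> str:
    # Excel TRIM: remove leading/trailing spaces and collapse inner runs to one space
    return " ".join(part for part in s.split(" ") if part)


def root_b1(a1: str) -> str:
    n = len(a1)
    # SUBSTITUTE(host, CHAR(46), REPT(CHAR(32), LEN(A1))): each "." becomes n spaces
    padded = host_part(a1).replace(".", " " * n)
    # RIGHT(..., LEN(A1)*2) fits the last two labels but cannot reach a third;
    # RIGHT(text, 0) is empty, hence the guard for a blank cell
    tail = padded[-2 * n:] if n else ""
    return excel_trim(tail).replace(" ", ".")


def sub_c1(a1: str) -> str:
    # SUBSTITUTE(host, "www.", TEXT(,)) removes every "www."
    return host_part(a1).replace("www.", "")


for url in ["https://blog.example.com/page/", "example.com/page", "www.dailymail.co.uk"]:
    print(url, "->", root_b1(url), "|", sub_c1(url))
```

Running it on www.dailymail.co.uk gives co.uk: keeping the last two labels cannot tell a two-part suffix like co.uk from a plain TLD. It also reproduces the "/" that the C1 formula returns for a blank cell.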

Subdomain: yes, but I added support for blank cells, because the original version returned "/":

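A version that supports international domains (e.g. this.co.uk). Unlike Jeeped's version, however, it does not handle single-word TLDs such as www.this.co or test.this.co. Does anyone know how to fix this? For now I at least use a helper row for "www":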

It works for:

| A (input)                     | B (subdomain)      | C (root domain) |
|-------------------------------|--------------------|-----------------|
| (blank)                       | ""                 | ""              |
| blog.test.com                 | blog.test.com      | test.com        |
| http://blog.test.com          | blog.test.com      | test.com        |
| test.com                      | test.com           | test.com        |
| http://test.com               | test.com           | test.com        |
| https://test.com              | test.com           | test.com        |
| www.test.com                  | test.com           | test.com        |
| http://www.test.com           | test.com           | test.com        |
| https://www.test.com          | test.com           | test.com        |
| test.co.uk                    | test.co.uk         | test.co.uk      |
| http://test.co.uk             | test.co.uk         | test.co.uk      |
| https://test.co.uk            | test.co.uk         | test.co.uk      |
| www.test.co.uk                | test.co.uk         | test.co.uk      |
| http://www.test.co.uk         | test.co.uk         | test.co.uk      |
| https://www.test.co.uk        | test.co.uk         | test.co.uk      |
| example.test.co.uk            | example.test.co.uk | test.co.uk      |
| http://example.test.co.uk     | example.test.co.uk | test.co.uk      |
| https://example.test.co.uk    | example.test.co.uk | test.co.uk      |
| example.com/test              | example.com        | example.com     |
| http://example.com/test       | example.com        | example.com     |
| https://example.com/test      | example.com        | example.com     |
| http://blog.example.com/page/ | blog.example.com   | example.com     |
| example.com/page              | example.com        | example.com     |
| www.example.com/page          | example.com        | example.com     |

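If your version of Excel has the FILTERXML function (available in Excel 365, Excel 2019, Excel 2016 and Excel 2013), and your URLs are in the range A2:A29, enter the following formula in cell B2 and drag it down to get the subdomain: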

=SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(IFERROR(MID(A2,FIND("//",A2)+2,LEN(A2)),A2),"/","</s><s>")&"</s></t>","t/s[1]"),"www.","")

To get the root domain, enter the following formula in cell C2 and drag it down:

=IF((SUMPRODUCT(--(MID(B2,ROW($1:$100),1)="."))-IF(SUMPRODUCT(--(MID(RIGHT(B2,8),ROW($1:$8),1)="."))=3,2,SUMPRODUCT(--(MID(RIGHT(B2,8),ROW($1:$8),1)="."))))>0,RIGHT(B2,LEN(B2)-FIND(".",B2)),B2)
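I use the subdomain from the first formula to find the root domain. The trick is to work out whether the URL component before the first dot is part of the root domain or a subdomain, and to act accordingly.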
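
A minimal Python sketch of both formulas (function names are hypothetical). ROW($1:$100) and ROW($1:$8) only generate the position lists 1..100 and 1..8, so MID(...,ROW(...),1) walks the text one character at a time and SUMPRODUCT counts the dots. The root formula compares the number of dots in B2 with the number of dots in its last 8 characters (3 is counted as 2); if B2 has more dots than that tail, the first label is treated as a subdomain and dropped.

```python
# Hypothetical function names; a sketch of the two formulas, not Excel itself.

def sub_b2(a2: str) -> str:
    # IFERROR(MID(A2, FIND("//",A2)+2, LEN(A2)), A2): drop "scheme://" if present
    i = a2.find("//")
    rest = a2[i + 2:] if i >= 0 else a2
    # FILTERXML(... SUBSTITUTE(rest,"/","</s><s>") ..., "t/s[1]"): first "/"-separated piece
    host = rest.split("/")[0]
    return host.replace("www.", "")  # SUBSTITUTE(..., "www.", "")


def root_c2(b2: str) -> str:
    # SUMPRODUCT(--(MID(B2,ROW($1:$100),1)=".")): dots among the first 100 characters
    total_dots = b2[:100].count(".")
    # The same count over RIGHT(B2,8), the last 8 characters; 3 dots are counted as 2
    tail_dots = b2[-8:].count(".")
    if tail_dots == 3:
        tail_dots = 2
    if total_dots - tail_dots > 0:
        # RIGHT(B2, LEN(B2)-FIND(".",B2)): drop the first label
        return b2[b2.find(".") + 1:]
    return b2


for url in ["http://blog.test.uk.net/", "https://test.7.org.au",
            "http://blog.1.co.uk", "www.dailymail.co.uk", "test.this.co"]:
    sub = sub_b2(url)
    print(f"{url:<26} {sub:<18} {root_c2(sub)}")
```

Because the decision rests on an 8-character window, www.dailymail.co.uk correctly becomes dailymail.co.uk, but a short host such as test.this.co has all of its dots inside the window and is returned unchanged rather than as this.co.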

Sample data:

| URL                              | Sub                 | Root           |
|----------------------------------|---------------------|----------------|
| https://example.com/page/page    | example.com         | example.com    |
| http://www.example.com/page/page | example.com         | example.com    |
| http://blog.example.com/page/    | blog.example.com    | example.com    |
| example.com/page                 | example.com         | example.com    |
| www.example.com/page             | example.com         | example.com    |
| blog.test.com                    | blog.test.com       | test.com       |
| http://blog.test.com             | blog.test.com       | test.com       |
| test.com                         | test.com            | test.com       |
| http://blog.test.uk.net/         | blog.test.uk.net    | test.uk.net    |
| https://test.cn                  | test.cn             | test.cn        |
| www.test.com                     | test.com            | test.com       |
| http://www.test.com              | test.com            | test.com       |
| https://www.test.com             | test.com            | test.com       |
| test.co.uk                       | test.co.uk          | test.co.uk     |
| https://test.co.uk               | test.co.uk          | test.co.uk     |
| www.test.co.uk                   | test.co.uk          | test.co.uk     |
| http://www.test.co.uk            | test.co.uk          | test.co.uk     |
| https://www.test.co.uk           | test.co.uk          | test.co.uk     |
| blog.123.firm.in                 | blog.123.firm.in    | 123.firm.in    |
| http://example.test.co.uk        | example.test.co.uk  | test.co.uk     |
| https://test.7.org.au            | test.7.org.au       | 7.org.au       |
| test.example.org.nz/page         | test.example.org.nz | example.org.nz |
| http://example.com/test          | example.com         | example.com    |
| https://example.com/test         | example.com         | example.com    |
| http://blog.example.com/page/    | blog.example.com    | example.com    |
| example.com/page                 | example.com         | example.com    |
| www.example.com/page             | example.com         | example.com    |
| http://blog.1.co.uk              | blog.1.co.uk        | 1.co.uk        |

For B1 (extracting the root domain), if A1 is a full URL:

=SUBSTITUTE(SUBSTITUTE(REPLACE(A1,1,FIND(".",$A1),""),REPLACE(REPLACE(A1,1,FIND(".",$A1),""),1,FIND("/",REPLACE(A1,1,FIND(".",$A1),"")),""),""),"/","")
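
The formula removes everything up to and including the first ".", then deletes the path that follows the first remaining "/". A minimal Python sketch of the same steps (the function name is hypothetical, and "#VALUE!" stands for the error Excel returns when FIND has nothing to find) makes its assumptions visible: the URL must contain a "/" after the host, and the first label is always dropped, so a subdomain or www. is expected in front of the root domain.

```python
# Hypothetical function name; a sketch mirroring the Excel formula, not Excel itself.

def root_from_full_url(a1: str) -> str:
    # X = REPLACE(A1, 1, FIND(".", A1), ""): drop everything up to and including the first "."
    dot = a1.find(".")
    if dot < 0:
        return "#VALUE!"
    x = a1[dot + 1:]
    # Y = REPLACE(X, 1, FIND("/", X), ""): the path after the first "/"
    slash = x.find("/")
    if slash < 0:
        return "#VALUE!"  # FIND("/", X) fails when X contains no "/"
    y = x[slash + 1:]
    # SUBSTITUTE(X, Y, ""): with an empty Y, Excel returns X unchanged
    without_path = x.replace(y, "") if y else x
    # Outer SUBSTITUTE(..., "/", ""): remove the remaining "/"
    return without_path.replace("/", "")


for url in ["https://blog.example.com/page/", "http://www.example.com/page/page",
            "https://example.com/page/page", "blog.example.com"]:
    print(url, "->", root_from_full_url(url))
```

For example, https://example.com/page/page comes back as com, and blog.example.com without a trailing path gives "#VALUE!".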

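Flawless! This really works. The only problem is that for links like www.dailymail.co.uk it returns co.uk, where it should be dailymail.co.uk.

This is the best answer. Could you explain the static parts ROW($1:$100) and ROW($1:$8)? I assume $1:$100 stands for $header title:$lastrow, but why 1:8?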