HTML: What happens if <base href...> is set with a double slash?

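I would like to understand how to use a `<base href>` value in my web crawler, so I tested several combinations in the major browsers and finally found something about double slashes that I don't understand.

If you don't want to read everything, jump to the results of tests D and E. All tests were run by calling http://example.com/images.html:

A - Multiple base href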

<html>
<head>
<base target="_blank" />
<base href="http://example.com/images/" />
<base href="http://example.com/" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg">
<img src="./image.jpg">
<img src="images/image.jpg"> not found
<img src="/image.jpg"> not found
<img src="../image.jpg"> not found
</body>
</html>

Conclusion

Without a trailing slash, everything after the last slash of the base is ignored, so
```
http://example.com/images
```
becomes
```
http://example.com/
```
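
One way to check this conclusion outside a browser is the WHATWG `URL` constructor, which is available as a global in Node.js and in browsers. The lines below are only a small sketch; the variable names are made up for this illustration.

```javascript
// Resolve the same relative URL against a base with and without a trailing slash.
const withSlash = new URL("image.jpg", "http://example.com/images/").href;
const withoutSlash = new URL("image.jpg", "http://example.com/images").href;

console.log(withSlash);    // http://example.com/images/image.jpg
console.log(withoutSlash); // http://example.com/image.jpg
```

Without the trailing slash, "images" is treated as the last path segment (a "file") and is dropped before the relative URL is appended.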

C - How it should be

<html>
<head>
<base href="http://example.com/" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg"> not found
<img src="./image.jpg"> not found
<img src="images/image.jpg">
<img src="/image.jpg"> not found
<img src="../image.jpg"> not found
</body>
</html>

Conclusion

Same result as in test B.

D - Double slash

Base without a trailing slash (test B), for comparison:

<html>
<head>
<base href="http://example.com/images" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg"> not found
<img src="./image.jpg"> not found
<img src="images/image.jpg">
<img src="/image.jpg"> not found
<img src="../image.jpg"> not found
</body>
</html>

Base with a double slash:

<html>
<head>
<base href="http://example.com/images//" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg">
<img src="./image.jpg">
<img src="images/image.jpg"> not found
<img src="/image.jpg"> not found
<img src="../image.jpg">
</body>
</html>

E - Double slash with whitespace

<html>
<head>
<base href="http://example.com/images/ /" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg"> not found
<img src="./image.jpg"> not found
<img src="images/image.jpg"> not found
<img src="/image.jpg"> not found
<img src="../image.jpg">
</body>
</html>

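Neither of the two is a "valid" URL, but both are real results from my web crawler. Please explain what happens in D and E that lets ../image.jpg be found, and why the whitespace makes a difference.
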
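Just for your interest, three further base values (not reproduced here) behaved as follows:

The first gives the same result as test C.

The second is completely different: only ../image.jpg is found.

The third finds only /images/image.jpg.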

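The behavior of `<base>` is explained in the HTML specification: the `base` element allows authors to specify the document base URL that is used for resolving relative URLs.

As test A shows, if there are multiple `base` elements with an `href`, the document base URL is taken from the first one.

Relative URLs are resolved like this: the URL parser is applied to url, with base as the base URL and encoding as the encoding.

The URL parsing algorithm is defined in the URL specification.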
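
For a crawler, the two steps described above (pick the first `base` that has an `href`, then hand the relative URL to the URL parser) could be sketched roughly as follows. This is an illustration under assumptions, not a robust implementation: `documentBaseUrl` is a hypothetical helper name, and a regular expression stands in for a real HTML parser.

```javascript
// Sketch only: a regex is not a full HTML parser, and documentBaseUrl is a
// hypothetical helper name, not part of any library.
function documentBaseUrl(html, pageUrl) {
  // Find the first <base> tag that actually carries an href attribute;
  // a <base target="..."> without href is skipped.
  const match = /<base\b[^>]*?\shref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i.exec(html);
  if (!match) return new URL(pageUrl).href; // no base href: fall back to the page URL
  const href = match[1] ?? match[2] ?? match[3];
  // The href itself may be relative, so resolve it against the page URL.
  return new URL(href, pageUrl).href;
}

// Markup of test A: one base without href, then two with href.
const testA = `<html><head>
<base target="_blank" />
<base href="http://example.com/images/" />
<base href="http://example.com/" />
</head><body></body></html>`;

const docBase = documentBaseUrl(testA, "http://example.com/images.html");
console.log(docBase);                            // http://example.com/images/
console.log(new URL("image.jpg", docBase).href); // http://example.com/images/image.jpg
```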
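That algorithm is too complex to explain in detail here, but basically this is what happens:

A relative URL that starts with / is resolved relative to the host of the base URL.

Otherwise, the relative URL is resolved relative to the last directory of the base URL.

Note that if the base path does not end with /, its last part is a file, not a directory.

./ is the current directory.

../ moves up one directory.

(In URLs, "directory" and "file" may not be the correct terms.)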
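
These rules can be tried out with the WHATWG `URL` constructor. The sketch below assumes the base of test A and the six image paths used in the tests; the variable names are chosen for this illustration only.

```javascript
// Base taken from test A; the relative URLs are the six used in the tests.
const baseUrl = "http://example.com/images/";
const relatives = [
  "/images/image.jpg", // starts with "/": resolved against the host
  "image.jpg",         // resolved against the last directory of the base
  "./image.jpg",       // "./" is the current directory
  "images/image.jpg",
  "/image.jpg",
  "../image.jpg",      // "../" moves up one directory
];

const resolved = relatives.map((rel) => new URL(rel, baseUrl).href);
relatives.forEach((rel, i) => console.log(rel.padEnd(20), "->", resolved[i]));
// /images/image.jpg    -> http://example.com/images/image.jpg
// image.jpg            -> http://example.com/images/image.jpg
// ./image.jpg          -> http://example.com/images/image.jpg
// images/image.jpg     -> http://example.com/images/images/image.jpg
// /image.jpg           -> http://example.com/image.jpg
// ../image.jpg         -> http://example.com/image.jpg
```

Only the first three results point at /images/image.jpg, which matches the images that were found in test A.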
Some examples:

http://example.com/images/a/./ is http://example.com/images/a/

http://example.com/images/a/../ is http://example.com/images/

http://example.com/images//./ is http://example.com/images//

http://example.com/images//../ is http://example.com/images/

http://example.com/images/./ is http://example.com/images/

http://example.com/images/../ is http://example.com/
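
The same normalization can be reproduced with the WHATWG `URL` constructor. The following is a small sketch; `normalize` is a hypothetical helper name used only here.

```javascript
// normalize is a hypothetical helper: parse an absolute URL and return
// its serialized form, with "./" and "../" segments already resolved.
const normalize = (url) => new URL(url).href;

const examples = [
  "http://example.com/images/a/./",
  "http://example.com/images/a/../",
  "http://example.com/images//./",
  "http://example.com/images//../",
  "http://example.com/images/./",
  "http://example.com/images/../",
];

const normalized = examples.map(normalize);
examples.forEach((input, i) => console.log(input, "is", normalized[i]));
```

Note that the empty segment between two slashes is kept as a segment of its own, so `images//../` only steps back to `images/`.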

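Note that in most cases `//` behaves like `/`. As has been pointed out:

Unless you are using some kind of URL rewriting (in which case the rewrite rules may be affected by the number of slashes), the URI maps to a path on disk, and in (most?) modern operating systems (Linux/Unix, Windows) multiple path separators in a row have no special meaning, so /path/to/foo and /path///to///foo end up mapping to the same file.

However, in general `/ /` does not become `//`.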
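
To make the difference between tests D and E concrete, here is a sketch that resolves the relevant relative URLs against both bases with the WHATWG `URL` constructor. `resolveAll` and the variable names are assumptions made for this example.

```javascript
// resolveAll is a hypothetical helper name for this sketch.
function resolveAll(baseHref, rels) {
  return rels.map((rel) => new URL(rel, baseHref).href);
}

const rels = ["image.jpg", "./image.jpg", "../image.jpg"];

// Test D: the last "directory" of the base is an empty segment.
const testD = resolveAll("http://example.com/images//", rels);
// Test E: the last "directory" is a single space, serialized as %20.
const testE = resolveAll("http://example.com/images/ /", rels);

console.log(testD);
// [ 'http://example.com/images//image.jpg',
//   'http://example.com/images//image.jpg',
//   'http://example.com/images/image.jpg' ]
console.log(testE);
// [ 'http://example.com/images/%20/image.jpg',
//   'http://example.com/images/%20/image.jpg',
//   'http://example.com/images/image.jpg' ]
```

In both cases ../image.jpg steps out of the extra segment and lands in /images/, so it is found. In D the remaining URLs contain `//`, which most servers treat like `/`, so they are found as well. In E they point into a directory named with a single space, which does not exist, so they are not found.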
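You can use the following snippet to resolve a list of relative URLs to absolute URLs:

var bases = [
  "http://example.com/images/",
  "http://example.com/images",
  "http://example.com/",
  "http://example.com/images//",
  "http://example.com/images/ /"
];
var urls = [
  "/images/image.jpg",
  "image.jpg",
  "./image.jpg",
  "images/image.jpg",
  "/image.jpg",
  "../image.jpg"
];
function newEl(type, contents) {
  var el = document.createElement(type);
  if (!contents) return el;
  if (!(contents instanceof Array)) contents = [contents];
  for (var i = 0; i < …

I mean, you could simply try it. Why don't you?

@panther Of course I tried it. You did not read the description. Please remove your downvote and your close request.

By definition, you should only use single slashes. Double slashes will be handled differently depending on the server settings.

@pschueller If multiple slashes acted as a single slash (as stated in the link), tests D and E would give the same result as A.

@pschueller I know my examples are "invalid", but this is a real example that my crawler ran into, so I want to know how to use the base href in this special case. In the end, I have no influence on the HTML source.

Thanks! Now I understand what happens. In test D, images// is one level deeper than images/, so ../ targets images/. But if the target contains //, as in images//image.jpg, the second slash is ignored. I will edit your answer for clarification. Feel free to adjust it.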