PHP sitemap encoding dilemma


php xml encoding


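I am having a really hard time understanding the specification and guidelines on how to properly escape and encode a URL for submission in a sitemap.

In the guidelines' (entity escaping) examples, they give a sample URL: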

http://www.example.com/ümlat.php&q=name

They show what the UTF-8 encoded result should be. However, when I try this in PHP with `rawurlencode`, I end up with something different:

http%3A%2F%2Fwww.example.com%2F%C3%BCmlat.php%26q%3Dname

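I have worked around this with a function that restores the reserved characters after encoding (the `str_replace` snippet at the end of this post).

But according to someone I have spoken with (Kohana's BDFL), this interpretation is wrong. Honestly, I am confused and no longer know what is right.

What is the correct way to encode a URL for use in a sitemap?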


The problem is that

http://www.example.com/ümlat.php&q=name

is not a valid URL.

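(Source: an older URL specification that is now obsolete but serves its purpose here. RFC 3986 does allow more characters, but escaping characters that do not need it does no harm.)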

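    httpurl    = "http://" hostport [ "/" hpath [ "?" search ]]
    hpath      = hsegment *[ "/" hsegment ]
    hsegment   = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
    uchar      = unreserved | escape
    unreserved = alpha | digit | safe | extra
    safe       = "$" | "-" | "_" | "." | "+"
    extra      = "!" | "*" | "'" | "(" | ")" | ","
    escape     = "%" hex hex
    search     = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

So every character other than ;:@&=$-_.+!*'(), and 0-9a-zA-Z must be escaped, that is, written as an escape sequence (e.g. %A0 or, equivalently, %a0). The ? character may appear at most once. The / character may appear in the path part but not in the query string. The convention for encoding other characters is to compute their UTF-8 representation and escape that byte sequence.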
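To make the character rules concrete, here is a minimal sketch of a validity check based on the grammar quoted above. The function names `isValidSegmentChars` and `isValidPath` are made up for illustration and do not come from any library.

```php
<?php
// Returns true if $s contains only characters that hsegment/search allow:
// alphanumerics, safe ($-_.+), extra (!*'(),), the characters ;:@&= and %XX escapes.
function isValidSegmentChars(string $s): bool
{
    return preg_match('/^(?:[A-Za-z0-9$\-_.+!*\'(),;:@&=]|%[0-9A-Fa-f]{2})*\z/', $s) === 1;
}

// In the path, "/" is only a separator between segments,
// so each segment is checked on its own.
function isValidPath(string $path): bool
{
    foreach (explode('/', $path) as $segment) {
        if (!isValidSegmentChars($segment)) {
            return false;
        }
    }
    return true;
}

// "\xC3\xBC" is the UTF-8 byte sequence for "ü".
var_dump(isValidPath("/\xC3\xBCmlat.php&q=name")); // bool(false): raw non-ASCII bytes
var_dump(isValidPath('/%C3%BCmlat.php&q=name'));   // bool(true): bytes are escaped
var_dump(isValidSegmentChars('x=1/2'));            // bool(false): "/" in a query string
```

The query string can be checked with `isValidSegmentChars` directly, since the grammar gives it no separator of its own.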

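Your algorithm should (assuming the host part is not the problem...):

1. Extract the path part
2. Extract the query string part
3. For each of them, look for invalid characters
4. Encode those characters in UTF-8
5. Pass the result to `rawurlencode`
6. Replace the characters in the URL with the output of `rawurlencode`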
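The steps above can be sketched roughly as follows. This is only one possible reading of them: the helper names `escapeUrlPart` and `encodeSitemapUrl` are hypothetical, and the input is assumed to be a UTF-8 string already and an http(s) URL. A fragment (`#...`), if present, is simply dropped.

```php
<?php
// Percent-encode every byte the grammar does not allow.
// Existing %XX escape sequences are left untouched.
function escapeUrlPart(string $part, bool $allowSlash): string
{
    $allowed = 'A-Za-z0-9$\-_.+!*\'(),;:@&=' . ($allowSlash ? '/' : '');
    $pattern = '~%[0-9A-Fa-f]{2}|[^' . $allowed . ']~';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // A three-byte match is an existing escape; anything else is one invalid byte.
            return strlen($m[0]) === 3 ? $m[0] : rawurlencode($m[0]);
        },
        $part
    );
}

function encodeSitemapUrl(string $url): string
{
    // Group 1: scheme and host, 2: path, 3: query string (only set if "?" is present).
    if (!preg_match('~^(https?://[^/?#]*)([^?#]*)(?:\?([^#]*))?~', $url, $m)) {
        throw new InvalidArgumentException('Not an http(s) URL: ' . $url);
    }

    $result = $m[1] . escapeUrlPart($m[2], true);   // "/" is allowed in the path
    if (isset($m[3])) {
        $result .= '?' . escapeUrlPart($m[3], false); // but not in the query string
    }
    return $result;
}

// "\xC3\xBC" is the UTF-8 byte sequence for "ü".
echo encodeSitemapUrl("http://www.example.com/\xC3\xBCmlat.php&q=name"), "\n";
// prints http://www.example.com/%C3%BCmlat.php&q=name
```

The pattern works byte by byte, so a multi-byte UTF-8 character such as "ü" becomes one escape per byte (`%C3%BC`). If the input were in another encoding, it would need converting to UTF-8 first, for example with `mb_convert_encoding`.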


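// Workaround from the question: percent-encode everything with rawurlencode(),
// then turn the escaped reserved characters back into their literal form.
$entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', 
    '%26', '%3D', '%2B', '%24', '%2C', '%2F', '%3F', '%23', '%5B', '%5D');

$replacements = array('!', '*', "'", "(", ")", ";", ":", "@", "&", "=", "+",
    "$", ",", "/", "?", "#", "[", "]");

// Assumes $string already holds the raw URL as a UTF-8 string.
$string = str_replace($entities, $replacements, rawurlencode($string));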
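As a quick check of the workaround, the sketch below wraps the snippet in a function and runs it on the sample URL from the question. The wrapper name `encodeKeepingReserved` is hypothetical. The final `htmlspecialchars` call is an assumption about how the entity-escaping step for the XML file might be done; it is not part of the original snippet.

```php
<?php
// Hypothetical wrapper around the workaround snippet.
function encodeKeepingReserved(string $string): string
{
    $entities = ['%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40',
        '%26', '%3D', '%2B', '%24', '%2C', '%2F', '%3F', '%23', '%5B', '%5D'];
    $replacements = ['!', '*', "'", '(', ')', ';', ':', '@', '&', '=', '+',
        '$', ',', '/', '?', '#', '[', ']'];

    return str_replace($entities, $replacements, rawurlencode($string));
}

// "\xC3\xBC" is the UTF-8 byte sequence for "ü".
$url = "http://www.example.com/\xC3\xBCmlat.php&q=name";

$encoded = encodeKeepingReserved($url);
echo $encoded, "\n";
// prints http://www.example.com/%C3%BCmlat.php&q=name

// Entity-escape the encoded URL before writing it into the sitemap XML.
$loc = htmlspecialchars($encoded, ENT_XML1 | ENT_QUOTES, 'UTF-8');
echo "<loc>$loc</loc>\n";
// prints <loc>http://www.example.com/%C3%BCmlat.php&amp;q=name</loc>
```

Note that this approach restores reserved characters wherever they occur and re-encodes any `%` already in the input as `%25`, so it is best suited to URLs that have not been escaped yet.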
