Ruby CGI.unescapeHTML produces strange characters


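I've backed up a bunch of Markdown-formatted comments into an XML document. That of course means I need to unescape them. When I try to use CGI.unescapeHTML, it adds some strange characters to the markup that don't render well in every browser.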

Specifically, it replaces double spaces with "\302\240", but not consistently. How do I make it stop doing this?

For example:

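```ruby
require "cgi"

s = "I am seeing more and more <a href=\"http://github.com/aslakhellesoy/cucumber/tree/master\">Cucumber</a> usage.  This is a good thing!  But I'm also seeing people who are not using regular expressions to their fullest.  Here are some quick regex tips to keep you features readable:

* `(?:a|an)` -- using a this construct you can group things wihout actually matching them.  I'm seeing a lot of steps that have unused params because someone needed a group but didn't know how to avoid capturing it&#x000A"
CGI.unescapeHTML s
# => "I am seeing more and more <a href=\"http://github.com/aslakhellesoy/cucumber/tree/master\">Cucumber</a> usage.\302\240 This is a good thing!\302\240 But I'm..."
```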

Those are non-breaking spaces.
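To see what `\302\240` actually is, here is a small sketch, assuming Ruby 1.9 or later (UTF-8 source files) and the standard `cgi` library. It checks that U+00A0 is encoded in UTF-8 as the two bytes written in octal as `\302\240`, and that numeric character references such as `&#160;` and `&#x000A;` are decoded by `CGI.unescapeHTML`. The sample string is made up for illustration; the question's real stored input isn't shown.

```ruby
require "cgi"

# U+00A0 NO-BREAK SPACE. In UTF-8 it is the two bytes 0xC2 0xA0,
# which octal byte escapes write as "\302\240".
nbsp = "\u00A0"

puts nbsp.bytes.map { |b| format("0x%02X", b) }.join(" ") # prints 0xC2 0xA0
puts nbsp.bytes == [0302, 0240]                            # octal 302/240 are the same bytes; prints true

# Numeric character references are decoded into real characters:
# &#160; becomes U+00A0 and &#x000A; becomes a newline.
decoded = CGI.unescapeHTML("usage.&#160; This is a good thing!&#x000A;")
puts decoded.include?(nbsp)  # prints true
puts decoded.end_with?("\n") # prints true
```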


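What version are you using? I'm not seeing it on 1.8.7.

I found that this was caused by Haml inserting non-breaking space characters in place of spaces. The example works here because Stack Overflow's formatting strips them out, but it took hours of work to track this down.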
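If the non-breaking spaces can't be stopped at the source (Haml, in this case), one workaround is to normalize them after unescaping. This is a minimal sketch, assuming ordinary spaces are acceptable everywhere in the comments; `unescape_html_plain_spaces` is a hypothetical helper name, not part of `CGI`, and the sample comment is invented for illustration.

```ruby
require "cgi"

# U+00A0 NO-BREAK SPACE (shown as "\302\240" in the question's output).
NBSP = "\u00A0"

# Hypothetical helper (not part of CGI): decode HTML entities, then
# replace every non-breaking space with an ordinary ASCII space.
def unescape_html_plain_spaces(html)
  CGI.unescapeHTML(html).tr(NBSP, " ")
end

comment = "usage.&#160; This is a good thing!&#160; But I'm..."
puts unescape_html_plain_spaces(comment)
# prints: usage.  This is a good thing!  But I'm...
```

Note that this also replaces any non-breaking spaces the author put there on purpose; for Markdown source that is usually harmless, but fixing the template that emits them is the cleaner solution.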
In computer-based text processing and digital typesetting, a
non-breaking space, also known as a no-break space or
non-breakable space (NBSP), is a variant of the space character
that prevents an automatic line break (line wrap) at its position.
In certain formats (such as HTML), it also prevents the
“collapsing” of multiple consecutive whitespace characters into a
single space. The non-breaking space is also known as a hard space
or fixed space. In Unicode, it is encoded as U+00A0 no-break space
(HTML: &#160; &nbsp;).
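The collapsing rule in that quote is what makes an NBSP useful after a sentence-ending period: two plain spaces render as one, while an NBSP followed by a space stays as two visible spaces. The sketch below models this with a deliberately simplified rule (an assumption, not a browser algorithm): runs of ASCII whitespace collapse to one space, and U+00A0 is left alone. `collapse_html_whitespace` is a hypothetical helper.

```ruby
# Simplified model (assumption): under default `white-space: normal`
# handling, runs of ASCII spaces, tabs, CRs and newlines collapse to a
# single space, but U+00A0 is not treated as collapsible whitespace.
def collapse_html_whitespace(text)
  text.gsub(/[ \t\r\n]+/, " ")
end

two_spaces = "usage.  This is a good thing!"
nbsp_space = "usage.\u00A0 This is a good thing!"

p collapse_html_whitespace(two_spaces) # "usage. This is a good thing!"
p collapse_html_whitespace(nbsp_space) # unchanged: both the NBSP and the space survive
```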