JavaScript: Trimming non-printing Unicode in JS

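How can I remove non-printable Unicode characters from multilingual input?

When users with different localizations paste strings, they sometimes embed non-printing characters without meaning to. For example: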

var weird = "%E2%80%AA%E2%80%8ETest%E2%80%AC"
var displaysAs = decodeURI(weird); // Users see only "Test"
But I don't know how to strip the non-printing characters without affecting other languages, such as:

encodeURI("شنط") = "%D8%B4%D9%86%D8%B7"
encodeURI("戦艦帝国") = "%E6%88%A6%E8%89%A6%E5%B8%9D%E5%9B%BD"
For example, the following attempt to fix the weird example above does not work:

var weird = "%E2%80%AA%E2%80%8ETest%E2%80%AC";
var displaysAs = decodeURI(weird);
var stillWeird = encodeURI(displaysAs.replace(/\s/g, ""));
// the value is again "%E2%80%AA%E2%80%8ETest%E2%80%AC"
console.log('before:', weird);
console.log('after:', displaysAs);
console.log('again:', stillWeird);
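To see why the `\s` replacement has no effect, it can help to list the code points that are actually in the decoded string. The following is a small diagnostic sketch, not part of the original question; the variable names `decoded` and `codePoints` are introduced here for illustration.

```javascript
const weird = "%E2%80%AA%E2%80%8ETest%E2%80%AC";
const decoded = decodeURI(weird);

// Spreading a string iterates by code point, so each entry is one character.
const codePoints = [...decoded].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints.join(" "));
// U+202A U+200E U+0054 U+0065 U+0073 U+0074 U+202C

// None of the three invisible characters counts as whitespace for \s.
console.log(/\s/.test("\u202A\u200E\u202C")); // false
```

U+202A, U+200E and U+202C are directional formatting characters rather than whitespace, which is why `/\s/g` leaves them in place.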

You can use the regular expression from the linked answer.

An example with an array of the three strings given in the question:

let weird = [
  "%E2%80%AA%E2%80%8ETest%E2%80%AC",
  "%D8%B4%D9%86%D8%B7",
  "%E6%88%A6%E8%89%A6%E5%B8%9D%E5%9B%BD"
];
// The character class is elided here: in the original it is a very long list of
// \uXXXX code points and ranges (ending in \uFFFB\uFFFE\uFFFF) taken from the
// linked answer, and it is garbled beyond recovery in this copy.
const expression = /[...]/g; // placeholder, not the real pattern
weird.map(decodeURI).forEach(el => {
  let trimmed = el.replace(expression, "");
  console.log(trimmed, trimmed.length);
});
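In engines that support Unicode property escapes (ES2018 and later), the General_Category "Other" class (C: Cc, Cf, Cs, Co, Cn) can be written directly as `\p{C}` with the `u` flag, so no hand-built character list is needed. The following is a minimal sketch, assuming that category is the intended meaning of "non-printing"; it is a swapped-in technique, not the regular expression from the linked answer, and `stripNonPrinting` is a hypothetical helper name.

```javascript
// Matches every code point whose General_Category is "Other" (C).
// Which code points count as unassigned (Cn) depends on the Unicode
// version built into the JavaScript engine.
const NON_PRINTING_REGEXP = /\p{C}/gu;

// Hypothetical helper: removes all category-C code points from a string.
function stripNonPrinting(text) {
  return text.replace(NON_PRINTING_REGEXP, "");
}

const samples = [
  "%E2%80%AA%E2%80%8ETest%E2%80%AC",
  "%D8%B4%D9%86%D8%B7",
  "%E6%88%A6%E8%89%A6%E5%B8%9D%E5%9B%BD"
];

const cleaned = samples.map(decodeURI).map(stripNonPrinting);
cleaned.forEach((s) => console.log(s, s.length));
```

Note that category C also covers tab, newline and carriage return (Cc) as well as the zero-width joiner and non-joiner (Cf), which some scripts and emoji sequences rely on, so a narrower class may be a better fit depending on the input.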
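This seems to work for all my test cases. But is there any specification suggesting that it is general and will keep working? Because that is a very magic string!

I have not personally verified the regular expression, but it is supposed to match those characters as of Unicode 8.0. As for it being a "magic string", that is why I assigned it to a constant; if you like, you can go a step further and call it const NON_PRINTING_REGEXP, so that it is less magic.

FWIW, the regular expression is from the linked answer, which in turn was constructed from the General_Category Values table. The "C" category includes Cc (C0 or C1 control codes), Cf (format control characters), Cs (surrogate code points), Co (private-use characters), and Cn (reserved unassigned code points or noncharacters).

Define "non-printable". A precise definition would effectively constitute an answer to the question, because all that remains is to express it as a regular expression or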