Java regex \\p{So} does not filter BLACK CIRCLE FOR RECORD

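My question is about the Unicode character "BLACK CIRCLE FOR RECORD" (⏺), which is defined in the Miscellaneous Technical block and in the Symbol, Other [So] category.

This code does not work:

String registered = "President⏺";
System.out.println(registered.replaceAll("\\p{So}", ""));

I get President⏺

The BLACK CIRCLE FOR RECORD is not filtered by the \\p{So} regex.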


Knowing that ⏺ is 23FA, list all characters in the \p{So} (OTHER_SYMBOL) category:

// Print every BMP character whose general category is So (OTHER_SYMBOL)
for (char ch = Character.MIN_VALUE; ch < Character.MAX_VALUE; ch++) {
    if (Character.OTHER_SYMBOL == Character.getType(ch)) {
        String s = String.format("\\u%04x", (int) ch);
        System.out.println(s);
    }
}
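From the listing it is clear that the code points between \u23f3 and \u23ff are not included, although they should be. In Java you can match BLACK CIRCLE FOR RECORD with \p{InMiscellaneous_Technical}, the block it correctly falls into.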
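
The block-based workaround described above can be sketched as follows. This is a minimal illustration, not code from the thread: the class name BlockMatchDemo and the helper stripMiscTechnical are hypothetical, and the character is written as an escape so that the source file encoding does not matter.

```java
import java.util.regex.Pattern;

class BlockMatchDemo {
    // Matches any code point in the Miscellaneous Technical block (U+2300..U+23FF).
    private static final Pattern MISC_TECHNICAL =
            Pattern.compile("\\p{InMiscellaneous_Technical}");

    // Hypothetical helper: removes every character of that block from the input.
    static String stripMiscTechnical(String input) {
        return MISC_TECHNICAL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String registered = "President\u23FA"; // U+23FA BLACK CIRCLE FOR RECORD
        System.out.println(stripMiscTechnical(registered));
    }
}
```

Note that this removes every character in the block, not only Symbol, Other characters, so it is broader than \p{So} inside that range and narrower outside it.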


You are seeing a bug.

I wrote this code to work out the Unicode general category name of a character:

Map<Byte, String> unicodeCategories = new HashMap<>();
unicodeCategories.put(Character.COMBINING_SPACING_MARK, "Mc");
unicodeCategories.put(Character.CONNECTOR_PUNCTUATION, "Pc");
unicodeCategories.put(Character.CONTROL, "Cc");
unicodeCategories.put(Character.CURRENCY_SYMBOL, "Sc");
unicodeCategories.put(Character.DASH_PUNCTUATION, "Pd");
unicodeCategories.put(Character.DECIMAL_DIGIT_NUMBER, "Nd");
unicodeCategories.put(Character.ENCLOSING_MARK, "Me");
unicodeCategories.put(Character.END_PUNCTUATION, "Pe");
unicodeCategories.put(Character.FINAL_QUOTE_PUNCTUATION, "Pf");
unicodeCategories.put(Character.FORMAT, "Cf");
unicodeCategories.put(Character.INITIAL_QUOTE_PUNCTUATION, "Pi");
unicodeCategories.put(Character.LETTER_NUMBER, "Nl");
unicodeCategories.put(Character.LINE_SEPARATOR, "Zl");
unicodeCategories.put(Character.LOWERCASE_LETTER, "Ll");
unicodeCategories.put(Character.MATH_SYMBOL, "Sm");
unicodeCategories.put(Character.MODIFIER_LETTER, "Lm");
unicodeCategories.put(Character.MODIFIER_SYMBOL, "Sk");
unicodeCategories.put(Character.NON_SPACING_MARK, "Mn");
unicodeCategories.put(Character.OTHER_LETTER, "Lo");
unicodeCategories.put(Character.OTHER_NUMBER, "No");
unicodeCategories.put(Character.OTHER_PUNCTUATION, "Po");
unicodeCategories.put(Character.OTHER_SYMBOL, "So");
unicodeCategories.put(Character.PARAGRAPH_SEPARATOR, "Zp");
unicodeCategories.put(Character.PRIVATE_USE, "Co");
unicodeCategories.put(Character.SPACE_SEPARATOR, "Zs");
unicodeCategories.put(Character.START_PUNCTUATION, "Ps");
unicodeCategories.put(Character.SURROGATE, "Cs");
unicodeCategories.put(Character.TITLECASE_LETTER, "Lt");
unicodeCategories.put(Character.UNASSIGNED, "Cn");
unicodeCategories.put(Character.UPPERCASE_LETTER, "Lu");

char registered = '⏺';
int code = (int) registered;
// Look up the two-letter category name for the character's general category
System.out.println("character's general category name = "
        + unicodeCategories.get((byte) Character.getType(code)));
I get an empty string.

So the problem is that '⏺' belongs to the Character.UNASSIGNED category in the Java implementation rather than to the Character.OTHER_SYMBOL category.

// Returns null because the code point is unassigned
System.out.println("Unicode name of the specified character = "
        + Character.getName(code));

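One possible reading of the result above, which the thread itself does not state: the outcome may depend on which Unicode version the running JDK implements. As far as I know, U+23FA was added in Unicode 7.0, while the Java 8 Character class is documented as based on Unicode 6.2, so an older runtime would report the code point as unassigned. The sketch below prints what the current runtime reports and adds U+23FA to the character class explicitly so the replacement does not rely on the category data. The names CategoryProbe, isOtherSymbol and stripSymbols are hypothetical.

```java
import java.util.regex.Pattern;

class CategoryProbe {
    // \p{So} plus U+23FA spelled out, so the result does not depend on
    // whether the running JDK already classifies U+23FA as Symbol, Other.
    private static final Pattern SYMBOLS = Pattern.compile("[\\p{So}\\x{23FA}]");

    // True if this runtime puts the code point in the So general category.
    static boolean isOtherSymbol(int codePoint) {
        return Character.getType(codePoint) == Character.OTHER_SYMBOL;
    }

    // Removes So characters and, explicitly, U+23FA.
    static String stripSymbols(String input) {
        return SYMBOLS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        int code = 0x23FA;
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("isDefined    = " + Character.isDefined(code));
        System.out.println("is So        = " + isOtherSymbol(code));
        System.out.println(stripSymbols("President\u23FA"));
    }
}
```

If "is So" prints true on your runtime, the plain \\p{So} pattern from the question should already remove the character there.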
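I'm trying to figure out how Java interprets the classes, but for now I'll chalk it up as a bug.

@ctwheels If it is a bug in Java, then why does it fail in PHP? I think something is wrong with regex101.com

@AniketSahrawat It matches in PCRE

@ctwheels Try running it on an Apache server. The same goes for the Java compiler. So I don't think this is a problem with PHP or Java. I strongly feel the problem is with regex101.com itself.

@AniketSahrawat For the PHP preg_* functions, you must set the u flag.

Fun: \p{InMiscellaneous_Technical} works in Java (only). \p{So} works in .NET, Ruby, Go, and fails in JavaScript, Python, Perl, etc. It is not just Java. Of course, you can find out more; this is just a report of my (non-exhaustive) findings with this specific code point in hand.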
System.out.println(registered.replaceAll("\\p{Cn}",""));