Number of characters in each Unicode block


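Does anyone know of a reference that shows the number of characters in each Unicode block? (For a recent version, such as 5.x.x or 6.0.0.)

Thanks a lot.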

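The linked file contains the data you are interested in.

The linked document contains some explanation and references for interpreting the data. You should read that document.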

There is a list, although it does not say which version of the standard it applies to:

Using unichars, does this answer your question:

% unichars '\p{InCyrillic}' | wc -l
     256    
% unichars '\p{InEthiopic}' | wc -l
     356
% unichars '\p{InLatin1}' | wc -l
     128
% unichars '\p{InCombiningDiacriticalMarks}' | wc -l
     112

To include the 16 astral planes, add -a:

% unichars -a '\p{InAncientGreekNumbers}' | wc -l
      75
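
For readers without unichars at hand, the counts above can be roughly approximated with Python's standard unicodedata module. This is only a sketch, not how unichars works: the helper name count_assigned is made up for illustration, the block ranges are typed in by hand, and the results depend on the Unicode version bundled with your Python, so they may differ from the figures shown here.

```python
import unicodedata


def count_assigned(start: int, end: int) -> int:
    """Count code points in [start, end] whose General_Category is not Cn.

    Hypothetical helper: "Cn" (unassigned) is what Python reports for
    code points that have no character in its bundled Unicode data.
    """
    return sum(
        1
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    )


if __name__ == "__main__":
    # The counts depend on this version, so print it alongside them.
    print("Unicode data version:", unicodedata.unidata_version)
    print("Cyrillic:", count_assigned(0x0400, 0x04FF))
    print("Ethiopic:", count_assigned(0x1200, 0x137F))
    print("Combining Diacritical Marks:", count_assigned(0x0300, 0x036F))
```

Blocks that were already fully assigned, such as Cyrillic, should give the same number on any recent Python; blocks that have gained characters since, such as Ethiopic, may give a larger one.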

If you want the unassigned code points, or Han or Hangul, you need -u:

% unichars -u '\p{InEthiopic}' | wc -l
     384    
% unichars -u '\p{InCJKUnifiedIdeographsExtensionA}' | wc -l
    6592
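
The -u counts above match the full size of each block's code point range, which can be computed without any character database. The sketch below assumes block data in the `start..end; Name` line format used by the Unicode Character Database block list; the sample lines are typed in by hand and block_sizes is a hypothetical helper name.

```python
import re

# Hand-typed sample in the assumed "start..end; Name" format.
SAMPLE_BLOCKS = """\
# comment lines and blank lines are ignored
0400..04FF; Cyrillic
1200..137F; Ethiopic
3400..4DBF; CJK Unified Ideographs Extension A
"""

LINE_RE = re.compile(r"^([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+);\s*(.+)$")


def block_sizes(text: str) -> dict:
    """Map each block name to the number of code points in its range."""
    sizes = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            sizes[match.group(3).strip()] = end - start + 1
    return sizes


if __name__ == "__main__":
    for name, size in block_sizes(SAMPLE_BLOCKS).items():
        print(f"{size:6d}  {name}")
```

A block's size counts every code point in the range, assigned or not, which is why it can exceed the number of characters actually defined in the block.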

You can also get other information:

 % unichars '\P{IsGreek}' '\p{InGreek}' 
 ʹ   884 0374 GREEK NUMERAL SIGN
 ;   894 037E GREEK QUESTION MARK
 ΅   901 0385 GREEK DIALYTIKA TONOS
 ·   903 0387 GREEK ANO TELEIA
 Ϣ   994 03E2 COPTIC CAPITAL LETTER SHEI
 ϣ   995 03E3 COPTIC SMALL LETTER SHEI
 Ϥ   996 03E4 COPTIC CAPITAL LETTER FEI
 ϥ   997 03E5 COPTIC SMALL LETTER FEI
 Ϧ   998 03E6 COPTIC CAPITAL LETTER KHEI
 ϧ   999 03E7 COPTIC SMALL LETTER KHEI
 Ϩ  1000 03E8 COPTIC CAPITAL LETTER HORI
 ϩ  1001 03E9 COPTIC SMALL LETTER HORI
 Ϫ  1002 03EA COPTIC CAPITAL LETTER GANGIA
 ϫ  1003 03EB COPTIC SMALL LETTER GANGIA
 Ϭ  1004 03EC COPTIC CAPITAL LETTER SHIMA
 ϭ  1005 03ED COPTIC SMALL LETTER SHIMA
 Ϯ  1006 03EE COPTIC CAPITAL LETTER DEI
 ϯ  1007 03EF COPTIC SMALL LETTER DEI

% unichars '\p{IsGreek}' '\P{InGreek}' | wc -l
 250
% unichars '\P{IsGreek}' '\p{InGreek}' | wc -l
  18

%  unichars '\p{In=1.1}' | wc -l
6362
%  unichars '\p{In=6.0}' | wc -l
15087
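
The idea of counting characters present in a given Unicode version can be loosely illustrated in Python, which ships a frozen Unicode 3.2.0 database (unicodedata.ucd_3_2_0) next to its current one. This is a sketch under different assumptions than the commands above: it counts every Basic Multilingual Plane code point that is not unassigned, including Han, Hangul, surrogates and private use, so the numbers are not comparable to the unichars output. The helper name count_assigned_bmp is hypothetical.

```python
import unicodedata


def count_assigned_bmp(db) -> int:
    """Count BMP code points that `db` does not report as unassigned (Cn).

    `db` is any object with a category() function, such as the
    unicodedata module itself or unicodedata.ucd_3_2_0.
    """
    return sum(1 for cp in range(0x10000) if db.category(chr(cp)) != "Cn")


if __name__ == "__main__":
    old = count_assigned_bmp(unicodedata.ucd_3_2_0)
    new = count_assigned_bmp(unicodedata)
    print(f"Unicode {unicodedata.ucd_3_2_0.unidata_version}: {old}")
    print(f"Unicode {unicodedata.unidata_version}: {new}")
```

Because assigned characters are never removed from the standard, the count for the newer version should never be smaller than the older one.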

And here is uniprops:


Or even all of these:

% uniprops -vag 777
U+0777 ‹ݷ› \N{ ARABIC LETTER FARSI YEH WITH EXTENDED ARABIC-INDIC DIGIT FOUR BELOW }:
    \w \pL \p{L_} \p{Lo}
    \p{All} \p{Any} \p{Alnum} \p{Alpha} \p{Alphabetic} \p{Arab} \p{Arabic} \p{Assigned} \p{Is_Arabic} \p{InArabicSupplement} \p{L} \p{Lo} \p{Gr_Base} \p{Grapheme_Base} \p{Graph}
       \p{GrBase} \p{ID_Continue} \p{IDC} \p{ID_Start} \p{IDS} \p{Letter} \p{L_} \p{Other_Letter} \p{Print} \p{Word} \p{XID_Continue} \p{XIDC} \p{XID_Start} \p{XIDS} \p{XPosixAlnum}
       \p{XPosixAlpha} \p{XPosixGraph} \p{XPosixPrint} \p{XPosixWord}
    \p{Age:5.1} \p{Script=Arabic} \p{Bidi_Class:AL} \p{Bidi_Class=Arabic_Letter} \p{Bidi_Class:Arabic_Letter} \p{Bc=AL} \p{Block:Arabic_Supplement} \p{Canonical_Combining_Class:0}
       \p{Canonical_Combining_Class=Not_Reordered} \p{Canonical_Combining_Class:Not_Reordered} \p{Ccc=NR} \p{Canonical_Combining_Class:NR} \p{Decomposition_Type:None} \p{Dt=None}
       \p{East_Asian_Width=Neutral} \p{East_Asian_Width:Neutral} \p{General_Category:L} \p{General_Category=Letter} \p{General_Category:Letter} \p{Gc=L} \p{General_Category:Lo}
       \p{General_Category=Other_Letter} \p{General_Category:Other_Letter} \p{Gc=Lo} \p{Grapheme_Cluster_Break:Other} \p{GCB=XX} \p{Grapheme_Cluster_Break:XX}
       \p{Grapheme_Cluster_Break=Other} \p{Hangul_Syllable_Type:NA} \p{Hangul_Syllable_Type=Not_Applicable} \p{Hangul_Syllable_Type:Not_Applicable} \p{Hst=NA} \p{Joining_Group:Yeh}
       \p{Jg=Yeh} \p{Joining_Type:D} \p{Joining_Type=Dual_Joining} \p{Joining_Type:Dual_Joining} \p{Jt=D} \p{Line_Break:AL} \p{Line_Break=Alphabetic} \p{Line_Break:Alphabetic}
       \p{Lb=AL} \p{Numeric_Type:None} \p{Nt=None} \p{Numeric_Value:NaN} \p{Nv=NaN} \p{Present_In:5.1} \p{In=5.1} \p{Present_In:5.2} \p{In=5.2} \p{Present_In:6.0} \p{In=6.0}
       \p{Script:Arab} \p{Script:Arabic} \p{Sc=Arab} \p{Sentence_Break:LE} \p{Sentence_Break=OLetter} \p{Sentence_Break:OLetter} \p{SB=LE} \p{Word_Break:ALetter} \p{WB=LE}
       \p{Word_Break:LE} \p{Word_Break=ALetter}
My unichars and uniprops should run anywhere running Perl 5.10 or later. There is also a companion script that goes with them.
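
A small subset of what uniprops reports for U+0777 can also be looked up with Python's standard unicodedata module. This is only a sketch for comparison: describe is a hypothetical helper, and Python exposes far fewer properties than the listing above (no script, block, or break properties, for example).

```python
import unicodedata


def describe(cp: int) -> dict:
    """Return a few basic Unicode properties for one code point."""
    ch = chr(cp)
    return {
        "code_point": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "combining_class": unicodedata.combining(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
        "decomposition": unicodedata.decomposition(ch),
    }


if __name__ == "__main__":
    for key, value in describe(0x0777).items():
        print(f"{key:18} {value!r}")
```

The values for general category (Lo), bidi class (AL), combining class (0) and East Asian width (N) should agree with the corresponding \p{...} entries in the uniprops output.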
