What is the equivalent of perluniprops in Python?
Tags: python, regex, perl, unicode
In Perl, there is the perluniprops index for Unicode 7, which lets me do the following to pad opening and closing punctuation with spaces:
s/(\p{Open_Punctuation})/ $1 /g;
s/(\p{Close_Punctuation})/ $1 /g;
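What is the complete list of opening/closing punctuation characters that Perl pads here? And what is the equivalent in Python?

Related question: ; this question was split off from it at the answerers' suggestion, since it should be asked separately.

Are you asking how to determine which closing punctuation corresponds to a given opening punctuation? Unicode does not define this. In fact, the relationship is not even 1:1: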
$ unichars '\p{Open_Punctuation}' | wc -l
75
$ unichars '\p{Close_Punctuation}' | wc -l
73
However, building your own mapping should be relatively easy:
$ unichars '\p{Open_Punctuation}' | cat
( U+0028 LEFT PARENTHESIS
[ U+005B LEFT SQUARE BRACKET
{ U+007B LEFT CURLY BRACKET
༺ U+0F3A TIBETAN MARK GUG RTAGS GYON
༼ U+0F3C TIBETAN MARK ANG KHANG GYON
᚛ U+169B OGHAM FEATHER MARK
‚ U+201A SINGLE LOW-9 QUOTATION MARK
„ U+201E DOUBLE LOW-9 QUOTATION MARK
⁅ U+2045 LEFT SQUARE BRACKET WITH QUILL
⁽ U+207D SUPERSCRIPT LEFT PARENTHESIS
₍ U+208D SUBSCRIPT LEFT PARENTHESIS
⌈ U+2308 LEFT CEILING
⌊ U+230A LEFT FLOOR
〈 U+2329 LEFT-POINTING ANGLE BRACKET
❨ U+2768 MEDIUM LEFT PARENTHESIS ORNAMENT
❪ U+276A MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
❬ U+276C MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
❮ U+276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
❰ U+2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
❲ U+2772 LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
❴ U+2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
⟅ U+27C5 LEFT S-SHAPED BAG DELIMITER
⟦ U+27E6 MATHEMATICAL LEFT WHITE SQUARE BRACKET
⟨ U+27E8 MATHEMATICAL LEFT ANGLE BRACKET
⟪ U+27EA MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
⟬ U+27EC MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
⟮ U+27EE MATHEMATICAL LEFT FLATTENED PARENTHESIS
⦃ U+2983 LEFT WHITE CURLY BRACKET
⦅ U+2985 LEFT WHITE PARENTHESIS
⦇ U+2987 Z NOTATION LEFT IMAGE BRACKET
⦉ U+2989 Z NOTATION LEFT BINDING BRACKET
⦋ U+298B LEFT SQUARE BRACKET WITH UNDERBAR
⦍ U+298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
⦏ U+298F LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
⦑ U+2991 LEFT ANGLE BRACKET WITH DOT
⦓ U+2993 LEFT ARC LESS-THAN BRACKET
⦕ U+2995 DOUBLE LEFT ARC GREATER-THAN BRACKET
⦗ U+2997 LEFT BLACK TORTOISE SHELL BRACKET
⧘ U+29D8 LEFT WIGGLY FENCE
⧚ U+29DA LEFT DOUBLE WIGGLY FENCE
⧼ U+29FC LEFT-POINTING CURVED ANGLE BRACKET
⸢ U+2E22 TOP LEFT HALF BRACKET
⸤ U+2E24 BOTTOM LEFT HALF BRACKET
⸦ U+2E26 LEFT SIDEWAYS U BRACKET
⸨ U+2E28 LEFT DOUBLE PARENTHESIS
⹂ U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK
〈 U+3008 LEFT ANGLE BRACKET
《 U+300A LEFT DOUBLE ANGLE BRACKET
「 U+300C LEFT CORNER BRACKET
『 U+300E LEFT WHITE CORNER BRACKET
【 U+3010 LEFT BLACK LENTICULAR BRACKET
〔 U+3014 LEFT TORTOISE SHELL BRACKET
〖 U+3016 LEFT WHITE LENTICULAR BRACKET
〘 U+3018 LEFT WHITE TORTOISE SHELL BRACKET
〚 U+301A LEFT WHITE SQUARE BRACKET
〝 U+301D REVERSED DOUBLE PRIME QUOTATION MARK
﴿ U+FD3F ORNATE RIGHT PARENTHESIS
︗ U+FE17 PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
︵ U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
︷ U+FE37 PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
︹ U+FE39 PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
︻ U+FE3B PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
︽ U+FE3D PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
︿ U+FE3F PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
﹁ U+FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
﹃ U+FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
﹇ U+FE47 PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
﹙ U+FE59 SMALL LEFT PARENTHESIS
﹛ U+FE5B SMALL LEFT CURLY BRACKET
﹝ U+FE5D SMALL LEFT TORTOISE SHELL BRACKET
( U+FF08 FULLWIDTH LEFT PARENTHESIS
[ U+FF3B FULLWIDTH LEFT SQUARE BRACKET
{ U+FF5B FULLWIDTH LEFT CURLY BRACKET
⦅ U+FF5F FULLWIDTH LEFT WHITE PARENTHESIS
「 U+FF62 HALFWIDTH LEFT CORNER BRACKET
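One way such a mapping could be built from Python is sketched below. It uses only the standard unicodedata module, where general category "Ps" corresponds to Open_Punctuation and "Pe" to Close_Punctuation. The pairing rule (swap LEFT for RIGHT in the character name) is an assumption made for illustration, not something Unicode defines, and the helper names chars_in_category and guess_closing_pairs are hypothetical. Characters with no partner under this rule are simply skipped.

```python
import sys
import unicodedata


def chars_in_category(category):
    """Return every character whose Unicode general category equals `category`."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == category]


def guess_closing_pairs():
    """Heuristically map opening punctuation (Ps) to closing punctuation (Pe).

    Assumption: the partner's name is the opener's name with LEFT replaced
    by RIGHT. Unicode does not define this pairing, so anything that does
    not resolve to a Pe character is left out of the result.
    """
    pairs = {}
    for ch in chars_in_category("Ps"):
        name = unicodedata.name(ch, "")
        if "LEFT" not in name:
            continue  # e.g. the low-9 quotation marks have no LEFT/RIGHT name
        try:
            partner = unicodedata.lookup(name.replace("LEFT", "RIGHT"))
        except KeyError:
            continue  # no character with the swapped name exists
        if unicodedata.category(partner) == "Pe":
            pairs[ch] = partner
    return pairs


pairs = guess_closing_pairs()
print(pairs["("], pairs["["], pairs["{"])  # prints: ) ] }
```

The size of the resulting dictionary depends on the Unicode version bundled with the Python build, so it will not necessarily line up with the 75/73 counts reported by unichars.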
After installing unichars with cpan Unicode::Tussle, in Python:
>>> import subprocess
>>> # Raw string, so that \p and \n reach the shell unchanged
>>> cmd = r"unichars '\p{Open_Punctuation}' | cut -f2 -d' ' | tr -d '\n'"
>>> open_punct = subprocess.check_output(cmd, shell=True).decode('utf8')
Smartmatch is experimental at /usr/local/bin/unichars line 546.
>>> print (open_punct)
([{༺༼᚛‚„⁅⁽₍〈❨❪❬❮❰❲❴⟅⟦⟨⟪⟬⟮⦃⦅⦇⦉⦋⦍⦏⦑⦓⦕⦗⧘⧚⧼⸢⸤⸦⸨〈《「『【〔〖〘〚〝﴾︗︵︷︹︻︽︿﹁﹃﹇﹙﹛﹝([{⦅「
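If shelling out to Perl is not an option, a similar listing can probably be produced with the standard library alone. The sketch below assumes that unicodedata's general category "Ps" matches Perl's Open_Punctuation and "Pe" matches Close_Punctuation; list_category is a hypothetical helper name. The character set follows the Unicode version of the running Python (reported by unicodedata.unidata_version), so it may differ from what a Unicode 7 Perl prints.

```python
import sys
import unicodedata


def list_category(category):
    """Yield (char, 'U+XXXX', name) for each character in a general category."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == category:
            yield ch, f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>")


open_rows = list(list_category("Ps"))   # Open_Punctuation
close_rows = list(list_category("Pe"))  # Close_Punctuation

# Same shape as the string built from the unichars output
open_punct = "".join(ch for ch, _, _ in open_rows)

# Unicode version of this Python build, then the two counts
print(unicodedata.unidata_version, len(open_rows), len(close_rows))

# First rows, in the same layout as the unichars listing
for ch, code, name in open_rows[:3]:
    print(ch, code, name)
```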
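Cool, I didn't know about the unichars tool. How do I install it on Debian? My question is more about how to port \p{Open_Punctuation} and \p{Close_Punctuation} to Python, but I guess knowing which punctuation characters are in the set is the first step.
Do you want me to delete my answer?
No, I think it's fine. But a nice regex that replaces them and pads them with spaces, as in the Perl version, would be enough.
I don't know Python.
No worries, I'll edit your answer once the installation finishes and I have access to the full character list =)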
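A rough Python counterpart of the two Perl substitutions might look like the sketch below. The standard re module has no \p{...} support, so the character class is generated from unicodedata instead; category_class and pad_brackets are hypothetical names, and the "Ps"/"Pe" categories are assumed to correspond to Open_Punctuation and Close_Punctuation. The third-party regex package accepts \p{Ps} and \p{Pe} directly, which would avoid building the class by hand.

```python
import re
import sys
import unicodedata


def category_class(*categories):
    """Build a regex character class covering the given general categories."""
    chars = (chr(cp) for cp in range(sys.maxunicode + 1)
             if unicodedata.category(chr(cp)) in categories)
    # re.escape keeps characters such as [ and ] literal inside the class
    return "[" + "".join(re.escape(ch) for ch in chars) + "]"


# One pattern for both categories; the result matches running the two
# Perl substitutions one after the other.
OPEN_OR_CLOSE = re.compile("(" + category_class("Ps", "Pe") + ")")


def pad_brackets(text):
    """Surround every opening or closing punctuation character with spaces."""
    return OPEN_OR_CLOSE.sub(r" \1 ", text)


print(repr(pad_brackets("f(x)[0]")))  # prints: 'f ( x )  [ 0 ] '
```

Adjacent brackets end up with two spaces between them, exactly as with the Perl one-liners; collapsing runs of whitespace afterwards is a separate step if that matters.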