What is the Python equivalent of perluniprops?


In Perl, there is the perluniprops index for Unicode 7, which lets me do the following to pad opening and closing punctuation with spaces:

s/(\p{Open_Punctuation})/ $1 /g;
s/(\p{Close_Punctuation})/ $1 /g;
What is the full list of opening/closing punctuation that gets padded when using Perl? And what is the equivalent in Python?


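Related question: ; this question was split out and asked separately at the suggestion of an answerer there, who felt it should be its own question.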

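Are you asking how to determine which closing punctuation corresponds to a given opening punctuation? Unicode does not define that. In fact, the relationship is not even 1:1: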

$ unichars '\p{Open_Punctuation}' | wc -l
75

$ unichars '\p{Close_Punctuation}' | wc -l
73
However, it should be relatively easy to build your own mapping:

$ unichars '\p{Open_Punctuation}' | cat
 (  U+0028 LEFT PARENTHESIS
 [  U+005B LEFT SQUARE BRACKET
 {  U+007B LEFT CURLY BRACKET
 ༺  U+0F3A TIBETAN MARK GUG RTAGS GYON
 ༼  U+0F3C TIBETAN MARK ANG KHANG GYON
 ᚛  U+169B OGHAM FEATHER MARK
 ‚  U+201A SINGLE LOW-9 QUOTATION MARK
 „  U+201E DOUBLE LOW-9 QUOTATION MARK
 ⁅  U+2045 LEFT SQUARE BRACKET WITH QUILL
 ⁽  U+207D SUPERSCRIPT LEFT PARENTHESIS
 ₍  U+208D SUBSCRIPT LEFT PARENTHESIS
 ⌈  U+2308 LEFT CEILING
 ⌊  U+230A LEFT FLOOR
 〈 U+2329 LEFT-POINTING ANGLE BRACKET
 ❨  U+2768 MEDIUM LEFT PARENTHESIS ORNAMENT
 ❪  U+276A MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
 ❬  U+276C MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
 ❮  U+276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
 ❰  U+2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
 ❲  U+2772 LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
 ❴  U+2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
 ⟅  U+27C5 LEFT S-SHAPED BAG DELIMITER
 ⟦  U+27E6 MATHEMATICAL LEFT WHITE SQUARE BRACKET
 ⟨  U+27E8 MATHEMATICAL LEFT ANGLE BRACKET
 ⟪  U+27EA MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
 ⟬  U+27EC MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
 ⟮  U+27EE MATHEMATICAL LEFT FLATTENED PARENTHESIS
 ⦃  U+2983 LEFT WHITE CURLY BRACKET
 ⦅  U+2985 LEFT WHITE PARENTHESIS
 ⦇  U+2987 Z NOTATION LEFT IMAGE BRACKET
 ⦉  U+2989 Z NOTATION LEFT BINDING BRACKET
 ⦋  U+298B LEFT SQUARE BRACKET WITH UNDERBAR
 ⦍  U+298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
 ⦏  U+298F LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
 ⦑  U+2991 LEFT ANGLE BRACKET WITH DOT
 ⦓  U+2993 LEFT ARC LESS-THAN BRACKET
 ⦕  U+2995 DOUBLE LEFT ARC GREATER-THAN BRACKET
 ⦗  U+2997 LEFT BLACK TORTOISE SHELL BRACKET
 ⧘  U+29D8 LEFT WIGGLY FENCE
 ⧚  U+29DA LEFT DOUBLE WIGGLY FENCE
 ⧼  U+29FC LEFT-POINTING CURVED ANGLE BRACKET
 ⸢  U+2E22 TOP LEFT HALF BRACKET
 ⸤  U+2E24 BOTTOM LEFT HALF BRACKET
 ⸦  U+2E26 LEFT SIDEWAYS U BRACKET
 ⸨  U+2E28 LEFT DOUBLE PARENTHESIS
 ⹂  U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK
 〈 U+3008 LEFT ANGLE BRACKET
 《 U+300A LEFT DOUBLE ANGLE BRACKET
 「 U+300C LEFT CORNER BRACKET
 『 U+300E LEFT WHITE CORNER BRACKET
 【 U+3010 LEFT BLACK LENTICULAR BRACKET
 〔 U+3014 LEFT TORTOISE SHELL BRACKET
 〖 U+3016 LEFT WHITE LENTICULAR BRACKET
 〘 U+3018 LEFT WHITE TORTOISE SHELL BRACKET
 〚 U+301A LEFT WHITE SQUARE BRACKET
 〝 U+301D REVERSED DOUBLE PRIME QUOTATION MARK
 ﴿  U+FD3F ORNATE RIGHT PARENTHESIS
 ︗ U+FE17 PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
 ︵ U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
 ︷ U+FE37 PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
 ︹ U+FE39 PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
 ︻ U+FE3B PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
 ︽ U+FE3D PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
 ︿ U+FE3F PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
 ﹁ U+FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
 ﹃ U+FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
 ﹇ U+FE47 PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
 ﹙ U+FE59 SMALL LEFT PARENTHESIS
 ﹛ U+FE5B SMALL LEFT CURLY BRACKET
 ﹝ U+FE5D SMALL LEFT TORTOISE SHELL BRACKET
 （ U+FF08 FULLWIDTH LEFT PARENTHESIS
 ［ U+FF3B FULLWIDTH LEFT SQUARE BRACKET
 ｛ U+FF5B FULLWIDTH LEFT CURLY BRACKET
 ｟ U+FF5F FULLWIDTH LEFT WHITE PARENTHESIS
 ｢  U+FF62 HALFWIDTH LEFT CORNER BRACKET
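One way to build such a mapping in Python is sketched below. It relies on a naming heuristic that is my own assumption, not something Unicode defines: take each Open_Punctuation (Ps) character, swap LEFT for RIGHT in its name, and accept the result only if that name exists and belongs to Close_Punctuation (Pe). The names guess_closer, pairs, and unpaired are hypothetical, and the results depend on the Unicode version bundled with your Python.

```python
import sys
import unicodedata


def guess_closer(opener):
    """Guess the closing partner of `opener` from its Unicode name, or None."""
    name = unicodedata.name(opener, "")
    if "LEFT" not in name:
        return None
    try:
        # Heuristic (an assumption): the partner's name has RIGHT for LEFT.
        candidate = unicodedata.lookup(name.replace("LEFT", "RIGHT"))
    except KeyError:
        return None
    # Only accept the candidate if it really is Close_Punctuation.
    return candidate if unicodedata.category(candidate) == "Pe" else None


pairs = {}      # opener -> guessed closer
unpaired = []   # openers the heuristic could not match

for cp in range(sys.maxunicode + 1):
    ch = chr(cp)
    if unicodedata.category(ch) == "Ps":
        closer = guess_closer(ch)
        if closer is None:
            unpaired.append(ch)
        else:
            pairs[ch] = closer

print(len(pairs), "paired;", len(unpaired), "left for manual review")
```

Characters such as the low-9 quotation marks or the Tibetan marks have no LEFT in their names, so they end up in unpaired and would need to be mapped by hand.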

After installing unichars with cpan Unicode::Tussle, in Python:

>>> import subprocess
>>> cmd = r"unichars '\p{Open_Punctuation}' | cut -f2 -d' ' | tr -d '\n'"
>>> open_punct = subprocess.check_output(cmd, shell=True).decode('utf8')
Smartmatch is experimental at /usr/local/bin/unichars line 546.
>>> print (open_punct)
([{༺༼᚛‚„⁅⁽₍〈❨❪❬❮❰❲❴⟅⟦⟨⟪⟬⟮⦃⦅⦇⦉⦋⦍⦏⦑⦓⦕⦗⧘⧚⧼⸢⸤⸦⸨〈《「『【〔〖〘〚〝﴾︗︵︷︹︻︽︿﹁﹃﹇﹙﹛﹝([{⦅「
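If shelling out to Perl is not an option, the same sets can be approximated with the standard library alone. The sketch below assumes that Open_Punctuation and Close_Punctuation correspond to the General_Category values Ps and Pe, and it uses the Unicode version shipped with your Python, so the counts may differ from Perl's Unicode 7 output. The helper names chars_in_category and pad_brackets are hypothetical.

```python
import re
import sys
import unicodedata


def chars_in_category(category):
    """Return every character whose General_Category equals `category`."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    ]


# Ps = Open_Punctuation, Pe = Close_Punctuation
open_punct = "".join(chars_in_category("Ps"))
close_punct = "".join(chars_in_category("Pe"))

# One character class per set; re.escape guards '[', ']' and similar.
open_re = re.compile("([" + re.escape(open_punct) + "])")
close_re = re.compile("([" + re.escape(close_punct) + "])")


def pad_brackets(text):
    """Pad every opening and closing punctuation mark with spaces."""
    text = open_re.sub(r" \1 ", text)
    return close_re.sub(r" \1 ", text)


print(len(open_punct), len(close_punct))
print(pad_brackets("f(x)[0]"))
```

As far as I know, the third-party regex module also accepts \p{Ps} and \p{Pe} directly in patterns, which would avoid building the character classes by hand; the standard re module does not support \p{...}.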

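Cool, I didn't know about the unichars tool. How do I install it on Debian? My question is more about how to port \p{Open_Punctuation} and \p{Close_Punctuation} to Python, but I guess knowing which punctuation marks are in the sets is the first step.

Do you want me to delete my answer?

No, I think it's fine. But a good regex that replaces them and pads them with spaces would be enough.

I don't know Python.

No worries, I'll edit your answer once the installation finishes and I have access to the full list of characters =)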