Splitting a string on punctuation or digits in Python

python regex


I'm trying to split a string every time I encounter a punctuation mark or a digit, for example:

toSplit = 'I2eat!Apples22becauseilike?Them'
result = re.sub('[0123456789,.?:;~!@#$%^&*()]', ' \1',toSplit).split()

The desired output would be:

['I', '2', 'eat', '!', 'Apples', '22', 'becauseilike', '?', 'Them']

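However, the code above (although it splits in the right places) removes all of the digits and punctuation.

Any clarification would be greatly appreciated.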

Use `re.split` with a capture group:

import re

toSplit = 'I2eat!Apples22becauseilike?Them'
result = re.split('([0-9,.?:;~!@#$%^&*()])', toSplit)
result

Output:

['I', '2', 'eat', '!', 'Apples', '2', '', '2', 'becauseilike', '?', 'Them']

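Without a quantifier, every digit or punctuation character is its own delimiter, which is why 22 comes out as '2', '', '2'. To keep runs of repeated digits or punctuation together, add `+` after the character class: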

result = re.split('([0-9,.?:;~!@#$%^&*()]+)', toSplit)
result

Output:

['I', '2', 'eat', '!', 'Apples', '22', 'becauseilike', '?', 'Them']
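
For comparison, the `re.sub` attempt from the question can be repaired along the same lines. The pattern there has no capture group, and `' \1'` in a non-raw string literal is the control character `\x01` rather than a backreference, so the matched characters were replaced instead of kept. The following is only a sketch under those observations; the names `DELIMS` and `to_split` are introduced here for illustration.

```python
import re

# Same character class as in the answer; the group makes \1 meaningful and
# `+` keeps runs such as "22" together.
DELIMS = r'([0-9,.?:;~!@#$%^&*()]+)'

to_split = 'I2eat!Apples22becauseilike?Them'

# Raw replacement string, so \1 is a backreference to the captured run.
# Padding it with spaces lets str.split() do the actual splitting.
result = re.sub(DELIMS, r' \1 ', to_split).split()

print(result)
# ['I', '2', 'eat', '!', 'Apples', '22', 'becauseilike', '?', 'Them']
```

Because the final step is `str.split()`, this variant also splits on any whitespace already present in the input, which may or may not be what you want.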

You can tokenize the string into digits, letters, and other characters that are not whitespace, letters, or digits:

re.findall(r'\d+|(?:[^\w\s]|_)+|[^\W\d_]+', toSplit)

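Here,

```
\d+
```
matches one or more digits;
```
(?:[^\w\s]|_)+
```
matches one or more characters that are either underscores or neither word nor whitespace characters;
```
[^\W\d_]+
```
matches one or more Unicode letters.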


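Matching is more flexible than splitting because it also lets you tokenize more complex structures. Say you also want to tokenize decimal (float, double, ...) numbers. You only need to use `\d+(?:\.\d+)?` instead of `\d+`: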

re.findall(r'\d+(?:\.\d+)?|(?:[^\w\s]|_)+|[^\W\d_]+', toSplit) 
             ^^^^^^^^^^^^^
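
The matching approach can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: `TOKEN_RE` and `tokenize` are hypothetical names, and the pattern is the decimal-aware one shown above.

```python
import re

# Alternatives are tried left to right at each position:
# a number with an optional fractional part, a run of punctuation or
# underscores, or a run of Unicode letters. Whitespace matches nothing
# and is therefore skipped.
TOKEN_RE = re.compile(r'\d+(?:\.\d+)?|(?:[^\w\s]|_)+|[^\W\d_]+')


def tokenize(text):
    """Return the number, punctuation and letter tokens found in `text`."""
    return TOKEN_RE.findall(text)


print(tokenize('I2eat!Apples22becauseilike?Them'))
# ['I', '2', 'eat', '!', 'Apples', '22', 'becauseilike', '?', 'Them']
print(tokenize('22.45text'))
# ['22.45', 'text']
```

Since the pattern contains only non-capturing groups, `findall` returns the full matches as plain strings.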


Use `re.split` to split wherever a run of alphabetic characters is found:

>>> import re                                                              
>>> re.split(r'([A-Za-z]+)', toSplit)                                      
['', 'I', '2', 'eat', '!', 'Apples', '22', 'becauseilike', '?', 'Them', '']
>>>                                                                        
>>> ' '.join(re.split(r'([A-Za-z]+)', toSplit)).split()                    
['I', '2', 'eat', '!', 'Apples', '22', 'becauseilike', '?', 'Them']
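
The join-and-split step in the session above only serves to drop the empty strings at both ends. A possible alternative, sketched here with a hypothetical helper named `split_on_letters`, filters them out directly and leaves any whitespace inside the non-letter pieces untouched.

```python
import re


def split_on_letters(text):
    """Split `text` on runs of ASCII letters, keeping the letters as tokens."""
    parts = re.split(r'([A-Za-z]+)', text)
    # re.split yields '' where the text starts or ends with a match; drop those.
    return [part for part in parts if part]


print(split_on_letters('I2eat!Apples22becauseilike?Them'))
# ['I', '2', 'eat', '!', 'Apples', '22', 'becauseilike', '?', 'Them']
```

Note that only letters act as delimiters here, so adjacent digits and punctuation such as `2!` stay together in one token.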

Try

re.findall(r'\d+|[^\w\s]|[^\W\d]+', toSplit)

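You want to get ['11', '!!'], right?

Yes, that's right. I hadn't tried that case, thanks for pointing it out :)

Then you can use re.findall(r'\d+|(?:[^\w\s]|_)+|[^\W\d_]+', toSplit) to generalize the solution to digits, letters, and other characters that are not whitespace, letters, or digits. I wonder what else you want to do with 22.45text...

Thanks! You're a lifesaver! :)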
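
The edge case raised in the comments can be checked with a short sketch. It assumes a sample input of `'11!!'`, inferred from the expected output quoted above, and compares an unquantified punctuation class with the quantified one; the variable names are illustrative.

```python
import re

sample = '11!!'

# Unquantified class: every punctuation character becomes its own token.
per_char = re.findall(r'\d+|[^\w\s]|[^\W\d_]+', sample)

# Quantified group: a run of punctuation (or underscores) stays together.
per_run = re.findall(r'\d+|(?:[^\w\s]|_)+|[^\W\d_]+', sample)

print(per_char)  # ['11', '!', '!']
print(per_run)   # ['11', '!!']
```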