Regex: Split a string using a negative regex pattern (tags: regex, elasticsearch)


I want to split a string on non-alphanumeric characters, except where a specific pattern occurs.

For example:

string_1 = "section (ab) 5(a)"
string_2 = "section -bd, 6(1b)(2)"
string_3 = "section - ac - 12(c)"
string_4 = "Section (ab) 5(1a)(cf) (ad)"
string_5 = "section (ab) 5(a) test (ab) 5 6(ad)"

I want to split these strings so that I get the output below:

["section", "ab", "5(a)"]
["section", "bd", "6(1b)(2)"]
["section", "ac", "12(c)"]
["section", "ab", "5(1a)(cf)", "ad"]
["section", "ab", "5(a)", "test", "ab", "5", "6(ad)"]

More precisely, I want to split on every non-alphanumeric character, except inside matches of the

\d+([\w\(\)]+)

pattern.

You can use

\d+[\w()]+|\w+

Details

- `\d+[\w()]+` - 1+ digits, then 1+ word, `(` or `)` characters
- `|` - or
- `\w+` - 1+ word characters
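As a quick check of the pattern above, here is a minimal Python sketch that applies it with `re.findall` to the five sample strings from the question. The names `TOKEN_RE` and `tokenize` are illustrative only and do not come from the original answer.

```python
import re

# Token pattern from the answer: a digit-led token that may contain
# parentheses, or else a plain run of word characters.
TOKEN_RE = re.compile(r'\d+[\w()]+|\w+')


def tokenize(text):
    """Return every match of TOKEN_RE in text, left to right."""
    return TOKEN_RE.findall(text)


samples = [
    "section (ab) 5(a)",
    "section -bd, 6(1b)(2)",
    "section - ac - 12(c)",
    "Section (ab) 5(1a)(cf) (ad)",
    "section (ab) 5(a) test (ab) 5 6(ad)",
]

for sample in samples:
    print(tokenize(sample))
```

The order of the alternatives matters: the digit-led branch is tried first, so `5(a)` is kept whole, while a lone `5` falls through to the `\w+` branch.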

In Elasticsearch, use

"tokenizer": {
  "my_tokenizer": {
    "type": "pattern",
    "pattern": "\\d+[\\w()]+|\\w+",
    "group": 0
  }
}
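To show where this tokenizer fragment might sit in a full index-settings body, and why the backslashes are doubled in JSON, here is a small Python sketch. It only builds and prints the JSON; it does not talk to a cluster. The analyzer name `my_analyzer` and the helper `build_settings` are assumptions made for illustration, not part of the original answer.

```python
import json

# The regex as you would write it in a Python raw string (single backslashes).
PATTERN = r'\d+[\w()]+|\w+'


def build_settings(pattern):
    """Build an index-settings body with a pattern tokenizer.

    "my_analyzer" is a hypothetical name; "my_tokenizer" follows the answer.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "my_tokenizer",
                    }
                },
                "tokenizer": {
                    "my_tokenizer": {
                        "type": "pattern",
                        "pattern": pattern,
                        # group 0 emits the whole match as the token,
                        # instead of splitting on the pattern.
                        "group": 0,
                    }
                },
            }
        }
    }


body = build_settings(PATTERN)
# json.dumps escapes each backslash, which yields the doubled form
# shown in the answer's JSON fragment.
print(json.dumps(body, indent=2))
```

With `"group": 0` the pattern describes the tokens to keep rather than the separators, which is why a "findall-style" regex can be used even though the pattern tokenizer splits by default.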

This can be achieved with findall using the following regex:

\b\w+(?:\([^)]*\))*

Code:

>>> import re
>>> reg = re.compile(r'\b\w+(?:\([^)]*\))*')
>>> arr = ['section (ab) 5(a)', 'section -bd, 6(1b)(2)', 'section - ac - 12(c)', 'Section (ab) 5(1a)(cf) (ad)', 'section (ab) 5(a) test (ab) 5 6(ad)']
>>> for el in arr:
...     print ( reg.findall(el) )
...
['section', 'ab', '5(a)']
['section', 'bd', '6(1b)(2)']
['section', 'ac', '12(c)']
['Section', 'ab', '5(1a)(cf)', 'ad']
['section', 'ab', '5(a)', 'test', 'ab', '5', '6(ad)']
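The two regexes in this thread agree on the sample strings, but they are not equivalent. As a hedged illustration, the sketch below runs both on a made-up probe string (`abc(d) 5(a)`, not from the question) in which a parenthesized group follows a word that does not start with a digit. The names `DIGIT_LED` and `ANY_WORD` are illustrative.

```python
import re

# Pattern from the first answer: parentheses are kept only after a digit-led token.
DIGIT_LED = re.compile(r'\d+[\w()]+|\w+')
# Pattern from this answer: parentheses are kept after any word.
ANY_WORD = re.compile(r'\b\w+(?:\([^)]*\))*')

probe = 'abc(d) 5(a)'  # hypothetical input, not one of the question's strings

print(DIGIT_LED.findall(probe))  # ['abc', 'd', '5(a)']
print(ANY_WORD.findall(probe))   # ['abc(d)', '5(a)']
```

Which behavior is right depends on whether groups like `abc(d)` should stay together; the question only specifies the digit-led case.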

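Aren't you sure what the regex should look like? What have you tried?

@Chiperific, I am not sure about the regex, so what I tried is `\W(?)(\d+([\W\(\)]+)`, but it does not remove the `()` from string_1. The output for string_1 should be

["section", "ab", "5(a)"]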

rather than

["section", "(ab)", "5(a)"]

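This works for the sample strings only; it does not work for `section (ab) 5(a) test (ab) 5` or `section (ab) 5(1a)(cf) (ad)`. Sorry, I think I should have added more example strings. I have updated my question with more examples.

@Aninda: Does this work, or do you need a split regex only?

Actually, I need a split regex for Elasticsearch (pattern tokenizer), so `findall` is not an option for me. Your regex helped me a lot, though, thank you.

Oh, then you should tag it with `elasticsearch` rather than `python` or `python3` to get better answers.

I am looking for a regex and do not want to use `findall`.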

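@Aninda There is an `r'\d+[\w()]+|\w+'` regex here. If not `re.findall`, which method do you plan to use (and why are you limited to specific methods)? Note that `re.split` cannot be used here, because the `re` regex engine cannot skip a pattern sequence. It is possible with the PyPi `regex` module, though.

If you use the PyPi `regex` split method, you will sometimes get empty items in the output. I would rather stick to `re.findall`.

I need this regex pattern because I will use it with Elasticsearch (pattern tokenizer), so I do not need `re.findall`. Your regex was very helpful to me, though, thanks.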
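To make the point about splitting concrete, here is a small Python sketch contrasting a naive `re.split` on non-word characters with the match-based approach from the answer. The helper names `naive_split` and `match_tokens` are illustrative, and the naive separator `\W+` is an assumption chosen for the demonstration.

```python
import re


def naive_split(text):
    """Split on runs of non-word characters (breaks digit-led groups apart)."""
    return re.split(r'\W+', text)


def match_tokens(text):
    """Match the tokens to keep instead of describing the separators."""
    return re.findall(r'\d+[\w()]+|\w+', text)


text = 'section (ab) 5(a)'
print(naive_split(text))   # ['section', 'ab', '5', 'a', '']
print(match_tokens(text))  # ['section', 'ab', '5(a)']
```

The naive split tears `5(a)` apart and leaves a trailing empty item, which is why the thread settles on matching tokens (group 0 in the Elasticsearch pattern tokenizer) rather than splitting.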