Python regex to capture scientific citations


python regex


I'm trying to capture parenthesized text that contains at least one digit (think citations). This is my current regex, and it works fine:

So I want it to capture (Author 2000) and (2000), but not (Author).

I'm trying to capture all of these parenthesized groups with Python, but in Python it also captures parenthesized text that contains no digits:

import re

with open('text.txt') as f:
    text = f.read()

# Raw string, so that \( and \d reach the regex engine unchanged.
s = r"\((?=.*\d).*?\)"

citations = re.findall(s, text)
citations = list(set(citations))

for c in citations:
    print(c)

Any idea what I'm doing wrong?
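The problem can be reproduced without text.txt by running the same pattern on a short in-memory string. The sample string below is an assumption standing in for the file contents.

```python
import re

# The pattern from the question, written as a raw string.
pattern = r"\((?=.*\d).*?\)"

# Hypothetical sample text standing in for the contents of text.txt.
sample = "Some (Author) and (Author 2000)"

matches = re.findall(pattern, sample)
print(matches)  # ['(Author)', '(Author 2000)']
```

(Author) is returned because the lookahead .*\d is not limited to the current parentheses: .* can run past the closing ) and find the 2000 later on the same line.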

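Probably the most reliable way to handle this is to add boundaries, in case the expression grows. For example, we could build character classes that collect only the data we want: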

(?=\().([a-z]+)([\s,;]+?)([0-9]+)(?=\)).

Test demo:

const regex = /(?=\().([a-z]+)([\s,;]+?)([0-9]+)(?=\))./mgi;
const str = `some text we wish before (Author) some text we wish before (Author 2000) some text we wish before (Author 2000) some text we wish before (Author 2000) some text we wish before (Author 2000) some text we wish before (Author 2000) some text we wish before (Author) some we wish before (Author`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
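For readers working in Python, here is a rough port of the JavaScript demo using re.finditer. It is a sketch, not the answerer's code; the sample string is hypothetical, and re.IGNORECASE is assumed to stand in for the JavaScript i flag.

```python
import re

# The character-class pattern from this answer. re.IGNORECASE plays the role of
# the JavaScript "i" flag; finditer already returns every match (like "g"), and
# the pattern has no ^ or $ anchors, so "m" would change nothing here.
regex = re.compile(r"(?=\().([a-z]+)([\s,;]+?)([0-9]+)(?=\)).", re.IGNORECASE)

# Hypothetical sample text.
sample = "before (Author) before (Author 2000) before (2000) before (Smith; 1999)"

found = []
for m in regex.finditer(sample):
    # Group 0 is the whole match; groups 1-3 are the name, separator and number.
    print(f"Found match: {m.group(0)!r}, groups {m.groups()}")
    found.append(m.group(0))
```

Note that (2000) is not matched: ([a-z]+) requires at least one letter before the number, so this pattern on its own does not cover the bare-year case from the question.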

You can use

re.findall(r'\([^()\d]*\d[^()]*\)', s)


Details

- `\(` - a `(` character
- `[^()\d]*` - 0 or more characters other than `(`, `)` and digits
- `\d` - a digit
- `[^()]*` - 0 or more characters other than `(` and `)`
- `\)` - a `)` character
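The breakdown can also live inside the pattern itself. This is a minimal sketch using re.VERBOSE, which ignores unescaped whitespace outside character classes and treats # as a comment, so the compiled regex is identical to the compact one. The sample string is hypothetical.

```python
import re

# Same pattern as r'\([^()\d]*\d[^()]*\)', written with re.VERBOSE so that
# each piece from the breakdown carries its own comment.
citation = re.compile(
    r"""
    \(          # a literal "(" character
    [^()\d]*    # 0 or more chars other than "(", ")" and digits
    \d          # exactly one digit
    [^()]*      # 0 or more chars other than "(" and ")"
    \)          # a literal ")" character
    """,
    re.VERBOSE,
)

# Hypothetical sample text, standing in for the contents of text.txt.
sample = "Some (Author) and (Author 2000), see also (2000) and (Smith et al., 1999)."

matches = citation.findall(sample)
print(matches)  # ['(Author 2000)', '(2000)', '(Smith et al., 1999)']
```

Excluding digits from the first class means the engine stops at the first digit it meets rather than consuming up to the closing parenthesis and backtracking. Since neither class can cross ( or ), a match can never span two parenthesized groups.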


To get the results without the parentheses, add a capturing group:

rx = re.compile(r"\(([^()\d]*\d[^()]*)\)")
                    ^                ^
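A short sketch of what the capturing group changes. With exactly one group in the pattern, re.findall returns the group's text instead of the whole match. The dict.fromkeys step is an optional alternative to the question's list(set(...)): it also drops duplicates but keeps the order of first appearance. The sample string is hypothetical.

```python
import re

# One capturing group around the text between the parentheses.
rx = re.compile(r"\(([^()\d]*\d[^()]*)\)")

# Hypothetical sample text with a repeated citation.
sample = "Some (Author) and (Author 2000), again (Author 2000), and (2000)."

# With exactly one group, findall returns the group's text, not the whole match.
inner = rx.findall(sample)
print(inner)  # ['Author 2000', 'Author 2000', '2000']

# dict.fromkeys drops duplicates but, unlike list(set(...)), keeps first-seen order.
unique = list(dict.fromkeys(inner))
print(unique)  # ['Author 2000', '2000']
```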


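Do you want to capture the 2000?

Yes, sorry if that was unclear. (Author 2000) yes, (2000) yes, (Author) no. You could capture just 2000 with (\d+), right, but that wouldn't capture (Author 2000).

I think you need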

r'\((?=[^()]*\d)[^()]*\)'
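A side-by-side sketch of the question's pattern and this one on the same hypothetical input. The only change is that the lookahead uses [^()]* instead of .*, so the digit check cannot look beyond the current closing parenthesis.

```python
import re

# Pattern from the question: ".*" in the lookahead can run past the closing ")".
question_pattern = r"\((?=.*\d).*?\)"
# Pattern from the comment: "[^()]*" keeps the lookahead inside one "( ... )".
comment_pattern = r"\((?=[^()]*\d)[^()]*\)"

# Hypothetical sample text.
sample = "(Author) (Author 2000) (2000)"

loose = re.findall(question_pattern, sample)
strict = re.findall(comment_pattern, sample)
print(loose)   # ['(Author)', '(Author 2000)', '(2000)']
print(strict)  # ['(Author 2000)', '(2000)']
```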

Thanks, I'll take a look. When there are no digits, this gives strange results, for example:

re.findall(r'\([^()\d]*\d[^()]*\)', s)

import re
rx = re.compile(r"\([^()\d]*\d[^()]*\)")
s = "Some (Author) and (Author 2000)"
print(rx.findall(s)) # => ['(Author 2000)']
