Python regex OR not working - I don't know what's wrong with my pattern

python regex


I have the following strings:

2020-10-2125Chavez and Sons
2020-05-02Bean Inc
NaNRobinson, Mcmahon and Atkins
2020-04-25Hill-Fisher
2020-04-02Nothing and Sons
52457Carpenter and Sons
0Carpenter and Sons
Carpenter and Sons
NoneEconomy and Sons
2020-04-02

I want to split each of them into two groups:

myRegex = '^([-\d]{0,}|[NnaAOoEe]{0,})(.*)' or '^([0-9]{4}-[0-9]{2}-[0-9]{2,}|[\d]{0,}|[NnaAOoEe]{0,})([\D]{0,})$'

I want the first group to hold all the digits, an exact match of na, nan or none (upper or lower case), or the empty string "", like this:

[2020-10-2125][Chavez and Sons]
[2020-05-02][Bean Inc]
[NaN][Robinson, Mcmahon and Atkins]
[2020-04-25][Hill-Fisher]
[2020-04-02][Nothing and Sons]
[52457][Carpenter and Sons]
[0][Carpenter and Sons]
[][Carpenter and Sons]
[None][Economy and Sons]
[2020-04-02][]

This is wrong:

[2020-04-02No][thing and Sons]

I want:

[2020-04-02][Nothing and Sons]

How can I write a regex that checks for an exact match such as "none", case-insensitively (it should also recognize "None", "NONE", and so on)?

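You can combine the expressions you want to match with a simple `|`, but remember that the engine always picks the first alternative that can match; so put the more specific patterns first, and then fall back to the more general cases.

Try this: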

my_re = re.compile(r'^([0-9]{4}-[0-9]{2}-[0-9]{2,}|\d+|N(?:aN|one)|)(\D.*)$', re.IGNORECASE)

The `re.IGNORECASE` flag tells the engine to ignore differences in case.
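To see how the ordering of alternatives plays out, here is a minimal sketch that applies a pattern of this shape to lines from the question. One caveat: as written above, the second group `(\D.*)` needs at least one non-digit character, so a line that is only a date falls back to the `\d+` branch and splits as `[2020][-04-02]`. The variant below makes the second group optional. That adjustment and the helper name `split_line` are assumptions of this sketch, not part of the answer.

```python
import re

# Variant of the pattern above. The trailing "?" on the second group is an
# assumption of this sketch, so that a bare date leaves group 2 empty.
MY_RE = re.compile(
    r'^([0-9]{4}-[0-9]{2}-[0-9]{2,}|\d+|N(?:aN|one)|)(\D.*)?$',
    re.IGNORECASE,
)

def split_line(line):
    """Return (prefix, rest) for one line; hypothetical helper."""
    m = MY_RE.match(line)
    if m is None:
        # No alternative fits: treat the whole line as the second group.
        return "", line
    # group(2) is None when the optional second group did not participate.
    return m.group(1), m.group(2) or ""

for sample in ["2020-04-02Nothing and Sons", "Carpenter and Sons", "2020-04-02"]:
    prefix, rest = split_line(sample)
    print(f"[{prefix}][{rest}]")
```

Because the date alternative comes first, `2020-04-02Nothing and Sons` is consumed as a date before the keyword branch ever gets a chance to look at `No`.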

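Also, note that the quantifier `{0,}` is better written `*`; but you want to require at least one match, or else fall back to a more general pattern, so what you actually need is `+` (which could also be written `{1,}`; but again, the more concise standard notation is preferable). You don't need square brackets around `\D`, which is already a character class on its own (but you do need them if you want to combine two character classes, e.g. `[-\D]`).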
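The equivalences above can be checked directly. This is a small sketch using only the standard `re` module; the sample string `text` is an assumption for illustration, and the merged class shown is `[-\d]` from the question's own pattern.

```python
import re

text = "2020-04-02Hill-Fisher"

# {0,} and * are the same quantifier; both accept an empty match.
assert re.match(r'\d{0,}', "abc").group() == ""
assert re.match(r'\d*', "abc").group() == ""

# {1,} and + are the same quantifier; both need at least one character.
assert re.match(r'\d{1,}', "abc") is None
assert re.match(r'\d+', "abc") is None

# \D is already a character class, so wrapping it in brackets adds nothing ...
assert re.findall(r'[\D]+', text) == re.findall(r'\D+', text)

# ... but brackets are required to merge classes, here digits plus "-".
leading = re.match(r'[-\d]+', text).group()
print(leading)
```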


Finally, note that the standard Python convention for naming local variables prefers `snake_case` over `dromedaryCase`.

How about the following, used with `re.I`:

(None|NaN?|[-\d]+)?(.*)

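Explanation:

```
(None|NaN?|[-\d]+)?
```
- either None
- or NaN, where the final N is optional (because of the `?`), so it also matches NA
- or digits and dashes, one or more times
- the whole group `()` is optional, because the trailing `?` means it may be absent
```
(.*)
```
- any characters up to the end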

However, there may still be edge cases. Consider the following:

National Geographic
---Test

These will be parsed as

[Na][tional Geographic]
[---][Test]
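The edge cases above are easy to reproduce. The following sketch compiles the pattern from this answer with `re.I`; the helper name `split_simple` is hypothetical and exists only for this illustration.

```python
import re

# The pattern from this answer, compiled with re.I (ignore case).
PATTERN = re.compile(r'(None|NaN?|[-\d]+)?(.*)', re.I)

def split_simple(line):
    """Hypothetical helper: return (group 1 or '', group 2)."""
    # Both groups can match the empty string, so match() always succeeds.
    m = PATTERN.match(line)
    return m.group(1) or "", m.group(2)

for sample in ["2020-04-02Nothing and Sons", "National Geographic", "---Test"]:
    g1, g2 = split_simple(sample)
    print(f"[{g1}][{g2}]")
```

The `Na` in `National` is taken by the `NaN?` alternative because nothing in the pattern requires the keyword to be followed by the start of a new word.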

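Another option:

From here we could keep making the regex more complicated, but I think it would be much simpler to implement custom parsing without a regex. Loop over each line and its characters:

- If the line starts with a digit, parse all digits and dashes into group 1 and the rest into group 2 (i.e. switch groups as soon as you hit a character that is neither).
- Take the first 4 characters of the string and, if they are "none", split there. Also make sure the 5th character is uppercase (the comparison itself is case-insensitive): `line[:4].lower() == "none" and line[4].isupper()`
- Do the same for NaN and NA: `line[:3].lower() == "nan" and line[3].isupper()`, then `line[:2].lower() == "na" and line[2].isupper()`

The above should produce more accurate results and should be easier to read.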

Sample code:

with open("/tmp/data") as f:
    lines = f.readlines()

results = []
for line in lines:
    # Remove surrounding whitespace, including the trailing \n
    line = line.strip()
    if not line:
        # Skip blank lines; indexing line[0] would raise IndexError
        continue
    if line[0].isdigit() or line[0] == "-":
        # Advance past the leading run of digits and dashes
        i = 0
        while i < len(line) and (line[i].isdigit() or line[i] == "-"):
            i += 1
        results.append((line[:i], line[i:]))

    # Slices such as line[4:5] are empty (not an error) on short lines
    elif line[:4].lower() == "none" and line[4:5].isupper():
        results.append((line[:4], line[4:]))

    elif line[:3].lower() == "nan" and line[3:4].isupper():
        results.append((line[:3], line[3:]))

    elif line[:2].lower() == "na" and line[2:3].isupper():
        results.append((line[:2], line[2:]))
    else:
        # Assume group1 is missing! Everything is group2
        results.append((None, line))

for g1, g2 in results:
    print(f"[{g1 or ''}][{g2}]")

Output:

$ python ~/tmp/so.py 
[2020-10-2125][Chavez and Sons]
[2020-05-02][Bean Inc]
[NaN][Robinson, Mcmahon and Atkins]
[2020-04-25][Hill-Fisher]
[2020-04-02][Nothing and Sons]
[52457][Carpenter and Sons]
[0][Carpenter and Sons]
[][Carpenter and Sons]
[None][Economy and Sons]
[NoNe][Economy and Sons]
[2020-04-02][]
[NA][Economy and Sons]
[---][Test]
[][National Geographic]

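You can set the `re.IGNORECASE` flag for the whole regex, or match "none" case-insensitively by hand, e.g. `[Nn][Oo][Nn][Ee]`.

thx, how do I combine that with the other items?

Inside parentheses, you can combine the possible matches with `|`.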
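As a rough illustration of the two options from these comments, the sketch below builds the same first group both ways. The names `whole` and `spelled` are hypothetical, and the patterns only cover the leading group, not the full split.

```python
import re

# Option 1: one flag for the whole pattern.
whole = re.compile(r'^(none|nan?|[-\d]+)', re.IGNORECASE)

# Option 2: spell out both cases, only for the keyword alternatives.
spelled = re.compile(r'^([Nn][Oo][Nn][Ee]|[Nn][Aa][Nn]?|[-\d]+)')

for prefix_re in (whole, spelled):
    # Keywords are found regardless of case ...
    assert prefix_re.match("NoNeEconomy and Sons").group(1) == "NoNe"
    # ... and the other alternatives in the parentheses still apply.
    assert prefix_re.match("2020-04-02Bean Inc").group(1) == "2020-04-02"
    # A line with no recognised prefix does not match at all.
    assert prefix_re.match("Carpenter and Sons") is None
```

Both forms accept the same inputs here; the flag is shorter, while the spelled-out classes keep the rest of the pattern case-sensitive.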
