How can I extract information from a long unformatted string in Python based on a found index?

python regex

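I have the following text (extracted from an email body using exchangelib).

With the following basic code, I can print the index at which the pattern is found: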

body = receivedbody
result = body.find("ACCOUNT")
print(result)

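How can I search for ACCOUNT and print the number that follows the matched pattern, 67805670-11?

Basically, what I want is to work with the body received through exchangelib; perhaps there is some library, or a built-in method in exchangelib, that could help me do this.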

For this, you need to use regular expressions, i.e. the re module:

    text="""The following task were executed for department A . PLEASE 
        STORED AS FOLLOWS
       
         “Task done with APO”
        
         APO Sent / A department  Stored
         VIA LOCAL MARKET
         ACCOUNT 67805670-11"""
    
    import re
    pattern = r"[\d]*-[\d]*"
    re.findall(pattern=pattern, string=text)

    ['67805670-11']

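So what is happening here:

re.findall(pattern, string)

uses the pattern to find every part of the text that matches it. The pattern

r"[\d]*-[\d]*"

looks for a run of digits followed by a hyphen and more digits (strictly, * means zero or more, so a lone hyphen would match too). I did not specify how long the runs before and after the hyphen should be, but one certainly could.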
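The lengths of the two digit runs can be pinned down with explicit quantifiers. The following is only a minimal sketch: it assumes that account numbers always consist of eight digits, a hyphen, and two digits, which is a guess based on the single sample above, and the extra "A - B" line is made up to show the difference between the two patterns.

```python
import re

# Shortened sample; the last line is hypothetical and only there to show the difference.
text = "VIA LOCAL MARKET\nACCOUNT 67805670-11\nsee also A - B"

# Original pattern: * allows zero digits, so a lone "-" matches too.
loose = re.findall(r"[\d]*-[\d]*", text)

# Assumed format: exactly 8 digits, a hyphen, exactly 2 digits.
strict = re.findall(r"\b\d{8}-\d{2}\b", text)

print(loose)   # ['67805670-11', '-']
print(strict)  # ['67805670-11']
```

If the real account numbers vary in length, \d+-\d+ is a looser alternative that still requires at least one digit on each side of the hyphen.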

You can also specify a pattern that includes the letters separated from the account number by whitespace:

pattern = r"[\w]*\s*[\d]*-[\d]*"
re.findall(pattern=pattern, string=text)

['ACCOUNT 67805670-11']

You can easily assign the output to a variable:

output = re.findall(pattern=pattern, string=text)
if output:  # findall returns an empty list when nothing matches
    print(f"The account number is {output[0].split(' ')[1]}")
else:
    print("no account number found")


The account number is 67805670-11

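This approach also works if the account number is at the end of the string. Note: if the account number is not at the end of the string, some extra work is needed.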

s = \
"""
The following task were executed for department A . PLEASE STORED AS FOLLOWS 

“Task done with APO”

APO Sent / A department  Stored 
VIA LOCAL MARKET
ACCOUNT 67805670-11
"""

print(s[s.index("ACCOUNT") + len("ACCOUNT") + 1:].rstrip())

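index() returns the position where "ACCOUNT" starts; adding the length of "ACCOUNT" plus one for the space gives the start of the account number, and rstrip() then removes any trailing newline from the right end of the string.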
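The "extra work" for an account number that is not at the end of the string could look roughly like this. It is a sketch under two assumptions: the number is the first whitespace-delimited token after the first occurrence of ACCOUNT, and the trailing "Regards" line is invented for the example. It uses find() instead of index() so that a missing marker yields None rather than raising ValueError.

```python
# Shortened sample with hypothetical text after the account number.
s = """VIA LOCAL MARKET
ACCOUNT 67805670-11
Regards, department A
"""

marker = "ACCOUNT"
start = s.find(marker)  # -1 if the marker is absent (index() would raise instead)
if start == -1:
    account = None
else:
    # Take everything after the marker and split on any whitespace;
    # the first token is assumed to be the account number.
    tokens = s[start + len(marker):].split()
    account = tokens[0] if tokens else None

print(account)  # 67805670-11
```

Note that this picks up the first occurrence of the marker, so it would misfire if ACCOUNT appeared earlier in the body in some other context.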

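Thanks, it works fine, but unfortunately the account number can appear anywhere in the body. I will go with regex, based on @hussam's answer.

Just change text.findall to re.findall.

The variable example does not work, because the object output is a list and has no split attribute.

@Sallyerik, thank you for the troubleshooting. Done, all fixed: it needed to reference output[0].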

Just use the regex

ACCOUNT\s*(\d+-\d+)
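A capture group like the one in this pattern returns the number directly, with no split() needed, and works wherever ACCOUNT appears in the body. A minimal sketch follows; extract_account is a hypothetical helper name, and the sample body is shortened from the text in the question with an invented trailing line.

```python
import re

def extract_account(body):
    """Return the number following ACCOUNT, or None if there is no match.

    Hypothetical helper for illustration; not part of exchangelib.
    """
    match = re.search(r"ACCOUNT\s*(\d+-\d+)", body)
    # group(1) is the text captured by the parentheses, i.e. the number only.
    return match.group(1) if match else None

body = "APO Sent / A department  Stored\nVIA LOCAL MARKET\nACCOUNT 67805670-11\nThanks"
print(extract_account(body))        # 67805670-11
print(extract_account("no match"))  # None
```

re.search stops at the first match; re.findall with the same pattern would return every captured number if the body can contain several.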
