Regex: extract names from an email using regular expressions in R


I have a string (an email chain), and I need to extract the senders' names (the "From:" fields). Below is a sample email:

str1 <- 'From : Wendy YEOW (SLA) To : xxxx@lt.org Subject : RE: OneService@S
From: SLA Enquiry (SLA) Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org Subject : RE: OneService@S 
From: Siti Zaharah RAMAN (ARKS) Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org Subject : RE: OneService@S 
From: SLA Enquiry (SLA) Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org Subject : RE: OneService@S 
From: Chin Hwang LAU (TA) Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org Subject : RE: OneService@S'
The result I expect is:

[1] "Wendy YEOW (SLA)"    "SLA Enquiry (SLA)"    "Siti Zaharah RAMAN (ARKS)"   "SLA Enquiry (SLA)"    "Chin Hwang LAU (TA)"

Not very elegant, but you can try:

gsub(" *(From|To|Sent) *:? *","",regmatches(str1,gregexpr("From *:[^:]+",str1))[[1]])
#[1] "Wendy YEOW (SLA)"          "SLA Enquiry (SLA)"        
#[3] "Siti Zaharah RAMAN (ARKS)" "SLA Enquiry (SLA)"        
#[5] "Chin Hwang LAU (TA)"
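The one-liner above works in two steps, and running them separately shows what each pattern does. This is a minimal sketch on a shortened, made-up two-message chain (the sender "Tony Kwa/Ke DY" is illustrative): gregexpr() plus regmatches() grab each "From" header up to the next colon, then gsub() strips the From/To/Sent keywords. Because the second pattern has no word boundary, it also deletes the "To" at the start of "Tony", which is the problem raised in the comments.

```r
# Shortened, made-up two-message chain (illustrative data only)
chain <- "From : Wendy YEOW (SLA) To : xxxx@lt.org Subject : RE: OneService@S
From: Tony Kwa/Ke DY Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org"

# Step 1: each "From" header up to (but not including) the next colon
headers <- regmatches(chain, gregexpr("From *:[^:]+", chain))[[1]]
headers
# "From : Wendy YEOW (SLA) To "  "From: Tony Kwa/Ke DY Sent"

# Step 2: delete From/To/Sent together with surrounding spaces and an optional colon
senders <- gsub(" *(From|To|Sent) *:? *", "", headers)
senders
# "Wendy YEOW (SLA)"  "ny Kwa/Ke DY"   (the "To" at the start of "Tony" is removed too)
```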

Use this regex together with strsplit():


This works because I'm using a backreference (\\1) to extract the wildcard match inside the first pair of parentheses.
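What \\1 refers to can be checked on a single line of the chain. This is a minimal sketch, assuming the pattern "From *: (.*?) (To|Sent).*" (the one used in the gsub() call later in this thread) and a made-up line with the sender "Tony Kwa/Ke DY". perl = TRUE is an addition here so the lazy (.*?) follows PCRE rules. regexec() reports the capture groups explicitly, and sub() with "\\1" replaces the whole match with group 1.

```r
# One made-up line of the chain: the name sits between "From: " and " Sent"/" To"
line <- "From: Tony Kwa/Ke DY Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org"

# (.*?) is group 1 (the name, matched lazily); (To|Sent) is group 2
pattern <- "From *: (.*?) (To|Sent).*"

# regexec() returns the whole match followed by each group;
# element 2 is group 1, i.e. what "\\1" stands for in a replacement
parts <- regmatches(line, regexec(pattern, line, perl = TRUE))[[1]]
parts[2]

# Backreference: the whole matched line is replaced by group 1 only
name <- sub(pattern, "\\1", line, perl = TRUE)
name
```

Applying the same sub() to each element of strsplit(str1, "\n")[[1]] gives one name per line of the chain.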

You can use strsplit(); there is no need for gsub() here:

strsplit(str1, "From ?: | (To|Sent) ?:.*?(\\nFrom ?: |$)")[[1]][-1]
# [1] "Wendy YEOW (SLA)"          "SLA Enquiry (SLA)"         "Siti Zaharah RAMAN (ARKS)"
# [4] "SLA Enquiry (SLA)"         "Chin Hwang LAU (TA)"  
The regex basically consists of two alternatives:

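  • "From ?: ": this matches the "From : " at the start of the string. Splitting on it returns an empty string followed by the rest of the original string.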
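  • " (To|Sent) ?:.*?(\\nFrom ?: |$)": this matches the text after each name. It covers a substring that starts with "To" or "Sent" and ends either with a newline ("\\n") followed by the next "From: ", or with the end of the string ("$").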
  • Finally, [-1] is needed to drop the empty string (the piece before the first "From: ").
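The result of the split can be inspected before [-1] is applied. This is a minimal sketch on a shortened, made-up two-message chain; perl = TRUE is an addition here (the answer uses R's default regex engine) so that the lazy .*? follows PCRE rules. The first piece is the empty string left by the first alternative. A name such as "Tony Kwa/Ke DY" comes through intact, because the space in front of it is consumed together with "From: ", so " (To|Sent)" cannot match at the start of "Tony".

```r
# Shortened, made-up two-message chain (illustrative data only)
chain <- "From : Wendy YEOW (SLA) To : xxxx@lt.org Subject : RE: OneService@S
From: Tony Kwa/Ke DY Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org"

# Alternative 1 ("From ?: ") matches at the very start, so the first piece is "".
# Alternative 2 consumes " To"/" Sent" up to the next "\nFrom: " or the end ($).
split_re <- "From ?: | (To|Sent) ?:.*?(\\nFrom ?: |$)"
pieces <- strsplit(chain, split_re, perl = TRUE)[[1]]
pieces
# ""  "Wendy YEOW (SLA)"  "Tony Kwa/Ke DY"

senders <- pieces[-1]  # drop the leading empty string
senders
```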

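This is neat, and I'd like to understand why it works. Could you add some comments?
@Andrie I've added an explanation.
Thanks. I'll go and re-read how strsplit() handles regex. (+1)
If a name looks like "From: Tony Kwa/Ke DY", the gsub() code above returns "ny Kwa/Ke DY": it strips the word "To" from the actual name.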

gsub("From *: (.*?) (To|Sent).*", "\\1", strsplit(str1, "\n")[[1]])
# [1] "Wendy YEOW (SLA)"
# [2] "SLA Enquiry (SLA)"
# [3] "Siti Zaharah RAMAN (ARKS)"
# [4] "SLA Enquiry (SLA)"
# [5] "Chin Hwang LAU (TA)"