Extracting two numbers preceded by two different strings from a paragraph using a Tcl regular expression

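I need to extract two different numbers, each preceded by a different string:
Employee Id --> Employee16
(I need 16) and
Employee links --> Employee links: 2
(I need 2). The source string looks like this: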

Employee16, Employee name is QueenRose
  Working for 46w0d
  Billing is Distributed
  65537 assigned tasks, 0 reordered, 0 unassigned
  0 discarded, 0 lost received, 5/255 load
  received sequence unavailable, 0xC2E7 sent sequence
  Employee links: 2 active, 0 inactive (max not set, min not set)
    Dt3/5/10:0, since 46w0d, no tasks pending
    Dt3/5/10:10, since 21w0d, no tasks rcvd
 Employee is currently working in Hardware section.

Employee19, Employee name is Edward11
  Working  for 48w4d
  Billing is Distributed
  206801498 assigned tasks, 0 reordered, 0 unassigned
  655372 discarded, 0 lost received, 9/255 load
  received sequence unavailable, 0x23CA sent sequence
  Employee links: 7 active, 0 inactive (max not set, min not set)
    Dt3/5/10:0, since 47w2d, tasks pending
    Dt3/5/10:10, since 28w6d, no tasks pending
    Dt3/5/10:11, since 18w4d, no tasks pending
    Dt3/5/10:12, since 18w4d, no tasks pending
    Dt3/5/10:13, since 18w4d, no tasks pending
    Dt3/5/10:14, since 18w4d, no tasks pending
    Dt3/5/10:15, since 7w2d, no tasks pending
   Employee is currently working in Hardware sectione.

Employee6 (inactive)
  Employee links: 2
    Dt3/5/10:0 (inactive)
    Dt3/5/10:10 (inactive)

Employee7 (inactive)
  Employee links: 2
    Dt3/5/10:0 (inactive)
    Dt3/5/10:10 (inactive)

I tried the following:

Employee(\d+)[^\n\r]*[^M]*Employee links:\s+(\d+)

The expected output is shown below:

16  2
19  7
 6  2
 7  2

But my attempt does not list all of the IDs and links.
Can anyone help me get this?

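The easiest way to extract values from two different places is to use two separate matching steps. It is also easiest if you first split the whole text into paragraphs.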

Employee Id --> Employee16
(I need 16)

I'd extract that one like this:

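regexp -line {^Employee(\d+)} $paragraph -> employeeNumber
(This task needs line-matching mode rather than the default "whole string" matching mode. The pattern does not require a comma after the number, because inactive records such as "Employee6 (inactive)" have none.)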

Employee links --> Employee links: 2
(I need 2)

For this one, we again assume that we are looking at the overall record for a single employee only:

regexp -line {^\s+Employee links:\s*(\d+)(.*)$} $paragraph -> links rest
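In this case, I've extracted not just the $links but also the $rest of the line, since you might need to consider whether that matters. Of course, the following might be more useful: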

regexp -line {^\s+Employee links:\s*(\d+)(?:\s+active,\s+(\d+)\s+inactive)?} \
        $paragraph -> activeLinks inactiveLinks
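In that case, if only the first number is present, $inactiveLinks will hold an empty string (this seems to happen when the employee is inactive; you will need some simple logic to tidy up in that case).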
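
One way the tidy-up logic might look is sketched below. The proc name parseLinks and the dictionary keys are hypothetical, and treating a lone number as a plain total is an assumption, since the sample data does not say whether the links of an inactive employee count as active or inactive. The dict command needs Tcl 8.5 or later.

```tcl
# Hypothetical helper: returns a dict describing the "Employee links:" line
# of one paragraph, or an empty dict if the paragraph has no such line.
proc parseLinks {paragraph} {
    set re {^\s+Employee links:\s*(\d+)(?:\s+active,\s+(\d+)\s+inactive)?}
    if {![regexp -line $re $paragraph -> first inactive]} {
        return {}
    }
    if {$inactive eq ""} {
        # The optional group did not match, so only one number is present.
        # Assumption: report it as a plain total.
        return [dict create total $first]
    }
    return [dict create active $first inactive $inactive]
}

puts [parseLinks "Employee19, Employee name is Edward11\n  Employee links: 7 active, 0 inactive (max not set, min not set)\n"]
puts [parseLinks "Employee6 (inactive)\n  Employee links: 2\n"]
```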

Finally, when using regexp, don't forget to check whether it actually matched.
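
As a minimal sketch of that check: regexp returns 1 when the pattern matched and 0 when it did not, so it can be used directly as the condition of an if. The sample paragraph here is copied from one of the inactive records in the question.

```tcl
set paragraph "Employee6 (inactive)\n  Employee links: 2\n    Dt3/5/10:0 (inactive)\n"

# Only trust the capture variable inside the branch where regexp succeeded.
if {[regexp -line {^\s+Employee links:\s*(\d+)} $paragraph -> links]} {
    puts "links: $links"
} else {
    puts "no 'Employee links:' line in this paragraph"
}
```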

Hope this helps.

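I was going to provide a complete answer, but then I read Donal's more useful tutorial and decided I couldn't improve on it. Instead, I'll demonstrate how to split the text into paragraphs: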

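# Each match is one paragraph plus its trailing blank line(s); the last
# paragraph is only returned if $text ends with at least two newlines.
foreach paragraph [regexp -all -inline {.*?\n{2,}} $text] {
    # do something with $paragraph
}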
In your attempt I see [^\n\r]*. Are you sure the text contains both carriage returns and newlines?
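
Putting the pieces from both answers together, a complete pass over the text might look like the sketch below. The proc name extractEmployeeLinks is hypothetical. The sketch assumes that records are separated by blank lines, it converts CRLF line endings to plain newlines in case the input contains carriage returns, and it matches the employee number without a trailing comma so that inactive records are included. The sample text is a shortened copy of the data in the question, and lassign needs Tcl 8.5 or later.

```tcl
# Hypothetical helper: returns a list of {id links} pairs, one per paragraph
# that contains both an "EmployeeN" header and an "Employee links:" line.
proc extractEmployeeLinks {text} {
    # Normalise CRLF line endings to plain newlines.
    set text [string map [list \r\n \n] $text]
    set result {}
    # Append a blank line so the final paragraph is terminated like the others.
    foreach paragraph [regexp -all -inline {.*?\n{2,}} "$text\n\n"] {
        if {[regexp -line {^Employee(\d+)} $paragraph -> id] &&
            [regexp -line {^\s+Employee links:\s*(\d+)} $paragraph -> links]} {
            lappend result [list $id $links]
        }
    }
    return $result
}

set sample "Employee16, Employee name is QueenRose
  Working for 46w0d
  Employee links: 2 active, 0 inactive (max not set, min not set)
    Dt3/5/10:0, since 46w0d, no tasks pending
 Employee is currently working in Hardware section.

Employee6 (inactive)
  Employee links: 2
    Dt3/5/10:0 (inactive)"

foreach pair [extractEmployeeLinks $sample] {
    lassign $pair id links
    puts [format "%2d  %d" $id $links]
}
```

The format string right-aligns the ID in a two-character field, which mirrors the layout of the expected output in the question.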