Regex: Comparing two lists in Excel with VBA regular expressions

regex vba excel excel-match

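I'd like to use VBA regular expressions to compare two lists (columns) in Excel and find the matches. Since this is a fairly involved operation, I have used several different (non-VBA) Excel functions for it in the past, but that proved kludgy at best, so I'd like to try an all-in-one VBA solution if possible.

The first column has names with irregularities (such as quoted nicknames, suffixes like "jr" or "sr", and parentheses around a "preferred" version of the name). In addition, when middle names are present, they may be either names or initials.

The order in the first column is: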

 <first name or initial>
 <space>
 <any parenthetical 'preferred' names - if they exist>
 <space>
 <middle name or initial - if it exists>
 <space>
 <quoted nickname or initial - if it exists>
 <space>
 <last name>
 <comma - if necessary><space - if necessary><suffix - if it exists>

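Although I'm leaving the "irregularities" in place here, I may use some kind of "flag" in the comparison code to alert me to them case by case.

I have been trying several patterns, and this is my most recent: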

["]?([A-Za-z]?)[.]?["]?[.]?[\s]?[,]?[\s]?

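However, I'd like to account for the last name and the suffix (if it exists). I tested it with "global" turned on, but I couldn't figure out how to separate the last name from the suffix via backreferences.

I'd then like to compare last name, first name, and middle initial between the two lists (since most of the names in the first list are only initials).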

 An example would be:
 (1st list)
 John (Johnny) B. "Abe" Smith, Jr.
 turned into:
 Smith Jr,John (Johnny) B "Abe"
 or
 Smith Jr,John B

 and
 (2nd list)
 Smith Jr,John Bertrand
 turned into:
 Smith Jr,John B

 Then run a comparison between the two columns.

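What would be a good starting point, or a good way to continue, for this list comparison?

Addendum, April 10, 2012:

As a side note, I need to remove the quotes from the nickname and the parentheses from the preferred name. Can the grouped references be broken down further into sub-groups (as in the example below)?

Could I group them like this: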

 (?:(([ ])(\()([^)]*)(\))))? # (2) parenthetical 'preferred' name (optional) 
 not sure how to do this one -  # (5,6) quoted nickname or initial (optional)

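I tried them in RegexCoach and RegExr and they worked fine, but in VBA, when I want to return backreferences as in \11,\5, all that comes back is the first name, the digit 1, and a comma (e.g. "Carl1"). I'm going back to check for typos. Thanks for your help.

Addendum, April 17, 2012:

There is one name "situation" I overlooked: last names made up of two or more words, e.g. "St Cyr" or "Von Wilhelm".
Would adding the following

 ((St|Von)[ ])?

work in this regex that you provided?

 ((St|Von)[ ])?([^\,()"']+)

My tests in RegexCoach and RegExr didn't work well, because the replacement returned the "St" with a space in front of it.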

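Here is a regex that might help. It gives you 6 capture groups in this order: first name, preferred name, middle name, nickname, last name, suffix.

([a-z]+)\.?\s(?:(\([a-z]+\))\s)?(?:([a-z]+)\.?\s)?(?:("[a-z]+")\s)?([a-z]+)(?:,\s([a-z]+))?

Here is an explanation: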

([a-z]+)\.?\s          # First name, followed by optional '.' (required)
(?:(\([a-z]+\))\s)?    # Preferred name, optional
(?:([a-z]+)\.?\s)?     # Middle name, optional
(?:("[a-z]+")\s)?      # Nickname, optional
([a-z]+)               # Last name, required
(?:,\s([a-z]+))?       # Suffix, optional

For example, you can turn

John (Johnny) B. "Abe" Smith, Jr.

into

Smith Jr,John (Johnny) B "Abe"

by combining the groups like this:

\5 \6,\1 \2 \3 \4

or into Smith Jr,John B by using

\5 \6,\1 \3
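To try this pattern from VBA, something like the sketch below may help. It is an illustration under stated assumptions, not part of the answer: the function name `ReorderName` is made up, `IgnoreCase` is switched on because the pattern only lists lowercase `[a-z]`, and the pieces are read from `SubMatches` rather than from a `\5 \6,\1 \3` replacement string. The VBScript `RegExp` object expects `$n` in replacement text, and because the pattern is unanchored, a plain `Replace` would also leave any unmatched trailing characters (such as the final dot) in place.

```vba
' Sketch only: ReorderName is a hypothetical helper, not code from the answer.
Function ReorderName(ByVal fullName As String) As String
    Dim rx As Object, m As Object, result As String
    Set rx = CreateObject("VBScript.RegExp")
    rx.IgnoreCase = True   ' assumption: the pattern lists only lowercase [a-z]
    rx.Global = False
    rx.Pattern = "([a-z]+)\.?\s(?:(\([a-z]+\))\s)?(?:([a-z]+)\.?\s)?" & _
                 "(?:(""[a-z]+"")\s)?([a-z]+)(?:,\s([a-z]+))?"

    If Not rx.Test(fullName) Then
        ReorderName = ""              ' no match: let the caller flag the row
        Exit Function
    End If

    Set m = rx.Execute(fullName)(0)
    ' SubMatches is zero-based: 0 = first, 2 = middle, 4 = last, 5 = suffix
    result = m.SubMatches(4)
    If Len(m.SubMatches(5)) > 0 Then result = result & " " & m.SubMatches(5)
    result = result & "," & m.SubMatches(0)
    If Len(m.SubMatches(2)) > 0 Then result = result & " " & m.SubMatches(2)
    ReorderName = result
End Function
```

Called as `ReorderName("John (Johnny) B. ""Abe"" Smith, Jr.")`, this should build the short form `Smith Jr,John B`; optional parts that did not take part in the match are simply skipped.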

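Revised -

Here is a different approach. It may work in VBA; this is only a sample. I tested it with Perl and it worked well. I won't show the Perl code, though, only the regexes and some explanation.

This is a two-step process:

Normalize the column text

Do the main parsing

Normalization procedure

Get the column value
Strip all dots - global search for `\.`, replace with nothing `''`
Turn whitespace into spaces - global search for `\s+`, replace with a single space `[ ]`

(Note that if the value can't be normalized, I don't see any chance of success, no matter what is tried.)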
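The normalization steps above can be sketched as a small VBA helper. `NormalizeName` is a hypothetical name chosen for this illustration; the two patterns are the ones listed in the steps, applied through the late-bound `VBScript.RegExp` object.

```vba
' Sketch only: NormalizeName is a hypothetical helper name.
Function NormalizeName(ByVal raw As String) As String
    Dim rxDot As Object, rxWSp As Object

    Set rxDot = CreateObject("VBScript.RegExp")
    rxDot.Global = True          ' replace every dot, not just the first
    rxDot.Pattern = "\."

    Set rxWSp = CreateObject("VBScript.RegExp")
    rxWSp.Global = True
    rxWSp.Pattern = "\s+"

    ' Strip the dots, then squeeze each whitespace run down to one space
    NormalizeName = rxWSp.Replace(rxDot.Replace(raw, ""), " ")
End Function
```

A single leading or trailing space can survive this step. That appears to be intentional, since both parsing patterns start with `[ ]?` and tolerate trailing text.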

Main parsing procedure: after normalizing the column values (do this for both columns), run them through these regexes.

Column 1 regex

^
  [ ]?
  ([^\ ,()"']+)                        # (1)     first name or initial          (required)
  (?:  ([ ] \( [^)]* \))    )?         # (2)     parenthetical 'preferred' name (optional)
  (?:
       ([ ] [^\ ,()"'] )               # (3,4)   middle initial OR name         (optional)
       ([^\ ,()"']*)                   #         name and initial are both captured
  )?
  (?:  ([ ] (["'] ) .*?) \6 )?         # (5,6)   quoted nickname or initial     (optional)
  [ ]  ([^\ ,()"']+)                   # (7)     last name                      (required)
  (?:
        [, ]* ([ ].+?) [ ]?            # (8)     suffix                         (optional)
      | .*?
  )?
$

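The replacement depends on what you need.
Three types are defined (replace `$` with `\` as needed):

type 1a full middle - `$7$8,$1$2$3$4$5$6`

type 1b middle initial - `$7$8,$1$2$3$5$6`

type 2 middle initial - `$7$8,$1$3`

Conversion examples: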

Input (raw)               = 'John (Johnny) Bertrand "Abe" Smith, Jr.  '
Out type 1 full middle    = 'Smith Jr,John (Johnny) Bertrand "Abe"'
Out type 1 middle initial = 'Smith Jr,John (Johnny) B "Abe"'
Out type 2 middle initial = 'Smith Jr,John B'

Column 2 regex

^
  [ ]?
  ([^\ ,()"']+)                  # (1)     last name                      (required)
  (?: ([ ] [^\ ,()"']+) )?       # (2)     suffix                         (optional)
  ,
  ([^\ ,()"']+)                  # (3)     first name or initial          (required)
  (?:
      ([ ] [^\ ,()"'])           # (4,5)   middle initial OR name         (optional)
      ([^\ ,()"']*)
  )?
  .*
$

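The replacement depends on what you need.
Two types are defined (replace `$` with `\` as needed):

type 1a full middle - `$1$2,$3$4$5`

type 1b middle initial - `$1$2,$3$4`

Conversion examples: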

Input                     = 'Smith Jr.,John Bertrand  '
Out type 1 full middle    = 'Smith Jr,John Bertrand'
Out type 1 middle initial = 'Smith Jr,John B'

VBA replacement help

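This was done with a very old copy of Excel, by creating a VBA project.
The two modules below were created to show examples.
They both do the same thing.

The first is a verbose example of all the possible replacement types.
The second is a trimmed-down version that uses only the type 2 comparison.

Just so you know, I haven't done VB before, but it should be simple enough
to give you an idea of how the replacement works and how to tie it to the Excel
columns.

If you're just doing a flat comparison, you might want to convert a col 1 val
once, check it against every value in column 2, then go to the next value in
column 1 and repeat.

To do this the fastest way, create two extra columns, convert the
column vals to type-2 (variables strC1_2 and strC2_2, see the examples), then copy them
to the new columns.
After that, you don't need the regexes; just compare the columns, find the matching rows,
then delete the type-2 columns.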
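The helper-column idea above might look roughly like the following sketch. The sheet layout is an assumption made purely for illustration: names in columns A and B starting at row 1, with columns C and D free for the type-2 keys. `Type2Key` and `FillType2Columns` are hypothetical names; the patterns and replacement strings are the ones used in the two modules of this answer.

```vba
' Sketch only. Assumed layout: names in columns A and B from row 1 down,
' columns C and D free to receive the type-2 keys.
Function Type2Key(ByVal raw As String, ByVal pattern As String, _
                  ByVal replacement As String) As String
    Dim rx As Object, s As String
    Set rx = CreateObject("VBScript.RegExp")
    rx.Global = True
    rx.Pattern = "\s+"
    s = rx.Replace(raw, " ")           ' whitespace runs -> one space
    rx.Pattern = "\."
    s = rx.Replace(s, "")              ' strip dots
    rx.Global = False
    rx.Pattern = pattern
    ' If the pattern does not match, Replace returns s unchanged
    Type2Key = rx.Replace(s, replacement)
End Function

Sub FillType2Columns()
    Const PAT_C1 As String = "^[ ]?([^ ,()""']+)(?:([ ]\([^)]*\)))?" & _
        "(?:([ ][^ ,()""'])([^ ,()""']*))?(?:([ ]([""']).*?)\6)?" & _
        "[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"
    Const PAT_C2 As String = "^[ ]?([^ ,()""']+)(?:([ ][^ ,()""']+))?," & _
        "([^ ,()""']+)(?:([ ][^ ,()""'])([^ ,()""']*))?.*$"
    Dim ws As Worksheet, r As Long, lastRow As Long

    Set ws = ActiveSheet
    ' Column A holds "First (Preferred) Middle "Nick" Last, Suffix" names
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For r = 1 To lastRow
        ws.Cells(r, "C").Value = Type2Key(ws.Cells(r, "A").Value, PAT_C1, "$7$8,$1$3")
    Next r
    ' Column B holds "Last Suffix,First Middle" names
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    For r = 1 To lastRow
        ws.Cells(r, "D").Value = Type2Key(ws.Cells(r, "B").Value, PAT_C2, "$1$2,$3$4")
    Next r
End Sub
```

Once columns C and D are filled, an ordinary worksheet formula such as `=MATCH(C1, D:D, 0)` can locate the matching rows without any further regex work. A name that neither pattern can parse keeps its normalized text as the key, so it will most likely show up as unmatched and can be reviewed by hand.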

Verbose -

Sub RegexColumnValueComparison()

' Column 1 and 2 , Sample values
' These should probably be passed in values
' ============================================
strC1 = "John (Johnny)   Bertrand ""Abe""   Smith, Jr.  "
strC2 = "Smith Jr.,John Bertrand  "

' Normalization regexes for whitespace and periods
' (use for both column values)
' =============================================
Set rxDot = CreateObject("vbscript.regexp")
rxDot.Global = True
rxDot.Pattern = "\."
Set rxWSp = CreateObject("vbscript.regexp")
rxWSp.Global = True
rxWSp.Pattern = "\s+"

' Column 1 Regex
' ==================
Set rxC1 = CreateObject("vbscript.regexp")
rxC1.Global = False
rxC1.Pattern = "^[ ]?([^ ,()""']+)(?:([ ]\([^)]*\)))?(?:([ ][^ ,()""'])([^ ,()""']*))?(?:([ ]([""']).*?)\6)?[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"

' Column 2 Regex
' ==================
Set rxC2 = CreateObject("vbscript.regexp")
rxC2.Global = False
rxC2.Pattern = "^[ ]?([^ ,()""']+)(?:([ ][^ ,()""']+))?,([^ ,()""']+)(?:([ ][^ ,()""'])([^ ,()""']*))?.*$"

' Normalize column 1 and 2, Copy to new var
' ============================================
strC1_Normal = rxDot.Replace(rxWSp.Replace(strC1, " "), "")
strC2_Normal = rxDot.Replace(rxWSp.Replace(strC2, " "), "")


' ------------------------------------------------------
' This section is informational
' Shows some sample replacements before comparison
' Just pick 1 replacement from each column, discard the rest
' ------------------------------------------------------

' Create Some Replacement Types for Column 1
' =====================================================
strC1_1a = rxC1.Replace(strC1_Normal, "$7$8,$1$2$3$4$5$6")
strC1_1b = rxC1.Replace(strC1_Normal, "$7$8,$1$2$3$5$6")
strC1_2 = rxC1.Replace(strC1_Normal, "$7$8,$1$3")

' Create Some Replacement Types for Column 2
' ($1$2,$3$4$5 keeps the whole middle name, so it is the full-middle form)
' =====================================================
strC2_1a = rxC2.Replace(strC2_Normal, "$1$2,$3$4$5")
strC2_2 = rxC2.Replace(strC2_Normal, "$1$2,$3$4")

' Show Types in Message Box
' =====================================================
c1_t1a = "Column1 Types:" & Chr(13) & "type 1a full middle    - " & strC1_1a
c1_t1b = "type 1b middle initial - " & strC1_1b
c1_t2 = "type 2 middle initial - " & strC1_2
c2_t1a = "Column2 Types:" & Chr(13) & "type 1a full middle    - " & strC2_1a
c2_t2 = "type 2 middle initial - " & strC2_2

MsgBox (c1_t1a & Chr(13) & c1_t1b & Chr(13) & c1_t2 & Chr(13) & Chr(13) & c2_t1a & Chr(13) & c2_t2)

' ------------------------------------------------------
' Compare a Value from Column 1 vs Column 2
' For this we will compare Type 2 values
' ------------------------------------------------------
If strC1_2 = strC2_2 Then
   MsgBox ("Type 2 values are EQUAL: " & Chr(13) & strC1_2)
Else
   MsgBox ("Type 2 values are NOT Equal:" & Chr(13) & strC1_2 & " != " & strC2_2)
End If

' ------------------------------------------------------
' Same comparison (Type 2) of Normalized column 1,2 values
' In essence, this is all you need
' ------------------------------------------------------
If rxC1.Replace(strC1_Normal, "$7$8,$1$3") = rxC2.Replace(strC2_Normal, "$1$2,$3$4") Then
   MsgBox ("Type 2 values are EQUAL")
Else
   MsgBox ("Type 2 values are NOT Equal")
End If

End Sub

Type 2 only -

Sub RegexColumnValueComparison()

' Column 1 and 2 , Sample values
' These should probably be passed in values
' ============================================
strC1 = "John (Johnny)   Bertrand ""Abe""   Smith, Jr.  "
strC2 = "Smith Jr.,John Bertrand  "

' Normalization regexes for whitespace and periods
' (use for both column values)
' =============================================
Set rxDot = CreateObject("vbscript.regexp")
rxDot.Global = True
rxDot.Pattern = "\."
Set rxWSp = CreateObject("vbscript.regexp")
rxWSp.Global = True
rxWSp.Pattern = "\s+"

' Column 1 Regex
' ==================
Set rxC1 = CreateObject("vbscript.regexp")
rxC1.Global = False
rxC1.Pattern = "^[ ]?([^ ,()""']+)(?:([ ]\([^)]*\)))?(?:([ ][^ ,()""'])([^ ,()""']*))?(?:([ ]([""']).*?)\6)?[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"

' Column 2 Regex
' ==================
Set rxC2 = CreateObject("vbscript.regexp")
rxC2.Global = False
rxC2.Pattern = "^[ ]?([^ ,()""']+)(?:([ ][^ ,()""']+))?,([^ ,()""']+)(?:([ ][^ ,()""'])([^ ,()""']*))?.*$"

' Normalize column 1 and 2, Copy to new var
' ============================================
strC1_Normal = rxDot.Replace(rxWSp.Replace(strC1, " "), "")
strC2_Normal = rxDot.Replace(rxWSp.Replace(strC2, " "), "")

' Comparison (Type 2) of Normalized column 1,2 values
' ============================================
strC1_2 = rxC1.Replace(strC1_Normal, "$7$8,$1$3")
strC2_2 = rxC2.Replace(strC2_Normal, "$1$2,$3$4")

If strC1_2 = strC2_2 Then
   MsgBox ("Type 2 values are EQUAL")
Else
   MsgBox ("Type 2 values are NOT Equal")
End If

End Sub

Paren/Quote response

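"As a side note, I need to remove the quotes from the nickname and the parentheses from the preferred name."

If I understand this correctly ...

Yes, you can capture the content inside the quotes and the parentheses separately.
It just takes some modification. The regex below is able to
formulate the replacement with or without the quotes and/or parentheses,
or in other forms.

The samples below give ways to formulate the replacement.

There is a very important point here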
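As one guess at the kind of modification described, and not the answer's own pattern, the sketch below moves the parentheses and the quotes outside their capture groups in the column 1 regex, so that only the inner text is captured. The function name `StripParenQuote` and the group layout (1 first, 2 preferred, 3 and 4 middle, 5 quote character, 6 nickname, 7 last, 8 suffix) are assumptions of this sketch. It expects an already normalized value and assembles the result from `SubMatches`, so missing optional parts leave no stray spaces.

```vba
' Sketch only: StripParenQuote and this pattern variant are assumptions,
' not the pattern from the answer.
Function StripParenQuote(ByVal normalized As String) As String
    Dim rx As Object, m As Object, result As String
    Set rx = CreateObject("VBScript.RegExp")
    rx.Global = False
    ' Groups: 1 first, 2 preferred (inside parens), 3+4 middle,
    '         5 quote char, 6 nickname (inside quotes), 7 last, 8 suffix
    rx.Pattern = "^[ ]?([^ ,()""']+)(?:[ ]\(([^)]*)\))?" & _
                 "(?:([ ][^ ,()""'])([^ ,()""']*))?(?:[ ]([""'])(.*?)\5)?" & _
                 "[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"

    If Not rx.Test(normalized) Then
        StripParenQuote = normalized      ' leave unparsed values untouched
        Exit Function
    End If

    Set m = rx.Execute(normalized)(0)
    ' SubMatches is zero-based, so group n is SubMatches(n - 1)
    result = m.SubMatches(6) & m.SubMatches(7) & "," & m.SubMatches(0)
    If Len(m.SubMatches(1)) > 0 Then result = result & " " & m.SubMatches(1)
    result = result & m.SubMatches(2) & m.SubMatches(3)
    If Len(m.SubMatches(5)) > 0 Then result = result & " " & m.SubMatches(5)
    StripParenQuote = result
End Function
```

For the normalized value `John (Johnny) Bertrand "Abe" Smith, Jr ` this should return `Smith Jr,John Johnny Bertrand Abe`. Putting the parentheses or quotes back is then a matter of adding the literal characters around `SubMatches(1)` or `SubMatches(5)`.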
