Regex search in a binary file
I am trying to write an Excel VBA script that fetches some information (version and revision date) from binary FrameMaker files (*.fm). The following Sub opens the *.fm file and writes the first 25 lines (the required information is within the first 25 lines) into a variable:
Sub fetchDate()
    ' With late binding (CreateObject) the Scripting constants are not
    ' defined automatically, so they are declared here.
    Const ForReading As Long = 1
    Const TristateFalse As Long = 0
    Dim fso As Object
    Dim fmFile As Object
    Dim fileString As String
    Dim fileName As String
    Dim matchPattern As String
    Dim result As String
    Dim i As Integer
    Dim bufferString As String
    Set fso = CreateObject("Scripting.FileSystemObject")
    fileName = "C:\FrameMaker-file.fm"
    Set fmFile = fso.OpenTextFile(fileName, ForReading, False, TristateFalse)
    matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"
    fileString = ""
    i = 1
    ' Read at most 25 lines, stopping early if the file is shorter
    Do While i <= 25 And Not fmFile.AtEndOfStream
        bufferString = fmFile.ReadLine
        fileString = fileString & bufferString & vbNewLine
        i = i + 1
    Loop
    fmFile.Close
    'fileString = Replace(fileString, matchPattern, "")
    result = regExSearch(fileString, matchPattern)
    MsgBox result
    Set fmFile = Nothing
    Set fso = Nothing
End Sub
I found a solution using ADODB.Stream. This works fine:
Sub test_binary()
    Dim buffer As String
    Dim filename As String
    Dim matchPattern As String
    Dim result As String
    filename = "C:\test.fm"
    With CreateObject("ADODB.Stream")
        .Open
        .Type = 2               ' 2 = adTypeText: treat the stream as text
        .Charset = "utf-8"
        .LoadFromFile filename
        buffer = .ReadText(10000)   ' read up to the first 10000 characters
        .Close
    End With
    matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"
    result = regExSearch(buffer, matchPattern)
    MsgBox result
End Sub
The regex function:
Function regExSearch(ByVal strInput As String, ByVal strPattern As String) As String
    ' Late-bound, so no reference to the VBScript Regular Expressions library is needed
    Dim regEx As Object
    Dim result As String
    Dim match As Variant
    Dim matches As Variant
    Dim subMatch As Variant
    Set regEx = CreateObject("VBScript.RegExp")
    If strPattern <> "" Then
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        If regEx.Test(strInput) Then
            Set matches = regEx.Execute(strInput)
            result = ""
            ' Join all captured groups of all matches with "||"
            For Each match In matches
                If match.SubMatches.Count > 0 Then
                    For Each subMatch In match.SubMatches
                        If Len(result) > 0 Then
                            result = result & "||"
                        End If
                        result = result & subMatch
                    Next subMatch
                End If
            Next match
            regExSearch = result
        Else
            regExSearch = "err_nomatch"
        End If
    End If
    Set regEx = Nothing
End Function
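To illustrate what the pattern captures, here is a minimal sketch run against a made-up sample string. The sample text and the name DemoPatternCaptures are assumptions for illustration only and do not come from a real .fm file. Because [\s\S]* is greedy, the second group is taken from the last "Rev" in the input, not the first.

```vba
' Hypothetical demo: shows which digits the pattern captures.
Function DemoPatternCaptures() As String
    Dim re As Object
    Dim ms As Object
    Dim sample As String

    ' Made-up sample text (assumption), with two "Rev" lines
    sample = "Version - Date: 12" & vbCrLf & _
             "Rev: 3" & vbCrLf & _
             "Rev: 7"

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"

    Set ms = re.Execute(sample)
    If ms.Count > 0 Then
        ' Group 1: first digits after "Version - Date"
        ' Group 2: first digits after the LAST "Rev" (greedy [\s\S]*)
        DemoPatternCaptures = ms(0).SubMatches(0) & "||" & ms(0).SubMatches(1)
    End If
End Function
```

Calling DemoPatternCaptures() returns "12||7". If the first "Rev" is the one wanted, the lazy form [\s\S]*? would be the usual change.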
It is important to open the *.fm file as a text file (.Type = 2) and to set the charset to "utf-8". Otherwise my regex cannot read the plain text.
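Thank you very much for putting me on the right track.
Just save the FM file as MIF. It is a text encoding of the FM file and can be converted back and forth without losing information.
FSO and binary files are not friends. Use a different technique, such as ADODB.Stream.
@Sam That made me chuckle :-)
Can you upload a sample data (.fm) file? We could use it to reproduce your problem. I suspect there is a better way to code this. In particular, as @Sam implied, FSO is suited to ASCII text files; it is not so good for other kinds of files and may not be the best choice here.
With a sample .fm file, and in your screenshot, I do not see the strings Version or Rev that your regex is searching for (except perhaps in the regex you say you inserted). So what exactly do you mean by these terms?
Thanks for your answer! I can provide more information and a sample file when I am back at work on Monday.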
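As a sketch of the "use a different technique" suggestion, the start of a file can also be read with VBA's built-in binary file I/O instead of FSO. This is not the ADODB.Stream approach from the answer. The function name ReadFileHead and its parameters are hypothetical, and StrConv here maps bytes using the system ANSI code page, so non-ASCII bytes may come out differently than with the utf-8 stream.

```vba
' Hypothetical helper: returns up to maxBytes from the start of a file as a String.
Function ReadFileHead(ByVal path As String, ByVal maxBytes As Long) As String
    Dim f As Integer
    Dim bytes() As Byte
    Dim n As Long

    f = FreeFile
    Open path For Binary Access Read As #f

    ' Never try to read more than the file contains
    n = LOF(f)
    If n > maxBytes Then n = maxBytes

    If n > 0 Then
        ReDim bytes(0 To n - 1)
        Get #f, , bytes                          ' raw bytes, no text conversion
        ReadFileHead = StrConv(bytes, vbUnicode) ' ANSI bytes -> VBA string
    End If

    Close #f
End Function
```

The returned string could then be passed to regExSearch in place of the buffer read through the stream.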
<MakerFile 12.0>
Aaÿ ` ? ???? /tEXt ? c ? E ? ????a A ? ? ? ? ? d??????? ? Heading ????????????A???????A
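The first line of the sample is readable text. Assuming the number in that header is the file-format version (an assumption, and not necessarily the document version the question asks about), a minimal sketch for pulling it out could look like this. The function name MakerFileVersion is hypothetical.

```vba
' Hypothetical helper: extracts the number from a "<MakerFile 12.0>" style header.
Function MakerFileVersion(ByVal header As String) As String
    Dim re As Object
    Dim ms As Object

    Set re = CreateObject("VBScript.RegExp")
    ' "<MakerFile", whitespace, then digits with optional ".digits" parts
    re.Pattern = "<MakerFile\s+(\d+(?:\.\d+)*)"
    re.IgnoreCase = False
    re.Global = False

    Set ms = re.Execute(header)
    If ms.Count > 0 Then
        MakerFileVersion = ms(0).SubMatches(0)
    Else
        MakerFileVersion = ""   ' no header found
    End If
End Function
```

For the header shown above, MakerFileVersion("<MakerFile 12.0>") returns "12.0".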