Excel: fixing the BOM problem (ï»¿) when reading a UTF-8 encoded CSV with VBA


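When reading a UTF-8 encoded CSV with VBA (Excel), the byte order mark (ï»¿, or EF BB BF in hex) causes a notorious problem, and I would like some fresh suggestions. Note that I want to avoid opening the CSV with Workbooks.Open or FileSystemObject. In fact, I would rather use an ADODB.Recordset, because I need to run some SQL queries against the data.

After reading a lot (a lot!) about this, I think the 4 best solutions to this specific problem are:

Remove the BOM before reading the CSV with ADODB.Connection/ADODB.Recordset (for example, by reading the first line of the file efficiently through Open ... As #iFile or Scripting.FileSystemObject OpenAsTextStream and stripping the BOM)
Create a schema.ini file so that ADO parses the CSV correctly
Use one of the modules that experienced developers have written for this
Use ADODB.Stream and set Charset = "UTF-8"

The last solution (using a stream) looks good, but the following only returns a string: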

Sub loadCsv()

    Const adModeReadWrite As Integer = 3

    With CreateObject("ADODB.Stream")
        .Charset = "utf-8"          'Decode the file as UTF-8
        .Mode = adModeReadWrite
        .Open
        .LoadFromFile "C:\atestpath\test.csv"
        Debug.Print .ReadText       'Prints the whole file as one string
        .Close
    End With

End Sub

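Do you know of any way to use the string returned by .ReadText as the data source of an ADODB.Recordset or ADODB.Connection (other than looping and filling the fields of my recordset by hand)?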

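Looking into this further: even if you specify CharacterSet=65001 in the connection string or in Schema.ini, you can't really get rid of the ? in front of the first field name.

If you specify all the columns in Schema.ini, you can get rid of it; but that still requires a Schema.ini entry for each file. You would have to know the field names in advance, either because they are always the same or because you read them first (which is running in circles here).
It looks like any solution makes you pre-process the file.
So the question is, does it really matter... No, it doesn't look like it.

In fact, even though the first field name has a ? in front of it, it doesn't actually seem to matter: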

'Requires a reference to the Microsoft ActiveX Data Objects library
Sub ReadCSVasRecordSet()
    Const adOpenStatic = 3
    Const adLockOptimistic = 3
    Const adCmdText = &H1
    Dim FilePath As String, Filename As String
    Dim Conn As ADODB.Connection
    Dim RS As ADODB.Recordset

    FilePath = "C:\temp"
    Set Conn = New ADODB.Connection
    'Conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & FilePath & ";Extended Properties=""text;CharacterSet=utf-8;HDR=YES;FMT=Delimited"""
    Conn.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & FilePath & ";Extended Properties=""text;HDR=YES;FMT=Delimited"""
    Filename = "CN43N-Projects.csv"
    Set RS = New ADODB.Recordset
    RS.Open "SELECT * FROM [" & Filename & "] WHERE [Status] = ""REL"" AND [Lev] = 1", Conn, adOpenStatic, adLockOptimistic, adCmdText
    'Checking the first field name
    Debug.Print RS.Fields(0).Name       ' Outputs: ?Lev
    Debug.Print RS.Fields("Lev").Name   ' Outputs: ?Lev
    'Debug.Print RS.Fields("?Lev").Name ' Errors out if I include ?
    Do Until RS.EOF
        Debug.Print RS.Fields.Item("Lev"),
        Debug.Print RS.Fields.Item("Proj# def#"),
        Debug.Print RS.Fields.Item("Name"),
        Debug.Print RS.Fields.Item("Status")
        RS.MoveNext
    Loop
    RS.Close
    Set RS = Nothing
    If Not Conn Is Nothing Then
        Conn.Close
        Set Conn = Nothing
    End If
End Sub

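Edit 1 - What??

Interestingly, if you want to clean up the field name, you can't just compare the first character with "?", because it is still the BOM character and only displays as a question mark. You could check the ASCII code value:

Asc(Left(RS.Fields(0).Name, 1)) = Asc("?")

or, better, use AscW. You will notice that with a UTF-8 file you get:

AscW(Left(RS.Fields(0).Name, 1)) = -257

(not 63)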
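
The AscW check above can be wrapped in a small helper if a clean field name is needed. This is only a sketch: the function name StripLeadingBom is made up for illustration, and it assumes the provider hands back the BOM as the single character U+FEFF, which VBA reports as -257 because AscW returns a signed 16-bit Integer.

```vba
'Hypothetical helper (not part of ADO): drops a leading U+FEFF from a field name.
Function StripLeadingBom(ByVal fieldName As String) As String
    StripLeadingBom = fieldName
    If Len(fieldName) > 0 Then
        'AscW returns a signed Integer, so U+FEFF (65279) shows up as -257
        If AscW(Left$(fieldName, 1)) = -257 Then
            StripLeadingBom = Mid$(fieldName, 2)
        End If
    End If
End Function
```

With the recordset from the example above, Debug.Print StripLeadingBom(RS.Fields(0).Name) would then be expected to print the name without the leading marker.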


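Edit: I found that loading the CSV with a QueryTable object, or with the newer object introduced in Excel 2016, is the simplest and probably the most reliable approach (see the examples in the documentation).

Old answer:

Talking with @Profex encouraged me to investigate this further. It turned out there were two problems: the BOM and the delimiter used in the CSV. The ADO connection string I needed to use was:

strCon = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Users\test\;Extended Properties='text;HDR=YES;CharacterSet=65001;FMT=Delimited(;)'"

But FMT does not work with a semicolon (FMT=Delimited(;)), at least not with Microsoft.ACE.OLEDB.12.0 on an x64 system (Excel x64). So @Profex was quite right to say:

"even though the first field name has a ? in front of it, it doesn't actually seem to matter"

assuming he was using FMT=Delimited on a CSV separated by plain commas (",").

Some people suggest editing the registry so that the semicolon delimiter is accepted. I would like to avoid that. I would also rather not create a schema.ini file (even though that may be the best solution for complex CSVs). So the only solution left requires editing the CSV before creating the ADODB.Connection.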
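
For comparison, the schema.ini route could look roughly like the sketch below, which writes the file from VBA. Everything specific in it is an assumption made for illustration: the procedure name WriteSchemaIni, the file name t1.csv, and the two column names and types. Format=Delimited(;), ColNameHeader, CharacterSet and the ColN entries are standard Schema.ini keys for the text driver; whether they remove the BOM from the first field name in a given setup would need to be checked.

```vba
'Hypothetical sketch: writes a Schema.ini next to the CSV so the text driver
'uses a semicolon delimiter and explicit column definitions.
Sub WriteSchemaIni()
    Const SCHEMA_DIR As String = "C:\Users\test\"   'Folder used as Data Source
    Dim f As Integer

    f = FreeFile
    Open SCHEMA_DIR & "Schema.ini" For Output As #f
    Print #f, "[t1.csv]"                 'Assumed CSV file name
    Print #f, "Format=Delimited(;)"      'Semicolon-separated values
    Print #f, "ColNameHeader=True"       'First row holds the column names
    Print #f, "CharacterSet=65001"       'UTF-8 code page
    Print #f, "Col1=EntryDate DateTime"  'Assumed column name and type
    Print #f, "Col2=EntryCount Long"     'Assumed column name and type
    Close #f
End Sub
```

The connection string then only needs Extended Properties='text', since the format settings are read from Schema.ini in the data source folder.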
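I know that my CSVs will always have the problematic BOM and the same basic structure (something like "Date", "Count"). So I decided to use the following code: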

Dim arrByte() As Byte
Dim strFilename As String
Dim iFile As Integer
Dim strBuffer As String

strFilename = "C:\Users\test\t1.csv"
If Dir(strFilename) <> "" Then 'Check that the file exists, because opening it in Binary mode would otherwise create it
    iFile = FreeFile
    Open strFilename For Binary Access Read Write As #iFile
    strBuffer = String(3, " ") 'We know the BOM has a length of 3
    Get #iFile, , strBuffer
    If strBuffer = Chr$(&HEF) & Chr$(&HBB) & Chr$(&HBF) Then 'Check if the BOM (ï»¿) is there
        strBuffer = String(LOF(iFile) - 3, " ")
        Get #iFile, , strBuffer 'The read position is already just past the BOM, so this stores the rest of the file
        'Replace every semicolon with a comma and convert back to one byte per character
        arrByte = StrConv(Replace(strBuffer, ";", ","), vbFromUnicode)
        'Recreate the file so that no stale bytes are left after the shorter content
        Close #iFile
        Kill strFilename
        iFile = FreeFile
        Open strFilename For Binary Access Write As #iFile
        Put #iFile, 1, arrByte
    End If
    Close #iFile
End If
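
If only the BOM needs to go and the delimiter can stay, the same pre-processing idea can be sketched on raw bytes, which avoids converting the file content through an ANSI string. The procedure name StripUtf8Bom is hypothetical, and the sketch assumes the file fits comfortably in memory.

```vba
'Hypothetical sketch: removes a leading EF BB BF from a file, byte for byte.
Sub StripUtf8Bom(ByVal strPath As String)
    Dim f As Integer, n As Long, i As Long
    Dim allBytes() As Byte, rest() As Byte

    If Dir(strPath) = "" Then Exit Sub          'Nothing to do if the file is missing
    f = FreeFile
    Open strPath For Binary Access Read As #f
    n = LOF(f)
    If n >= 3 Then
        ReDim allBytes(0 To n - 1)
        Get #f, 1, allBytes                     'Read the whole file as bytes
    End If
    Close #f
    If n < 3 Then Exit Sub                      'Too short to contain a BOM

    'Leave the file alone unless it starts with the UTF-8 BOM
    If allBytes(0) <> &HEF Or allBytes(1) <> &HBB Or allBytes(2) <> &HBF Then Exit Sub

    Kill strPath                                'Recreate the file so it ends up 3 bytes shorter
    f = FreeFile
    Open strPath For Binary Access Write As #f
    If n > 3 Then
        ReDim rest(0 To n - 4)
        For i = 3 To n - 1
            rest(i - 3) = allBytes(i)           'Copy everything after the BOM
        Next i
        Put #f, 1, rest
    End If
    Close #f
End Sub
```

It would be called once before opening the connection, for example StripUtf8Bom "C:\Users\test\t1.csv".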
