Excel 使用VBA从文本文件提取数据时出错_Excel_Vba_Pdf

Excel 使用VBA从文本文件提取数据时出错

excel vba pdf

Excel 使用VBA从文本文件提取数据时出错,excel,vba,pdf,Excel,Vba,Pdf,我试图从一个文本文件中提取数据，该文件是从PDF导出评论的结果。事实证明，将注释导出到文本文件是最好的选择，因为XML和Excel格式的效果不太好。但是，当我尝试运行代码时，代码出现了一些错误我希望代码从如下所示的文本文件中获取数据：然后把它们放在这样一张桌子上：然而，该表是这样显示的，注释、注释编号和页面的顺序不正确，并且遗漏了一些信息。我不知道为什么这些评论看起来很复杂，但是作者D的评论被忽略了非常感谢您的帮助以下是我目前的代码： Sub Format() ' This c

我试图从一个文本文件中提取数据，该文件是从PDF导出评论的结果。事实证明，将注释导出到文本文件是最好的选择，因为XML和Excel格式的效果不太好。但是，当我尝试运行代码时，代码出现了一些错误

我希望代码从如下所示的文本文件中获取数据：

然后把它们放在这样一张桌子上：

然而，该表是这样显示的，注释、注释编号和页面的顺序不正确，并且遗漏了一些信息。我不知道为什么这些评论看起来很复杂，但是作者D的评论被忽略了

非常感谢您的帮助

以下是我目前的代码：

Sub Format()

' This code determines the users username.  Useful if there is a standard location each person would have the file on their computer.

Set scripting_object = CreateObject("Scripting.FileSystemObject")
strUser = CreateObject("WScript.Network").UserName
localfileName = "C:\Users\" + strUser + "\Downloads\ForumPostExample.txt"


' URL of the file
file_url = "C:\ForumPostExample.txt"

' This creates a scripting object
Set local_file = CreateObject("Scripting.FileSystemObject")
' Opens the text file based on the url for the file.  1, and 2 are options, like read only, can't remember exactly
Set local_file_read = local_file.OpenTextFile(file_url, 1, 2)
'Initialize worksheet
Set xlSheet = ActiveWorkbook.Worksheets("Sheet1")
'Write out the first row
xlSheet.Range("A1") = "Comment No."
xlSheet.Range("B1") = "Reviewer Name"
xlSheet.Range("C1") = "Type"
xlSheet.Range("D1") = "Page Number"
xlSheet.Range("E1") = "Comment"
xlSheet.Range("F1") = "Date Submitted"
'Set row count
row_count = 2

'Variable tells if comments needs to be written out
write_comments = "No"
'Initialize comments variable and comments count as it will need to be added to multiple times
Comments = ""
comment_count = 0

'Read each line of the file
Do Until local_file_read.AtEndOfStream
    ' Set variable textline to be the line from the text file
    textline = local_file_read.ReadLine
    'Look for Page number by checking for string "Page: " in the textline variable string
    If InStr(textline, "Page: ") > 0 Then
        'Determine if comments need to be written out
        If write_comments = "Yes" Then
            xlSheet.Range("A" & row_count) = comment_count
            xlSheet.Range("B" & row_count) = author_name
            xlSheet.Range("C" & row_count) = comment_type
            xlSheet.Range("D" & row_count) = page_number
            xlSheet.Range("E" & row_count) = Comments
            xlSheet.Range("F" & row_count) = date_variable
            row_count = row_count + 1
        End If
        'split the textline at the string "Page: " and grab the second part of the split
        page_number = Split(textline, "Page: ")(1)
        'Change write_comments variable so next time the comments get written out
        write_comments = "Yes"
    'Look for author by checkign for string "Author: "
    ElseIf InStr(textline, "Author: ") > 0 Then
        'Determine if comments need to be written out
        If write_comments = "Yes" Then
            xlSheet.Range("A" & row_count) = comment_count
            xlSheet.Range("B" & row_count) = author_name
            xlSheet.Range("C" & row_count) = comment_type
            xlSheet.Range("D" & row_count) = page_number
            xlSheet.Range("E" & row_count) = Comments
            xlSheet.Range("F" & row_count) = date_variable
            row_count = row_count + 1
        End If
        'First split the line using "Author: " as the delimiter, grab the second string
        'Then split the second string by "Subject; ", grab the first string, this isolate the author's name
        author_name = Split(Split(textline, "Author: ")(1), "Subject: ")(0)
        'Do a double split to get the type using "Subject: " and "Date: " as the delimiters
        comment_type = Split(Split(textline, "Subject: ")(1), "Date: ")(0)
        'Single split is needed to get the date
        date_variable = Split(textline, "Date: ")(1)
        comment_count = comment_count + 1
        'Change write_comments variable so next time the comments get written out
        write_comments = "Yes"
    'Determine if first line is being read and then continue to next line
    ElseIf InStr(textline, "Summary of Comments on ") > 0 Then
        'Nothin needs to happen if its the first line
    'Read in comments
    Else
        Comments = Comments + " " + textline
    End If
Loop
End Sub
enter code here

因此，您需要的是一个“状态机”，它跟踪在任何给定时间对任何给定行在文本文件中解析的内容。在大多数情况下，文本文件看起来有两个主要关键字：

Page

和

Author

。此外，您要么等待检测到下一个关键字，要么收集（多行）注释。最简单的形式是，状态机通常使用

Select Case

语句表示：

Select Case
    Case "Page"
        '--- do something with the page number
    Case "Author"
        '--- do something with the author line
    Case Else
        '--- either wait for a keyword or collect the comment
End Select

除了下面示例中的状态机之外，您还将注意到，为了简化整个逻辑长链的复杂性，我将代码逻辑分解为单独的块。通过这种方式隔离函数，可以更轻松地关注该方法正在做什么，而不必担心它对该方法其余部分的影响

Option Explicit

Sub main()
    Dim forumPostFile As String
    forumPostFile = "C:\Temp\ForumPostExample.txt"

    ExtractComments forumPostFile, Sheet1
End Sub

Sub ExtractComments(ByVal fullPathFilename As String, _
                    ByRef destWS As Worksheet)

    InitializeOutput destWS

    Dim commentNumber As Long
    commentNumber = 1

    Dim fso As FileSystemObject
    Set fso = CreateObject("Scripting.FileSystemObject")

    Dim forumFile As Object
    Set forumFile = fso.OpenTextFile(fullPathFilename, ForReading)

    Dim oneLine As String
    Dim state As String
    state = "LookForPage"

    Dim keywords() As String
    Dim page As Long
    Dim author As String
    Dim subject As String
    Dim timestamp As Date
    Dim comment As String
    Do Until forumFile.AtEndOfStream
        oneLine = forumFile.ReadLine
        If Len(oneLine) > 0 Then
            keywords = Split(oneLine, ":")
            Select Case keywords(0)
                Case "Page"
                    If state = "BuildComment" Then
                        CommentToSheet destWS, (commentNumber + 1), _
                                       commentNumber, author, subject, page, comment, timestamp
                        commentNumber = commentNumber + 1
                        comment = vbNullString
                    End If
                    page = keywords(1)
                    state = "LookForAuthor"

                Case "Author"
                    If state = "BuildComment" Then
                        CommentToSheet destWS, (commentNumber + 1), _
                                       commentNumber, author, subject, page, comment, timestamp
                        commentNumber = commentNumber + 1
                        comment = vbNullString
                    End If
                    author = Trim$(Left(keywords(1), Len(keywords(1)) - Len("Subject")))
                    subject = Trim$(Left(keywords(2), Len(keywords(2)) - Len("Date")))
                    timestamp = CDate(Right$(oneLine, Len(oneLine) - InStr(1, oneLine, "Date:") - Len("Date:")))
                    state = "BuildComment"

                Case Else
                    If state = "BuildComment" Then
                        comment = comment & oneLine
                    End If
            End Select
        End If
    Loop
    forumFile.Close
End Sub

Private Sub InitializeOutput(ByRef destWS As Worksheet)
    Dim header As Range
    Set header = destWS.Range("A1:F1")
    destWS.Cells.Clear
    With header
        .Cells(1, 1) = "Comment No."
        .Cells(1, 2) = "Reviewer Name"
        .Cells(1, 3) = "Type"
        .Cells(1, 4) = "Page Number"
        .Cells(1, 5) = "Comment"
        .Cells(1, 6) = "Date Submitted"
        .WrapText = True
        .Interior.Color = RGB(191, 191, 191)
        .Columns(1).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(2).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(3).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(4).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(5).EntireColumn.HorizontalAlignment = xlHAlignLeft
        .Columns(5).EntireColumn.WrapText = True
        .Columns(6).EntireColumn.HorizontalAlignment = xlHAlignLeft
    End With
End Sub

Private Sub CommentToSheet(ByRef destWS As Worksheet, _
                           ByVal row As Long, _
                           ByVal number As Long, _
                           ByVal author As String, _
                           ByVal subject As String, _
                           ByVal pageNumber As Long, _
                           ByVal comment As String, _
                           ByVal timestamp As Date)
    With destWS.Rows(row)
        .Cells(1, 1) = number
        .Cells(1, 2) = author
        .Cells(1, 3) = subject
        .Cells(1, 4) = pageNumber
        .Cells(1, 5) = comment
        .Cells(1, 6) = timestamp
    End With
End Sub

Page

和

Author

。此外，您要么等待检测到下一个关键字，要么收集（多行）注释。最简单的形式是，状态机通常使用

Select Case

语句表示：

Select Case
    Case "Page"
        '--- do something with the page number
    Case "Author"
        '--- do something with the author line
    Case Else
        '--- either wait for a keyword or collect the comment
End Select

Option Explicit

Sub main()
    Dim forumPostFile As String
    forumPostFile = "C:\Temp\ForumPostExample.txt"

    ExtractComments forumPostFile, Sheet1
End Sub

Sub ExtractComments(ByVal fullPathFilename As String, _
                    ByRef destWS As Worksheet)

    InitializeOutput destWS

    Dim commentNumber As Long
    commentNumber = 1

    Dim fso As FileSystemObject
    Set fso = CreateObject("Scripting.FileSystemObject")

    Dim forumFile As Object
    Set forumFile = fso.OpenTextFile(fullPathFilename, ForReading)

    Dim oneLine As String
    Dim state As String
    state = "LookForPage"

    Dim keywords() As String
    Dim page As Long
    Dim author As String
    Dim subject As String
    Dim timestamp As Date
    Dim comment As String
    Do Until forumFile.AtEndOfStream
        oneLine = forumFile.ReadLine
        If Len(oneLine) > 0 Then
            keywords = Split(oneLine, ":")
            Select Case keywords(0)
                Case "Page"
                    If state = "BuildComment" Then
                        CommentToSheet destWS, (commentNumber + 1), _
                                       commentNumber, author, subject, page, comment, timestamp
                        commentNumber = commentNumber + 1
                        comment = vbNullString
                    End If
                    page = keywords(1)
                    state = "LookForAuthor"

                Case "Author"
                    If state = "BuildComment" Then
                        CommentToSheet destWS, (commentNumber + 1), _
                                       commentNumber, author, subject, page, comment, timestamp
                        commentNumber = commentNumber + 1
                        comment = vbNullString
                    End If
                    author = Trim$(Left(keywords(1), Len(keywords(1)) - Len("Subject")))
                    subject = Trim$(Left(keywords(2), Len(keywords(2)) - Len("Date")))
                    timestamp = CDate(Right$(oneLine, Len(oneLine) - InStr(1, oneLine, "Date:") - Len("Date:")))
                    state = "BuildComment"

                Case Else
                    If state = "BuildComment" Then
                        comment = comment & oneLine
                    End If
            End Select
        End If
    Loop
    forumFile.Close
End Sub

Private Sub InitializeOutput(ByRef destWS As Worksheet)
    Dim header As Range
    Set header = destWS.Range("A1:F1")
    destWS.Cells.Clear
    With header
        .Cells(1, 1) = "Comment No."
        .Cells(1, 2) = "Reviewer Name"
        .Cells(1, 3) = "Type"
        .Cells(1, 4) = "Page Number"
        .Cells(1, 5) = "Comment"
        .Cells(1, 6) = "Date Submitted"
        .WrapText = True
        .Interior.Color = RGB(191, 191, 191)
        .Columns(1).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(2).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(3).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(4).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(5).EntireColumn.HorizontalAlignment = xlHAlignLeft
        .Columns(5).EntireColumn.WrapText = True
        .Columns(6).EntireColumn.HorizontalAlignment = xlHAlignLeft
    End With
End Sub

Private Sub CommentToSheet(ByRef destWS As Worksheet, _
                           ByVal row As Long, _
                           ByVal number As Long, _
                           ByVal author As String, _
                           ByVal subject As String, _
                           ByVal pageNumber As Long, _
                           ByVal comment As String, _
                           ByVal timestamp As Date)
    With destWS.Rows(row)
        .Cells(1, 1) = number
        .Cells(1, 2) = author
        .Cells(1, 3) = subject
        .Cells(1, 4) = pageNumber
        .Cells(1, 5) = comment
        .Cells(1, 6) = timestamp
    End With
End Sub

这是另一种方法

我确实使用了您的方法来解析数据，这确实取决于数据的排序方式，如您的文本文件屏幕截图所示（屏幕截图不是一种真正好的数据显示方法）

如果您的数据不符合要求，则需要更改编码

基本算法

创建一个对象（类），其属性是我们希望显示的元素
将每个对象（表示表行）存储在字典中
将整个文件读取到
```
页面上拆分的字符串数组中：
```
对于上述数组中的每个项目，在
```
Author
```
将作者栏拆分为作者、主题、日期和评论
- 建立注释，并处理划线的注释
将每组项单独存储为字典对象
在表格中输出并格式化结果

请务必阅读模块顶部的注释，了解有关重命名和设置引用的信息

课程模块 常规模块

“设置对Microsoft脚本运行时的引用”
选项显式
附属机构意见（）
Dim-fn作为变体
Dim dC作为字典，cC作为注释
将FSO设置为文件系统对象，将TS设置为文本流
将wsRes作为工作表，vRes作为变量，rRes作为范围
Dim str（）作为字符串，V作为变量
Dim sAuthComm（）作为字符串
暗淡的酒精
像绳子一样模糊的碎片
Dim sComments（）作为字符串
我和我一样长，我和我一样长
fn=Application.GetOpenFilename（“文本文件（*.txt），*.txt”）
如果fn=False，则退出Sub
Set FSO=新文件系统对象
设置TS=FSO.OpenTextFile（fn，ForReading，False，tristateffalse）
str=拆分（TS.ReadAll，第页：）
Set dC=新字典
"整理数据,
对于str（）中的每个V
如果Val（V）>0，则“确保我们从1或更大的页码开始
sAuthComm=Split（V，“作者：”）
sPage=修剪（sAuthComm（0））
对于I=1至UBound（sAuthComm）
设置cC=新的cComment
与cC
.Page=sPage
.Author=Trim（拆分（sAuthComm（I），“主题：”）（0））
.Subject=Trim（拆分（拆分（sAuthComm（I），“Subject:”）（1），“Date:”（0））
.DT=修剪（分割（分割（sAuthComm（I），“日期：”）（1），vbNewLine）（0））
sComments=拆分（sAuthComm（I），vbNewLine）
对于J=1至UBound（S建议）
如果sComments（J）“，.Comment=.Comment&vbLf&sComments（J）
下一个J
.Comment=Mid（.Comment，2）
"处理虚线评论"
.Comment=替换（.Comment，“-”&vbLf，“-”）
酒精浓度=酒精浓度+1
dC.Add键：=lComNum，项：=cC
以
接下来我
如果结束
下一个V
"组织输出,
重拨VRE（0到dC.计数，1到6）
'标题
vRes（0，1）=“注释编号。”
vRes（0，2）=“审核人姓名”
vRes（0,3）=“类型”
vRes（0，4）=“页码”
vRes（0,5）=“评论”
vRes（0,6）=“提交日期”
对于直流中的每个V.键
带直流（V）
vRes（V，1）=V
vRes（V，2）=作者
vRes（V，3）=受试者
vRes（V，4）=第页
vRes（V，5）=评论
vRes（V，6）=.DT
以
下一个V
出错时继续下一步
设置wsRes=ThisWorkbook.Worksheets（“结果”）
如果出错，麻木