Warning: file_get_contents(/data/phpspider/zhask/data//catemap/0/vba/14.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Excel 使用VBA从文本文件提取数据时出错_Excel_Vba_Pdf - Fatal编程技术网

Excel 使用VBA从文本文件提取数据时出错

Excel 使用VBA从文本文件提取数据时出错,excel,vba,pdf,Excel,Vba,Pdf,我试图从一个文本文件中提取数据,该文件是从PDF导出评论的结果。事实证明,将注释导出到文本文件是最好的选择,因为XML和Excel格式的效果不太好。但是,当我尝试运行代码时,代码出现了一些错误 我希望代码从如下所示的文本文件中获取数据: 然后把它们放在这样一张桌子上: 然而,该表是这样显示的,注释、注释编号和页面的顺序不正确,并且遗漏了一些信息。我不知道为什么这些评论看起来很复杂,但是作者D的评论被忽略了 非常感谢您的帮助 以下是我目前的代码: Sub Format() ' This c

我试图从一个文本文件中提取数据,该文件是从PDF导出评论的结果。事实证明,将注释导出到文本文件是最好的选择,因为XML和Excel格式的效果不太好。但是,当我尝试运行代码时,代码出现了一些错误

我希望代码从如下所示的文本文件中获取数据:

然后把它们放在这样一张桌子上:

然而,该表是这样显示的,注释、注释编号和页面的顺序不正确,并且遗漏了一些信息。我不知道为什么这些评论看起来很复杂,但是作者D的评论被忽略了

非常感谢您的帮助

以下是我目前的代码:

Sub Format()

' This code determines the users username.  Useful if there is a standard location each person would have the file on their computer.

Set scripting_object = CreateObject("Scripting.FileSystemObject")
strUser = CreateObject("WScript.Network").UserName
localfileName = "C:\Users\" + strUser + "\Downloads\ForumPostExample.txt"


' URL of the file
file_url = "C:\ForumPostExample.txt"

' This creates a scripting object
Set local_file = CreateObject("Scripting.FileSystemObject")
' Opens the text file based on the url for the file.  1, and 2 are options, like read only, can't remember exactly
Set local_file_read = local_file.OpenTextFile(file_url, 1, 2)
'Initialize worksheet
Set xlSheet = ActiveWorkbook.Worksheets("Sheet1")
'Write out the first row
xlSheet.Range("A1") = "Comment No."
xlSheet.Range("B1") = "Reviewer Name"
xlSheet.Range("C1") = "Type"
xlSheet.Range("D1") = "Page Number"
xlSheet.Range("E1") = "Comment"
xlSheet.Range("F1") = "Date Submitted"
'Set row count
row_count = 2

'Variable tells if comments needs to be written out
write_comments = "No"
'Initialize comments variable and comments count as it will need to be added to multiple times
Comments = ""
comment_count = 0

'Read each line of the file
Do Until local_file_read.AtEndOfStream
    ' Set variable textline to be the line from the text file
    textline = local_file_read.ReadLine
    'Look for Page number by checking for string "Page: " in the textline variable string
    If InStr(textline, "Page: ") > 0 Then
        'Determine if comments need to be written out
        If write_comments = "Yes" Then
            xlSheet.Range("A" & row_count) = comment_count
            xlSheet.Range("B" & row_count) = author_name
            xlSheet.Range("C" & row_count) = comment_type
            xlSheet.Range("D" & row_count) = page_number
            xlSheet.Range("E" & row_count) = Comments
            xlSheet.Range("F" & row_count) = date_variable
            row_count = row_count + 1
        End If
        'split the textline at the string "Page: " and grab the second part of the split
        page_number = Split(textline, "Page: ")(1)
        'Change write_comments variable so next time the comments get written out
        write_comments = "Yes"
    'Look for author by checkign for string "Author: "
    ElseIf InStr(textline, "Author: ") > 0 Then
        'Determine if comments need to be written out
        If write_comments = "Yes" Then
            xlSheet.Range("A" & row_count) = comment_count
            xlSheet.Range("B" & row_count) = author_name
            xlSheet.Range("C" & row_count) = comment_type
            xlSheet.Range("D" & row_count) = page_number
            xlSheet.Range("E" & row_count) = Comments
            xlSheet.Range("F" & row_count) = date_variable
            row_count = row_count + 1
        End If
        'First split the line using "Author: " as the delimiter, grab the second string
        'Then split the second string by "Subject; ", grab the first string, this isolate the author's name
        author_name = Split(Split(textline, "Author: ")(1), "Subject: ")(0)
        'Do a double split to get the type using "Subject: " and "Date: " as the delimiters
        comment_type = Split(Split(textline, "Subject: ")(1), "Date: ")(0)
        'Single split is needed to get the date
        date_variable = Split(textline, "Date: ")(1)
        comment_count = comment_count + 1
        'Change write_comments variable so next time the comments get written out
        write_comments = "Yes"
    'Determine if first line is being read and then continue to next line
    ElseIf InStr(textline, "Summary of Comments on ") > 0 Then
        'Nothin needs to happen if its the first line
    'Read in comments
    Else
        Comments = Comments + " " + textline
    End If
Loop
End Sub
enter code here
因此,您需要的是一个“状态机”,它跟踪在任何给定时间对任何给定行在文本文件中解析的内容。在大多数情况下,文本文件看起来有两个主要关键字:
Page
Author
。此外,您要么等待检测到下一个关键字,要么收集(多行)注释。最简单的形式是,状态机通常使用
Select Case
语句表示:

Select Case
    Case "Page"
        '--- do something with the page number
    Case "Author"
        '--- do something with the author line
    Case Else
        '--- either wait for a keyword or collect the comment
End Select
除了下面示例中的状态机之外,您还将注意到,为了简化整个逻辑长链的复杂性,我将代码逻辑分解为单独的块。通过这种方式隔离函数,可以更轻松地关注该方法正在做什么,而不必担心它对该方法其余部分的影响

Option Explicit

Sub main()
    Dim forumPostFile As String
    forumPostFile = "C:\Temp\ForumPostExample.txt"

    ExtractComments forumPostFile, Sheet1
End Sub

Sub ExtractComments(ByVal fullPathFilename As String, _
                    ByRef destWS As Worksheet)

    InitializeOutput destWS

    Dim commentNumber As Long
    commentNumber = 1

    Dim fso As FileSystemObject
    Set fso = CreateObject("Scripting.FileSystemObject")

    Dim forumFile As Object
    Set forumFile = fso.OpenTextFile(fullPathFilename, ForReading)

    Dim oneLine As String
    Dim state As String
    state = "LookForPage"

    Dim keywords() As String
    Dim page As Long
    Dim author As String
    Dim subject As String
    Dim timestamp As Date
    Dim comment As String
    Do Until forumFile.AtEndOfStream
        oneLine = forumFile.ReadLine
        If Len(oneLine) > 0 Then
            keywords = Split(oneLine, ":")
            Select Case keywords(0)
                Case "Page"
                    If state = "BuildComment" Then
                        CommentToSheet destWS, (commentNumber + 1), _
                                       commentNumber, author, subject, page, comment, timestamp
                        commentNumber = commentNumber + 1
                        comment = vbNullString
                    End If
                    page = keywords(1)
                    state = "LookForAuthor"

                Case "Author"
                    If state = "BuildComment" Then
                        CommentToSheet destWS, (commentNumber + 1), _
                                       commentNumber, author, subject, page, comment, timestamp
                        commentNumber = commentNumber + 1
                        comment = vbNullString
                    End If
                    author = Trim$(Left(keywords(1), Len(keywords(1)) - Len("Subject")))
                    subject = Trim$(Left(keywords(2), Len(keywords(2)) - Len("Date")))
                    timestamp = CDate(Right$(oneLine, Len(oneLine) - InStr(1, oneLine, "Date:") - Len("Date:")))
                    state = "BuildComment"

                Case Else
                    If state = "BuildComment" Then
                        comment = comment & oneLine
                    End If
            End Select
        End If
    Loop
    forumFile.Close
End Sub

Private Sub InitializeOutput(ByRef destWS As Worksheet)
    Dim header As Range
    Set header = destWS.Range("A1:F1")
    destWS.Cells.Clear
    With header
        .Cells(1, 1) = "Comment No."
        .Cells(1, 2) = "Reviewer Name"
        .Cells(1, 3) = "Type"
        .Cells(1, 4) = "Page Number"
        .Cells(1, 5) = "Comment"
        .Cells(1, 6) = "Date Submitted"
        .WrapText = True
        .Interior.Color = RGB(191, 191, 191)
        .Columns(1).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(2).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(3).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(4).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(5).EntireColumn.HorizontalAlignment = xlHAlignLeft
        .Columns(5).EntireColumn.WrapText = True
        .Columns(6).EntireColumn.HorizontalAlignment = xlHAlignLeft
    End With
End Sub

Private Sub CommentToSheet(ByRef destWS As Worksheet, _
                           ByVal row As Long, _
                           ByVal number As Long, _
                           ByVal author As String, _
                           ByVal subject As String, _
                           ByVal pageNumber As Long, _
                           ByVal comment As String, _
                           ByVal timestamp As Date)
    With destWS.Rows(row)
        .Cells(1, 1) = number
        .Cells(1, 2) = author
        .Cells(1, 3) = subject
        .Cells(1, 4) = pageNumber
        .Cells(1, 5) = comment
        .Cells(1, 6) = timestamp
    End With
End Sub
因此,您需要的是一个“状态机”,它跟踪在任何给定时间对任何给定行在文本文件中解析的内容。在大多数情况下,文本文件看起来有两个主要关键字:
Page
Author
。此外,您要么等待检测到下一个关键字,要么收集(多行)注释。最简单的形式是,状态机通常使用
Select Case
语句表示:

Select Case
    Case "Page"
        '--- do something with the page number
    Case "Author"
        '--- do something with the author line
    Case Else
        '--- either wait for a keyword or collect the comment
End Select
除了下面示例中的状态机之外,您还将注意到,为了简化整个逻辑长链的复杂性,我将代码逻辑分解为单独的块。通过这种方式隔离函数,可以更轻松地关注该方法正在做什么,而不必担心它对该方法其余部分的影响

Option Explicit

Sub main()
    Dim forumPostFile As String
    forumPostFile = "C:\Temp\ForumPostExample.txt"

    ExtractComments forumPostFile, Sheet1
End Sub

Sub ExtractComments(ByVal fullPathFilename As String, _
                    ByRef destWS As Worksheet)

    InitializeOutput destWS

    Dim commentNumber As Long
    commentNumber = 1

    Dim fso As FileSystemObject
    Set fso = CreateObject("Scripting.FileSystemObject")

    Dim forumFile As Object
    Set forumFile = fso.OpenTextFile(fullPathFilename, ForReading)

    Dim oneLine As String
    Dim state As String
    state = "LookForPage"

    Dim keywords() As String
    Dim page As Long
    Dim author As String
    Dim subject As String
    Dim timestamp As Date
    Dim comment As String
    Do Until forumFile.AtEndOfStream
        oneLine = forumFile.ReadLine
        If Len(oneLine) > 0 Then
            keywords = Split(oneLine, ":")
            Select Case keywords(0)
                Case "Page"
                    If state = "BuildComment" Then
                        CommentToSheet destWS, (commentNumber + 1), _
                                       commentNumber, author, subject, page, comment, timestamp
                        commentNumber = commentNumber + 1
                        comment = vbNullString
                    End If
                    page = keywords(1)
                    state = "LookForAuthor"

                Case "Author"
                    If state = "BuildComment" Then
                        CommentToSheet destWS, (commentNumber + 1), _
                                       commentNumber, author, subject, page, comment, timestamp
                        commentNumber = commentNumber + 1
                        comment = vbNullString
                    End If
                    author = Trim$(Left(keywords(1), Len(keywords(1)) - Len("Subject")))
                    subject = Trim$(Left(keywords(2), Len(keywords(2)) - Len("Date")))
                    timestamp = CDate(Right$(oneLine, Len(oneLine) - InStr(1, oneLine, "Date:") - Len("Date:")))
                    state = "BuildComment"

                Case Else
                    If state = "BuildComment" Then
                        comment = comment & oneLine
                    End If
            End Select
        End If
    Loop
    forumFile.Close
End Sub

Private Sub InitializeOutput(ByRef destWS As Worksheet)
    Dim header As Range
    Set header = destWS.Range("A1:F1")
    destWS.Cells.Clear
    With header
        .Cells(1, 1) = "Comment No."
        .Cells(1, 2) = "Reviewer Name"
        .Cells(1, 3) = "Type"
        .Cells(1, 4) = "Page Number"
        .Cells(1, 5) = "Comment"
        .Cells(1, 6) = "Date Submitted"
        .WrapText = True
        .Interior.Color = RGB(191, 191, 191)
        .Columns(1).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(2).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(3).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(4).EntireColumn.HorizontalAlignment = xlHAlignCenter
        .Columns(5).EntireColumn.HorizontalAlignment = xlHAlignLeft
        .Columns(5).EntireColumn.WrapText = True
        .Columns(6).EntireColumn.HorizontalAlignment = xlHAlignLeft
    End With
End Sub

Private Sub CommentToSheet(ByRef destWS As Worksheet, _
                           ByVal row As Long, _
                           ByVal number As Long, _
                           ByVal author As String, _
                           ByVal subject As String, _
                           ByVal pageNumber As Long, _
                           ByVal comment As String, _
                           ByVal timestamp As Date)
    With destWS.Rows(row)
        .Cells(1, 1) = number
        .Cells(1, 2) = author
        .Cells(1, 3) = subject
        .Cells(1, 4) = pageNumber
        .Cells(1, 5) = comment
        .Cells(1, 6) = timestamp
    End With
End Sub

这是另一种方法

我确实使用了您的方法来解析数据,这确实取决于数据的排序方式,如您的文本文件屏幕截图所示(屏幕截图不是一种真正好的数据显示方法)

如果您的数据不符合要求,则需要更改编码

基本算法
  • 创建一个对象(类),其属性是我们希望显示的元素
  • 将每个对象(表示表行)存储在字典中
  • 将整个文件读取到
    页面上拆分的字符串数组中:
  • 对于上述数组中的每个项目,在
    Author
  • 将作者栏拆分为作者、主题、日期和评论
    • 建立注释,并处理划线的注释
  • 将每组项单独存储为字典对象
  • 在表格中输出并格式化结果
请务必阅读模块顶部的注释,了解有关重命名和设置引用的信息

课程模块 常规模块
“设置对Microsoft脚本运行时的引用”
选项显式
附属机构意见()
Dim-fn作为变体
Dim dC作为字典,cC作为注释
将FSO设置为文件系统对象,将TS设置为文本流
将wsRes作为工作表,vRes作为变量,rRes作为范围
Dim str()作为字符串,V作为变量
Dim sAuthComm()作为字符串
暗淡的酒精
像绳子一样模糊的碎片
Dim sComments()作为字符串
我和我一样长,我和我一样长
fn=Application.GetOpenFilename(“文本文件(*.txt),*.txt”)
如果fn=False,则退出Sub
Set FSO=新文件系统对象
设置TS=FSO.OpenTextFile(fn,ForReading,False,tristateffalse)
str=拆分(TS.ReadAll,第页:)
Set dC=新字典
"整理数据,
对于str()中的每个V
如果Val(V)>0,则“确保我们从1或更大的页码开始
sAuthComm=Split(V,“作者:”)
sPage=修剪(sAuthComm(0))
对于I=1至UBound(sAuthComm)
设置cC=新的cComment
与cC
.Page=sPage
.Author=Trim(拆分(sAuthComm(I),“主题:”)(0))
.Subject=Trim(拆分(拆分(sAuthComm(I),“Subject:”)(1),“Date:”(0))
.DT=修剪(分割(分割(sAuthComm(I),“日期:”)(1),vbNewLine)(0))
sComments=拆分(sAuthComm(I),vbNewLine)
对于J=1至UBound(S建议)
如果sComments(J)“,.Comment=.Comment&vbLf&sComments(J)
下一个J
.Comment=Mid(.Comment,2)
"处理虚线评论"
.Comment=替换(.Comment,“-”&vbLf,“-”)
酒精浓度=酒精浓度+1
dC.Add键:=lComNum,项:=cC
以
接下来我
如果结束
下一个V
"组织输出,
重拨VRE(0到dC.计数,1到6)
'标题
vRes(0,1)=“注释编号。”
vRes(0,2)=“审核人姓名”
vRes(0,3)=“类型”
vRes(0,4)=“页码”
vRes(0,5)=“评论”
vRes(0,6)=“提交日期”
对于直流中的每个V.键
带直流(V)
vRes(V,1)=V
vRes(V,2)=作者
vRes(V,3)=受试者
vRes(V,4)=第页
vRes(V,5)=评论
vRes(V,6)=.DT
以
下一个V
出错时继续下一步
设置wsRes=ThisWorkbook.Worksheets(“结果”)
如果出错,麻木