Excel 使用VBA从文本文件提取数据时出错
我试图从一个文本文件中提取数据,该文件是从PDF导出评论的结果。事实证明,将注释导出到文本文件是最好的选择,因为XML和Excel格式的效果不太好。但是,当我尝试运行代码时,代码出现了一些错误 我希望代码从如下所示的文本文件中获取数据: 然后把它们放在这样一张桌子上: 然而,该表是这样显示的,注释、注释编号和页面的顺序不正确,并且遗漏了一些信息。我不知道为什么这些评论看起来很复杂,但是作者D的评论被忽略了 非常感谢您的帮助 以下是我目前的代码:Excel 使用VBA从文本文件提取数据时出错,excel,vba,pdf,Excel,Vba,Pdf,我试图从一个文本文件中提取数据,该文件是从PDF导出评论的结果。事实证明,将注释导出到文本文件是最好的选择,因为XML和Excel格式的效果不太好。但是,当我尝试运行代码时,代码出现了一些错误 我希望代码从如下所示的文本文件中获取数据: 然后把它们放在这样一张桌子上: 然而,该表是这样显示的,注释、注释编号和页面的顺序不正确,并且遗漏了一些信息。我不知道为什么这些评论看起来很复杂,但是作者D的评论被忽略了 非常感谢您的帮助 以下是我目前的代码: Sub Format() ' This c
Sub Format()
' This code determines the users username. Useful if there is a standard location each person would have the file on their computer.
Set scripting_object = CreateObject("Scripting.FileSystemObject")
strUser = CreateObject("WScript.Network").UserName
localfileName = "C:\Users\" + strUser + "\Downloads\ForumPostExample.txt"
' URL of the file
file_url = "C:\ForumPostExample.txt"
' This creates a scripting object
Set local_file = CreateObject("Scripting.FileSystemObject")
' Opens the text file based on the url for the file. 1, and 2 are options, like read only, can't remember exactly
Set local_file_read = local_file.OpenTextFile(file_url, 1, 2)
'Initialize worksheet
Set xlSheet = ActiveWorkbook.Worksheets("Sheet1")
'Write out the first row
xlSheet.Range("A1") = "Comment No."
xlSheet.Range("B1") = "Reviewer Name"
xlSheet.Range("C1") = "Type"
xlSheet.Range("D1") = "Page Number"
xlSheet.Range("E1") = "Comment"
xlSheet.Range("F1") = "Date Submitted"
'Set row count
row_count = 2
'Variable tells if comments needs to be written out
write_comments = "No"
'Initialize comments variable and comments count as it will need to be added to multiple times
Comments = ""
comment_count = 0
'Read each line of the file
Do Until local_file_read.AtEndOfStream
' Set variable textline to be the line from the text file
textline = local_file_read.ReadLine
'Look for Page number by checking for string "Page: " in the textline variable string
If InStr(textline, "Page: ") > 0 Then
'Determine if comments need to be written out
If write_comments = "Yes" Then
xlSheet.Range("A" & row_count) = comment_count
xlSheet.Range("B" & row_count) = author_name
xlSheet.Range("C" & row_count) = comment_type
xlSheet.Range("D" & row_count) = page_number
xlSheet.Range("E" & row_count) = Comments
xlSheet.Range("F" & row_count) = date_variable
row_count = row_count + 1
End If
'split the textline at the string "Page: " and grab the second part of the split
page_number = Split(textline, "Page: ")(1)
'Change write_comments variable so next time the comments get written out
write_comments = "Yes"
'Look for author by checkign for string "Author: "
ElseIf InStr(textline, "Author: ") > 0 Then
'Determine if comments need to be written out
If write_comments = "Yes" Then
xlSheet.Range("A" & row_count) = comment_count
xlSheet.Range("B" & row_count) = author_name
xlSheet.Range("C" & row_count) = comment_type
xlSheet.Range("D" & row_count) = page_number
xlSheet.Range("E" & row_count) = Comments
xlSheet.Range("F" & row_count) = date_variable
row_count = row_count + 1
End If
'First split the line using "Author: " as the delimiter, grab the second string
'Then split the second string by "Subject; ", grab the first string, this isolate the author's name
author_name = Split(Split(textline, "Author: ")(1), "Subject: ")(0)
'Do a double split to get the type using "Subject: " and "Date: " as the delimiters
comment_type = Split(Split(textline, "Subject: ")(1), "Date: ")(0)
'Single split is needed to get the date
date_variable = Split(textline, "Date: ")(1)
comment_count = comment_count + 1
'Change write_comments variable so next time the comments get written out
write_comments = "Yes"
'Determine if first line is being read and then continue to next line
ElseIf InStr(textline, "Summary of Comments on ") > 0 Then
'Nothin needs to happen if its the first line
'Read in comments
Else
Comments = Comments + " " + textline
End If
Loop
End Sub
enter code here
因此,您需要的是一个“状态机”,它跟踪在任何给定时间对任何给定行在文本文件中解析的内容。在大多数情况下,文本文件看起来有两个主要关键字:Page
和Author
。此外,您要么等待检测到下一个关键字,要么收集(多行)注释。最简单的形式是,状态机通常使用Select Case
语句表示:
Select Case
Case "Page"
'--- do something with the page number
Case "Author"
'--- do something with the author line
Case Else
'--- either wait for a keyword or collect the comment
End Select
除了下面示例中的状态机之外,您还将注意到,为了简化整个逻辑长链的复杂性,我将代码逻辑分解为单独的块。通过这种方式隔离函数,可以更轻松地关注该方法正在做什么,而不必担心它对该方法其余部分的影响
Option Explicit
Sub main()
Dim forumPostFile As String
forumPostFile = "C:\Temp\ForumPostExample.txt"
ExtractComments forumPostFile, Sheet1
End Sub
Sub ExtractComments(ByVal fullPathFilename As String, _
ByRef destWS As Worksheet)
InitializeOutput destWS
Dim commentNumber As Long
commentNumber = 1
Dim fso As FileSystemObject
Set fso = CreateObject("Scripting.FileSystemObject")
Dim forumFile As Object
Set forumFile = fso.OpenTextFile(fullPathFilename, ForReading)
Dim oneLine As String
Dim state As String
state = "LookForPage"
Dim keywords() As String
Dim page As Long
Dim author As String
Dim subject As String
Dim timestamp As Date
Dim comment As String
Do Until forumFile.AtEndOfStream
oneLine = forumFile.ReadLine
If Len(oneLine) > 0 Then
keywords = Split(oneLine, ":")
Select Case keywords(0)
Case "Page"
If state = "BuildComment" Then
CommentToSheet destWS, (commentNumber + 1), _
commentNumber, author, subject, page, comment, timestamp
commentNumber = commentNumber + 1
comment = vbNullString
End If
page = keywords(1)
state = "LookForAuthor"
Case "Author"
If state = "BuildComment" Then
CommentToSheet destWS, (commentNumber + 1), _
commentNumber, author, subject, page, comment, timestamp
commentNumber = commentNumber + 1
comment = vbNullString
End If
author = Trim$(Left(keywords(1), Len(keywords(1)) - Len("Subject")))
subject = Trim$(Left(keywords(2), Len(keywords(2)) - Len("Date")))
timestamp = CDate(Right$(oneLine, Len(oneLine) - InStr(1, oneLine, "Date:") - Len("Date:")))
state = "BuildComment"
Case Else
If state = "BuildComment" Then
comment = comment & oneLine
End If
End Select
End If
Loop
forumFile.Close
End Sub
Private Sub InitializeOutput(ByRef destWS As Worksheet)
Dim header As Range
Set header = destWS.Range("A1:F1")
destWS.Cells.Clear
With header
.Cells(1, 1) = "Comment No."
.Cells(1, 2) = "Reviewer Name"
.Cells(1, 3) = "Type"
.Cells(1, 4) = "Page Number"
.Cells(1, 5) = "Comment"
.Cells(1, 6) = "Date Submitted"
.WrapText = True
.Interior.Color = RGB(191, 191, 191)
.Columns(1).EntireColumn.HorizontalAlignment = xlHAlignCenter
.Columns(2).EntireColumn.HorizontalAlignment = xlHAlignCenter
.Columns(3).EntireColumn.HorizontalAlignment = xlHAlignCenter
.Columns(4).EntireColumn.HorizontalAlignment = xlHAlignCenter
.Columns(5).EntireColumn.HorizontalAlignment = xlHAlignLeft
.Columns(5).EntireColumn.WrapText = True
.Columns(6).EntireColumn.HorizontalAlignment = xlHAlignLeft
End With
End Sub
Private Sub CommentToSheet(ByRef destWS As Worksheet, _
ByVal row As Long, _
ByVal number As Long, _
ByVal author As String, _
ByVal subject As String, _
ByVal pageNumber As Long, _
ByVal comment As String, _
ByVal timestamp As Date)
With destWS.Rows(row)
.Cells(1, 1) = number
.Cells(1, 2) = author
.Cells(1, 3) = subject
.Cells(1, 4) = pageNumber
.Cells(1, 5) = comment
.Cells(1, 6) = timestamp
End With
End Sub
因此,您需要的是一个“状态机”,它跟踪在任何给定时间对任何给定行在文本文件中解析的内容。在大多数情况下,文本文件看起来有两个主要关键字:Page
和Author
。此外,您要么等待检测到下一个关键字,要么收集(多行)注释。最简单的形式是,状态机通常使用Select Case
语句表示:
Select Case
Case "Page"
'--- do something with the page number
Case "Author"
'--- do something with the author line
Case Else
'--- either wait for a keyword or collect the comment
End Select
除了下面示例中的状态机之外,您还将注意到,为了简化整个逻辑长链的复杂性,我将代码逻辑分解为单独的块。通过这种方式隔离函数,可以更轻松地关注该方法正在做什么,而不必担心它对该方法其余部分的影响
Option Explicit
Sub main()
Dim forumPostFile As String
forumPostFile = "C:\Temp\ForumPostExample.txt"
ExtractComments forumPostFile, Sheet1
End Sub
Sub ExtractComments(ByVal fullPathFilename As String, _
ByRef destWS As Worksheet)
InitializeOutput destWS
Dim commentNumber As Long
commentNumber = 1
Dim fso As FileSystemObject
Set fso = CreateObject("Scripting.FileSystemObject")
Dim forumFile As Object
Set forumFile = fso.OpenTextFile(fullPathFilename, ForReading)
Dim oneLine As String
Dim state As String
state = "LookForPage"
Dim keywords() As String
Dim page As Long
Dim author As String
Dim subject As String
Dim timestamp As Date
Dim comment As String
Do Until forumFile.AtEndOfStream
oneLine = forumFile.ReadLine
If Len(oneLine) > 0 Then
keywords = Split(oneLine, ":")
Select Case keywords(0)
Case "Page"
If state = "BuildComment" Then
CommentToSheet destWS, (commentNumber + 1), _
commentNumber, author, subject, page, comment, timestamp
commentNumber = commentNumber + 1
comment = vbNullString
End If
page = keywords(1)
state = "LookForAuthor"
Case "Author"
If state = "BuildComment" Then
CommentToSheet destWS, (commentNumber + 1), _
commentNumber, author, subject, page, comment, timestamp
commentNumber = commentNumber + 1
comment = vbNullString
End If
author = Trim$(Left(keywords(1), Len(keywords(1)) - Len("Subject")))
subject = Trim$(Left(keywords(2), Len(keywords(2)) - Len("Date")))
timestamp = CDate(Right$(oneLine, Len(oneLine) - InStr(1, oneLine, "Date:") - Len("Date:")))
state = "BuildComment"
Case Else
If state = "BuildComment" Then
comment = comment & oneLine
End If
End Select
End If
Loop
forumFile.Close
End Sub
Private Sub InitializeOutput(ByRef destWS As Worksheet)
Dim header As Range
Set header = destWS.Range("A1:F1")
destWS.Cells.Clear
With header
.Cells(1, 1) = "Comment No."
.Cells(1, 2) = "Reviewer Name"
.Cells(1, 3) = "Type"
.Cells(1, 4) = "Page Number"
.Cells(1, 5) = "Comment"
.Cells(1, 6) = "Date Submitted"
.WrapText = True
.Interior.Color = RGB(191, 191, 191)
.Columns(1).EntireColumn.HorizontalAlignment = xlHAlignCenter
.Columns(2).EntireColumn.HorizontalAlignment = xlHAlignCenter
.Columns(3).EntireColumn.HorizontalAlignment = xlHAlignCenter
.Columns(4).EntireColumn.HorizontalAlignment = xlHAlignCenter
.Columns(5).EntireColumn.HorizontalAlignment = xlHAlignLeft
.Columns(5).EntireColumn.WrapText = True
.Columns(6).EntireColumn.HorizontalAlignment = xlHAlignLeft
End With
End Sub
Private Sub CommentToSheet(ByRef destWS As Worksheet, _
ByVal row As Long, _
ByVal number As Long, _
ByVal author As String, _
ByVal subject As String, _
ByVal pageNumber As Long, _
ByVal comment As String, _
ByVal timestamp As Date)
With destWS.Rows(row)
.Cells(1, 1) = number
.Cells(1, 2) = author
.Cells(1, 3) = subject
.Cells(1, 4) = pageNumber
.Cells(1, 5) = comment
.Cells(1, 6) = timestamp
End With
End Sub
这是另一种方法 我确实使用了您的方法来解析数据,这确实取决于数据的排序方式,如您的文本文件屏幕截图所示(屏幕截图不是一种真正好的数据显示方法) 如果您的数据不符合要求,则需要更改编码 基本算法
- 创建一个对象(类),其属性是我们希望显示的元素
- 将每个对象(表示表行)存储在字典中
- 将整个文件读取到
页面上拆分的字符串数组中:
- 对于上述数组中的每个项目,在
Author
- 将作者栏拆分为作者、主题、日期和评论
- 建立注释,并处理划线的注释
- 将每组项单独存储为字典对象
- 在表格中输出并格式化结果
“设置对Microsoft脚本运行时的引用”
选项显式
附属机构意见()
Dim-fn作为变体
Dim dC作为字典,cC作为注释
将FSO设置为文件系统对象,将TS设置为文本流
将wsRes作为工作表,vRes作为变量,rRes作为范围
Dim str()作为字符串,V作为变量
Dim sAuthComm()作为字符串
暗淡的酒精
像绳子一样模糊的碎片
Dim sComments()作为字符串
我和我一样长,我和我一样长
fn=Application.GetOpenFilename(“文本文件(*.txt),*.txt”)
如果fn=False,则退出Sub
Set FSO=新文件系统对象
设置TS=FSO.OpenTextFile(fn,ForReading,False,tristateffalse)
str=拆分(TS.ReadAll,第页:)
Set dC=新字典
"整理数据,
对于str()中的每个V
如果Val(V)>0,则“确保我们从1或更大的页码开始
sAuthComm=Split(V,“作者:”)
sPage=修剪(sAuthComm(0))
对于I=1至UBound(sAuthComm)
设置cC=新的cComment
与cC
.Page=sPage
.Author=Trim(拆分(sAuthComm(I),“主题:”)(0))
.Subject=Trim(拆分(拆分(sAuthComm(I),“Subject:”)(1),“Date:”(0))
.DT=修剪(分割(分割(sAuthComm(I),“日期:”)(1),vbNewLine)(0))
sComments=拆分(sAuthComm(I),vbNewLine)
对于J=1至UBound(S建议)
如果sComments(J)“,.Comment=.Comment&vbLf&sComments(J)
下一个J
.Comment=Mid(.Comment,2)
"处理虚线评论"
.Comment=替换(.Comment,“-”&vbLf,“-”)
酒精浓度=酒精浓度+1
dC.Add键:=lComNum,项:=cC
以
接下来我
如果结束
下一个V
"组织输出,
重拨VRE(0到dC.计数,1到6)
'标题
vRes(0,1)=“注释编号。”
vRes(0,2)=“审核人姓名”
vRes(0,3)=“类型”
vRes(0,4)=“页码”
vRes(0,5)=“评论”
vRes(0,6)=“提交日期”
对于直流中的每个V.键
带直流(V)
vRes(V,1)=V
vRes(V,2)=作者
vRes(V,3)=受试者
vRes(V,4)=第页
vRes(V,5)=评论
vRes(V,6)=.DT
以
下一个V
出错时继续下一步
设置wsRes=ThisWorkbook.Worksheets(“结果”)
如果出错,麻木