Excel VBA Web刮除列间距问题。总行不在正确的位置
[编辑:,正确插入html代码][编辑2,固定间距问题] 间距问题已修复。我复制了代码中较高位置的这一行 我现在唯一想做的是将总计行中的数字以粗体显示,但不确定是否可以在代码中单独显示出来。如果不行,我就别管它了。我只是想在员工个人信息中脱颖而出Excel VBA Web刮除列间距问题。总行不在正确的位置,excel,vba,web-scraping,Excel,Vba,Web Scraping,[编辑:,正确插入html代码][编辑2,固定间距问题] 间距问题已修复。我复制了代码中较高位置的这一行 我现在唯一想做的是将总计行中的数字以粗体显示,但不确定是否可以在代码中单独显示出来。如果不行,我就别管它了。我只是想在员工个人信息中脱颖而出 If TD.getAttribute("colspan") Then Column = Column + TD.getAttribute("colspan") 并把它放在代码下面的这一部分 Se
If TD.getAttribute("colspan") Then
Column = Column + TD.getAttribute("colspan")
并把它放在代码下面的这一部分
Set TDs = TR.getElementsByTagName("td")
For Each TD In TDs
Worksheet.Cells(Row, Column).Value = TD.innerText
If TD.getAttribute("colspan") Then
Column = Column + TD.getAttribute("colspan")
Else
Column = Column + 1
End If
更新的代码现在是
Sub DownloadPPAProcessData0700()
Dim Browser As InternetExplorer
Dim Document As HTMLDocument
Dim Table As IHTMLElement
Dim Tables As IHTMLElementCollection
Dim Div As IHTMLElement
Dim Divs As IHTMLElementCollection
Dim H3 As IHTMLElement
Dim TR As IHTMLElement
Dim TRs As IHTMLElementCollection
Dim TD As IHTMLElement
Dim TDs As IHTMLElementCollection
Dim Row As Integer
Dim Column As Integer
Dim Worksheet As Worksheet
Dim NodeType As String
Dim Warehouse As String
Dim ProcessId As String
Dim PrimaryAttribute As String
Dim SecondaryAttribute As String
Set Worksheet = ThisWorkbook.Worksheets("0700")
Application.ScreenUpdating = True
Dim SD As Date
Dim ED As Date
Dim STS As Integer
Dim ETS As Integer
Dim STE As Integer
Dim ETE As Integer
SD = Worksheets("Variables").Range("A2").Value
ED = Worksheets("Variables").Range("A2").Value
STS = Worksheets("Variables").Range("B2").Value
ETS = Worksheets("Variables").Range("C2").Value
STE = Worksheets("Variables").Range("D2").Value
ETE = Worksheets("Variables").Range("D2").Value
Application.ScreenUpdating = True
NodeType = Worksheet.Cells(2, 1).Value
Warehouse = Worksheet.Cells(2, 2).Value
ProcessId = Worksheet.Cells(2, 3).Value
PrimaryAttribute = Worksheet.Cells(2, 4).Value
SecondaryAttribute = Worksheet.Cells(2, 5).Value
Row = 1
Column = 1
Sheets("0700").Activate
Sheets("0700").Select
Application.ScreenUpdating = True
With ActiveSheet
Set Browser = New InternetExplorerMedium
Browser.Navigate "https://fclm-portal.amazon.com/ppa/inspect/process?&processId=100114&warehouseId=BFI4&primaryAttribute=PICKING_PROCESS_PATH&secondaryAttribute=GL_CODE&maxIntradayDays=1&spanType=Intraday&startDateIntraday=" & Year(SD) & "%2F" & Month(SD) & "%2F" & Day(SD) & "&startHourIntraday=" & (STS) & "&startMinuteIntraday=" & (STE) & "&endDateIntraday=" & Year(ED) & "%2F" & Month(ED) & "%2F" & Day(ED) & "&endHourIntraday=" & (ETS) & "&endMinuteIntraday=" & (ETE)
'Wait for page to load
Do While Browser.Busy Or Browser.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
'Scan the document
Set Document = Browser.Document
Set Divs = Document.getElementById("secondaryProductivityList").getElementsByTagName("div")
For Each Div In Divs
Set H3 = Div.getElementsByTagName("h3")(0)
If Not Div.className = "floatHeader" And Not H3 Is Nothing Then
Worksheet.Cells(Row, 1).Value = H3.innerText
Worksheet.Cells(Row, 1).Font.Bold = True
Row = Row + 1
Set Tables = Div.getElementsByTagName("table")
Set Table = Tables(0)
Set TRs = Table.getElementsByTagName("tr")
For Each TR In TRs
Column = 1
Set TDs = TR.getElementsByTagName("th")
For Each TD In TDs
Worksheet.Cells(Row, Column).Value = TD.innerText
Worksheet.Cells(Row, Column).Font.Bold = True
If TD.getAttribute("colspan") Then
Column = Column + TD.getAttribute("colspan")
Else
Column = Column + 1
End If
Next
Set TDs = TR.getElementsByTagName("td")
For Each TD In TDs
Worksheet.Cells(Row, Column).Value = TD.innerText
If TD.getAttribute("colspan") Then
Column = Column + TD.getAttribute("colspan")
Else
Column = Column + 1
End If
Next
Row = Row + 1
Next
End If
Row = Row + 1
Next
Browser.Quit
Application.ScreenUpdating = True
End With
Range("A1:Z50").Columns.AutoFit
End Sub
我正试图在一个网页刮文件的工作,网页的变化取决于一天中的时间,你想得到的信息。但现在,我只想发布1小时的代码是如何设置的
这确实是从网页上复制了正确的表格,但我遇到的问题是,在每个表格的末尾应该有一个总行,它应该与其他数字的间距相匹配。网页上的“总计”一词占据了3列,空格,其余数字将紧跟其后,并与“单位”列对齐
当宏提取数据时,它将总行放在最左边的单元格中,数字数据放在最右边。使整条线路断开两个单元
包括两个链接,指向网页外观和excel文件显示内容的图像
这是密码
Sub DownloadPPAProcessData0700()
Dim Browser As InternetExplorer
Dim Document As HTMLDocument
Dim Table As IHTMLElement
Dim Tables As IHTMLElementCollection
Dim Div As IHTMLElement
Dim Divs As IHTMLElementCollection
Dim H3 As IHTMLElement
Dim TR As IHTMLElement
Dim TRs As IHTMLElementCollection
Dim TD As IHTMLElement
Dim TDs As IHTMLElementCollection
Dim Row As Integer
Dim Column As Integer
Dim Worksheet As Worksheet
Set Worksheet = ThisWorkbook.Worksheets("0700")
Application.ScreenUpdating = False
Dim SD As Date
Dim ED As Date
Dim STS As Integer
Dim ETS As Integer
Dim STE As Integer
Dim ETE As Integer
SD = Worksheets("Variables").Range("A2").Value
ED = Worksheets("Variables").Range("A2").Value
STS = Worksheets("Variables").Range("B2").Value
ETS = Worksheets("Variables").Range("C2").Value
STE = Worksheets("Variables").Range("D2").Value
ETE = Worksheets("Variables").Range("D2").Value
Application.ScreenUpdating = False
Row = 1
Column = 1
Sheets("0700").Activate
Sheets("0700").Select
With ActiveSheet
Set Browser = New InternetExplorerMedium
Browser.Navigate "https://fclm.com/ppa/inspect/process?&processId=100114&warehouseId=...&primaryAttribute=PICKING_PROCESS_PATH&secondaryAttribute=GL_CODE&maxIntradayDays=1&spanType=Intraday&startDateIntraday=" & Year(SD) & "%2F" & Month(SD) & "%2F" & Day(SD) & "&startHourIntraday=" & (STS) & "&startMinuteIntraday=" & (STE) & "&endDateIntraday=" & Year(ED) & "%2F" & Month(ED) & "%2F" & Day(ED) & "&endHourIntraday=" & (ETS) & "&endMinuteIntraday=" & (ETE)
'Wait for page to load
Do While Browser.Busy Or Browser.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
'Scan the document
Set Document = Browser.Document
Set Divs = Document.getElementById("secondaryProductivityList").getElementsByTagName("div")
For Each Div In Divs
Set H3 = Div.getElementsByTagName("h3")(0)
If Not Div.className = "floatHeader" And Not H3 Is Nothing Then
Worksheet.Cells(Row, 1).Value = H3.innerText
Row = Row + 1
Set Tables = Div.getElementsByTagName("table")
Set Table = Tables(0)
Set TRs = Table.getElementsByTagName("tr")
For Each TR In TRs
Column = 1
Set TDs = TR.getElementsByTagName("th")
For Each TD In TDs
Worksheet.Cells(Row, Column).Value = TD.innerText
Worksheet.Cells(Row, Column).Font.Bold = True
If TD.getAttribute("colspan") Then
Column = Column + TD.getAttribute("colspan")
Else
Column = Column + 1
End If
Next
Set TDs = TR.getElementsByTagName("td")
For Each TD In TDs
Worksheet.Cells(Row, Column).Value = TD.innerText
Column = Column + 1
Next
Row = Row + 1
Next
End If
Row = Row + 1
Next
Browser.Quit
End With
Range("A1:Z50").Columns.AutoFit
End Sub
这是我试图从中复制的整个底部表的检查代码
拾取过程路径:PPFRACSDESTROY
未知的
员工Id
员工姓名
经理姓名
单位
量
直达时间
推断的小时数
总小时数
UPH
QPH
100244269
拉勒纳斯,布兰登
迈克尔·杰克逊
503
503
0.92
0
0.92
545.75
545.75
101157611
斯瓦林根,凯特琳
迈克尔·杰克逊
539
539
0.99
0
0.99
543.38
543.38
总数:
1,042
1,042
1.91
0
1.91
544.52
544.52
未知的
员工Id
员工姓名
经理姓名
单位
量
直达时间
推断的小时数
总小时数
UPH
QPH
拾取过程路径:PPFRACSDESTROYHZMT
未知的
员工Id
员工姓名
经理姓名
单位
量
直达时间
推断的小时数
总小时数
UPH
QPH
100244269
拉勒纳斯,布兰登
迈克尔·杰克逊
22
22
0.08
0
0.08
280.85
280.85
101157611
斯瓦林根,凯特琳
迈克尔·杰克逊
5.
5.
0.01
0
0.01
620.69
620.69
总数:
27
27
0.09
0
0.09
312.54
312.54
未知的
员工Id
员工姓名
经理姓名
单位
量
直达时间
推断的小时数
总小时数
UPH
QPH
拾取过程路径:PPFRACSLTL
未知的
员工Id
员工姓名
经理姓名
单位
量
直达时间
推断的小时数
总小时数
UPH
QPH
101158899
罗娜,西马弗兰卡
迈克尔·杰克逊
1.
57
0.80
0.20
1.
1.
57
总数:
1.
57
0.80
0.20
1.
1.
57
未知的
员工Id
员工姓名
经理姓名
单位
量
直达时间
推断的小时数
总小时数
UPH
QPH
我会使用剪贴板来复制整个格式。我现在正在从编辑的文件中读取html,而您将从ie.document中提取数据-大纲如下所示
Option Explicit
Public Sub test()
Dim html As HTMLDocument, tables As Object, i As Long, clipboard As Object
Dim ws As Worksheet, tablesDescriptions As Object
Dim ie As InternetExplorer, url As String
url = "yourURL"
Set ie = New InternetExplorer 'InternetExplorerMedium
Set ws = ThisWorkbook.Worksheets("Sheet1")
ie.Visible = True
ie.Navigate2 url
While ie.Busy Or ie.readyState < 4: DoEvents: Wend
Set html = ie.document
Set tables = html.querySelectorAll("#secondaryProductivityList table")
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
Set tablesDescriptions = html.querySelectorAll("#secondaryProductivityList h3")
ws.Cells.UnMerge
For i = 0 To tables.Length - 1 Step 2
clipboard.SetText tables.item(i).outerHTML
clipboard.PutInClipboard
ws.Cells(LastRow(ws) + 2, 1) = tablesDescriptions.item(IIf(i = 0, 0, i / 2)).innerText
ws.Cells(LastRow(ws) + 1, 1).PasteSpecial
Next
ie.Quit
End Sub
Public Function LastRow(ByVal sh As Worksheet) As Long
On Error Resume Next
LastRow = sh.Cells.Find(What:="*", _
After:=sh.Range("A1"), _
Lookat:=xlPart, _
LookIn:=xlFormulas, _
SearchOrder:=xlByRows, _
SearchDirection:=xlPrevious, _
MatchCase:=False).Row
On Error GoTo 0
End Function
输出:
使用调试器并逐步完成代码。当你获取页面的全部部分时,看看发生了什么。像往常一样,我不能查看imgur图像-只有黑屏。你有我们可以测试的实际URL吗?您可以通过插入HTML使用代码片段工具。我想发布url,但遗憾的是,在我们的工作网络之外,链接无法连接。这是导致这一问题更具挑战性的一个因素。您是否尝试了下面我的答案?对不起,不确定如何使用您的答案/代码并将其实现到我的文件中。我对excel还是相当陌生,这段代码是在其他人创建的另一个文件中发现的。我刚刚对url链接进行了一些调整,以使用参数。不过,我确实以不同的方式解决了间距问题。我将把它添加到我的问题编辑中