用于刮取数据的VBA脚本不';行不通
我编写了一个简短的VBA脚本,它生成URL并下载页面内容,然后放入新的工作表。但是,数据始终显示在两个页面上,产生以下类型的URL: 对于结果的第一页:用于刮取数据的VBA脚本不';行不通,vba,excel,web-scraping,web-crawler,Vba,Excel,Web Scraping,Web Crawler,我编写了一个简短的VBA脚本,它生成URL并下载页面内容,然后放入新的工作表。但是,数据始终显示在两个页面上,产生以下类型的URL: 对于结果的第一页: resultat_annuaire.php?loc=01&item=hopital&session=clear (with 01 being the region) 第二页: resultat_annuaire.php?loc=01&item=hopital&page=2 (session=clear
resultat_annuaire.php?loc=01&item=hopital&session=clear (with 01 being the region)
第二页:
resultat_annuaire.php?loc=01&item=hopital&page=2 (session=clear is gone, replaced by page=2)
当我的VBA脚本生成并刮取第一页的URL时,它工作正常(即,我将95个不同的页面下载到我的excel)
但是,当我运行相同的VBA脚本时(仅更改生成URL以获取第二页的方式),它下载的内容是第一个URL第2页内容的95倍
现在,我尝试通过执行以下操作,在我的web浏览器中简单地修改URL:
输入第二页URL:
resultat_annuaire.php?loc=01&item=hopital&page=2
然后将01更改为05,如下所示:
resultat_annuaire.php?loc=05&item=hopital&page=2
同样,没有发生任何事情,页面保持不变,即好像我没有将01切换到05
以下是VBA脚本:
Sub Data_scraping()
For x = 1 To 9
ActiveWorkbook.Worksheets.Add
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;" _
& "http://etablissements.hopital.fr/resultat_annuaire.php?loc=" _
& "0" _
& x _
& "&item=hopital&session=clear" _
, Destination:=Range("$A$1"))
'.CommandType = 0
.Name = "resultat_annuaire.php?loc=01&item=hopital&session=clear"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
ActiveWindow.SmallScroll Down:=18
Rows("1:31").Select
Selection.Delete Shift:=xlUp
Range("A5").Select
Next x
End Sub
有人能提供解释或帮助吗?您的宏在page=2上运行得非常好
Sub sof20287920Data_scrapping()
Dim x, strLoc, strUrl
Dim wkb
Set wkb = Workbooks.Add()
wkb.Activate
For x = 1 To 9
ActiveWorkbook.Worksheets.Add After:=ActiveWorkbook.Sheets(ActiveWorkbook.Sheets.Count)
strLoc = "resultat_annuaire.php?loc=" & "0" & x & "&item=hopital&session=clear&page=2"
strUrl = "http://etablissements.hopital.fr/" & strLoc
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;" & strUrl _
, Destination:=Range("$A$1"))
'.CommandType = 0
.Name = strLoc
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
ActiveWindow.SmallScroll Down:=18
Rows("1:31").Select
Selection.Delete Shift:=xlUp
Range("A5").Select
Next x
End Sub
即使对于page=2,也需要session=clear,如下所示:
http://etablissements.hopital.fr/resultat_annuaire.php?loc=01&item=hopital&session=clear&page=2
对于page=2,您的宏运行得非常好
Sub sof20287920Data_scrapping()
Dim x, strLoc, strUrl
Dim wkb
Set wkb = Workbooks.Add()
wkb.Activate
For x = 1 To 9
ActiveWorkbook.Worksheets.Add After:=ActiveWorkbook.Sheets(ActiveWorkbook.Sheets.Count)
strLoc = "resultat_annuaire.php?loc=" & "0" & x & "&item=hopital&session=clear&page=2"
strUrl = "http://etablissements.hopital.fr/" & strLoc
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;" & strUrl _
, Destination:=Range("$A$1"))
'.CommandType = 0
.Name = strLoc
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
ActiveWindow.SmallScroll Down:=18
Rows("1:31").Select
Selection.Delete Shift:=xlUp
Range("A5").Select
Next x
End Sub
即使对于page=2,也需要session=clear,如下所示:
http://etablissements.hopital.fr/resultat_annuaire.php?loc=01&item=hopital&session=clear&page=2