用于刮取数据的VBA脚本不';行不通

用于刮取数据的VBA脚本不';行不通,vba,excel,web-scraping,web-crawler,Vba,Excel,Web Scraping,Web Crawler,我编写了一个简短的VBA脚本,它生成URL并下载页面内容,然后放入新的工作表。但是,数据始终显示在两个页面上,产生以下类型的URL: 对于结果的第一页: resultat_annuaire.php?loc=01&item=hopital&session=clear (with 01 being the region) 第二页: resultat_annuaire.php?loc=01&item=hopital&page=2 (session=clear

我编写了一个简短的VBA脚本,它生成URL并下载页面内容,然后放入新的工作表。但是,数据始终显示在两个页面上,产生以下类型的URL:

对于结果的第一页:

resultat_annuaire.php?loc=01&item=hopital&session=clear   (with 01 being the region) 
第二页:

resultat_annuaire.php?loc=01&item=hopital&page=2   (session=clear is gone, replaced by page=2) 
当我的VBA脚本生成并刮取第一页的URL时,它工作正常(即,我将95个不同的页面下载到我的excel)

但是,当我运行相同的VBA脚本时(仅更改生成URL以获取第二页的方式),它下载的内容是第一个URL第2页内容的95倍

现在,我尝试通过执行以下操作,在我的web浏览器中简单地修改URL:

输入第二页URL:

resultat_annuaire.php?loc=01&item=hopital&page=2
然后将01更改为05,如下所示:

resultat_annuaire.php?loc=05&item=hopital&page=2
同样,没有发生任何事情,页面保持不变,即好像我没有将01切换到05

以下是VBA脚本:

Sub Data_scraping()
    For x = 1 To 9
        ActiveWorkbook.Worksheets.Add
        With ActiveSheet.QueryTables.Add(Connection:= _
            "URL;" _
            & "http://etablissements.hopital.fr/resultat_annuaire.php?loc=" _
            & "0" _
            & x _
            & "&item=hopital&session=clear" _
            , Destination:=Range("$A$1"))


        '.CommandType = 0


        .Name = "resultat_annuaire.php?loc=01&item=hopital&session=clear"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    ActiveWindow.SmallScroll Down:=18
    Rows("1:31").Select
    Selection.Delete Shift:=xlUp
    Range("A5").Select
Next x
End Sub

有人能提供解释或帮助吗?

您的宏在page=2上运行得非常好

Sub sof20287920Data_scrapping()
  Dim x, strLoc, strUrl
  Dim wkb

  Set wkb = Workbooks.Add()
  wkb.Activate

  For x = 1 To 9
    ActiveWorkbook.Worksheets.Add After:=ActiveWorkbook.Sheets(ActiveWorkbook.Sheets.Count)
    strLoc = "resultat_annuaire.php?loc=" & "0" & x & "&item=hopital&session=clear&page=2"
    strUrl = "http://etablissements.hopital.fr/" & strLoc
    With ActiveSheet.QueryTables.Add(Connection:= _
      "URL;" & strUrl _
      , Destination:=Range("$A$1"))


      '.CommandType = 0


      .Name = strLoc
      .FieldNames = True
      .RowNumbers = False
      .FillAdjacentFormulas = False
      .PreserveFormatting = True
      .RefreshOnFileOpen = False
      .BackgroundQuery = True
      .RefreshStyle = xlInsertDeleteCells
      .SavePassword = False
      .SaveData = True
      .AdjustColumnWidth = True
      .RefreshPeriod = 0
      .WebSelectionType = xlEntirePage
      .WebFormatting = xlWebFormattingNone
      .WebPreFormattedTextToColumns = True
      .WebConsecutiveDelimitersAsOne = True
      .WebSingleBlockTextImport = False
      .WebDisableDateRecognition = False
      .WebDisableRedirections = False
      .Refresh BackgroundQuery:=False
    End With

    ActiveWindow.SmallScroll Down:=18
    Rows("1:31").Select
    Selection.Delete Shift:=xlUp
    Range("A5").Select
  Next x

End Sub
即使对于page=2,也需要session=clear,如下所示:

http://etablissements.hopital.fr/resultat_annuaire.php?loc=01&item=hopital&session=clear&page=2

对于page=2,您的宏运行得非常好

Sub sof20287920Data_scrapping()
  Dim x, strLoc, strUrl
  Dim wkb

  Set wkb = Workbooks.Add()
  wkb.Activate

  For x = 1 To 9
    ActiveWorkbook.Worksheets.Add After:=ActiveWorkbook.Sheets(ActiveWorkbook.Sheets.Count)
    strLoc = "resultat_annuaire.php?loc=" & "0" & x & "&item=hopital&session=clear&page=2"
    strUrl = "http://etablissements.hopital.fr/" & strLoc
    With ActiveSheet.QueryTables.Add(Connection:= _
      "URL;" & strUrl _
      , Destination:=Range("$A$1"))


      '.CommandType = 0


      .Name = strLoc
      .FieldNames = True
      .RowNumbers = False
      .FillAdjacentFormulas = False
      .PreserveFormatting = True
      .RefreshOnFileOpen = False
      .BackgroundQuery = True
      .RefreshStyle = xlInsertDeleteCells
      .SavePassword = False
      .SaveData = True
      .AdjustColumnWidth = True
      .RefreshPeriod = 0
      .WebSelectionType = xlEntirePage
      .WebFormatting = xlWebFormattingNone
      .WebPreFormattedTextToColumns = True
      .WebConsecutiveDelimitersAsOne = True
      .WebSingleBlockTextImport = False
      .WebDisableDateRecognition = False
      .WebDisableRedirections = False
      .Refresh BackgroundQuery:=False
    End With

    ActiveWindow.SmallScroll Down:=18
    Rows("1:31").Select
    Selection.Delete Shift:=xlUp
    Range("A5").Select
  Next x

End Sub
即使对于page=2,也需要session=clear,如下所示:

http://etablissements.hopital.fr/resultat_annuaire.php?loc=01&item=hopital&session=clear&page=2