PowerShell: extracting an HTTP link from an HTML element

Symantec recently changed their download page and moved it to Broadcom. Since then, Invoke-WebRequest can no longer retrieve the HTTP URL of the v5i64.exe file.

However, when I inspect the elements inside the page body with the browser's developer tools, the HTTP URL is there.

Does anyone know how to use PowerShell to extract this URL, which changes every day?

$webreq = Invoke-WebRequest "https://www.broadcom.com/support/security-center/definitions/download/detail?gid=sep"
$webreq.Links | Select href

Use IE through a COM object:

$ie = new-object -ComObject "InternetExplorer.Application"
$ie.visible=$True
while($ie.Busy) { Start-Sleep -Milliseconds 100 }
$IE.navigate2("https://www.broadcom.com/support/security-center/definitions/download/detail?gid=sep")
while ($IE.busy) {
     start-sleep -milliseconds 1000 #wait 1 second interval to load page 
      } 

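Then find the elements with $ie.Document.IHTMLDocument3_getElementsByTagName("element name").

The following PowerShell script prompts you to download the link that contains the text v5i64.exe and starts with HTTPS. It works on Windows PowerShell 5.1. It does not work on PowerShell 6 or 7 (PowerShell Core).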

Tested on Windows 10.0.18363.657, Internet Explorer 11.657.18362, PowerShell 5.1.18362.628.

$url = "https://www.broadcom.com/support/security-center/definitions/download/detail?gid=sep"

$outfile = "./v5i64.exe"

$ie = New-Object -ComObject "InternetExplorer.Application"

$ie.visible=$True

while($ie.Busy) {
    Start-Sleep -Milliseconds 100
}

$ie.navigate2($url)

while($ie.ReadyState -ne 4 -or $ie.Busy) {
    Start-Sleep -milliseconds 500
} 

$ie.Document.getElementsByTagName("a") | % {
    if ($_.ie8_href -like "*v5i64.exe") {
        if ($_.ie8_href -like "https://*") {
            $len = (Invoke-WebRequest $_.ie8_href -Method Head).Headers.'Content-Length'
            Write-Host "File:" $_.ie8_href
            Write-Host "Size:" $len
            $confirm = Read-Host "Download file? [y/n]"
            if ($confirm -eq "y") {
                Write-Host "Downloading" $_.ie8_href

                Invoke-WebRequest -Uri $_.ie8_href -OutFile $outfile
            }
        }
    }
}

$ie.Stop()
$ie.Quit()
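
The link filter used in the script above (HTTPS only, ending in v5i64.exe) can be pulled out into a small function so it can be checked without starting Internet Explorer. This is only a minimal sketch: the function name `Select-DefinitionLink` and the sample URLs are hypothetical and do not come from the Broadcom page.

```powershell
# Hypothetical helper: keeps only HTTPS links that end in the given file suffix.
function Select-DefinitionLink {
    param(
        [string[]]$Href,
        [string]$Suffix = 'v5i64.exe'
    )
    # -like is case-insensitive by default in PowerShell
    $Href | Where-Object { $_ -like "https://*" -and $_ -like "*$Suffix" }
}

# Sample hrefs are made up for illustration; they are not real download URLs.
$sample = @(
    'https://example.invalid/defs/20200301-001-v5i64.exe',
    'http://example.invalid/defs/20200301-001-v5i64.exe',
    'https://example.invalid/defs/20200301-001-v5i32.exe'
)
Select-DefinitionLink -Href $sample
```

With the IE approach, the same function could presumably be fed the href values collected from `$ie.Document.getElementsByTagName("a")`.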

Thanks for the suggested solutions. However, this is the final code I ended up using:

# $SEP_last (the definition file name) is assumed to be set earlier in the script; that part is not shown here.
$SEP_last_link = ("http://definitions.symantec.com/defs/" + ($SEP_last | Select-String release -NotMatch | select -Last 1))
$Symantec_folder = "C:\Download for DVD\Symantec"
$Symantec_filepath = "$Symantec_folder\$SEP_last"

# Download only if the file is not already present
if (!(Test-Path "$Symantec_filepath" -PathType Leaf)) {
    Write-Host "`rStart to download Symantec $SEP_last file: $(Get-Date)`r"
    $start_time = (Get-Date)
    $webclient = New-Object System.Net.WebClient
    $webclient.DownloadFile($SEP_last_link, $Symantec_filepath)
    Write-Host "`r$SEP_last file has been downloaded successfully`r" -ForegroundColor Green
    $end_time = $(Get-Date) - $start_time
    $total_time = "{0:HH:mm:ss}" -f ([datetime]$end_time.Ticks)
    Write-Host "`rTime to download Symantec $SEP_last file: $total_time`r"
} else {
    Write-Host "`rSymantec $SEP_last file already exists!`r" -ForegroundColor Yellow
}

# Remove older definition files, keeping only the current one
Get-ChildItem -Path "$Symantec_folder\*-v5i64.exe" -Exclude "$SEP_last" -Verbose -Force | Remove-Item

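The reason is that the link is not part of the page you are downloading. Symantec builds the page from further downloads that are processed after the initial page load.

Thanks, John, for the quick feedback. In that case, is it possible to download/dump the page into a temp variable, simulating a human loading the page, and then extract the link?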
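
One way to check whether the link is in the raw HTML at all is to hold the page content in a variable and search it with a regular expression. The sketch below is only an illustration: `Find-ExeLink` is a hypothetical helper and both HTML strings are made-up samples. Invoke-WebRequest does not run the page's scripts, so if the link is only added after the initial load, a search over the raw content will most likely come back empty, which would match the behaviour described in the question.

```powershell
# Hypothetical helper: returns every http(s) URL in the HTML text that ends in the given suffix.
function Find-ExeLink {
    param(
        [string]$Html,
        [string]$Suffix = 'v5i64.exe'
    )
    # URL characters up to a quote, whitespace or angle bracket, followed by the literal suffix
    $pattern = 'https?://[^"''\s<>]+' + [regex]::Escape($Suffix)
    [regex]::Matches($Html, $pattern) | ForEach-Object { $_.Value }
}

# Made-up samples: a page whose links are injected later by script, and a fully rendered one.
$static   = '<html><body><div id="app"></div><script src="/app.js"></script></body></html>'
$rendered = '<a href="https://example.invalid/defs/20200301-001-v5i64.exe">download</a>'

Find-ExeLink -Html $static      # no output: the link is not in the static HTML
Find-ExeLink -Html $rendered    # outputs the sample URL

# Against the real page (needs network access), the content could be loaded like this:
# $html = (Invoke-WebRequest -Uri $url -UseBasicParsing).Content
# Find-ExeLink -Html $html
```

If the search over the real page content returns nothing, the link is being generated client-side, and a browser-driven approach such as the IE COM object shown earlier is needed to see it.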