Python: download a file from a URL in an Azure Pipeline on a Microsoft-hosted agent
I am running a Python script task in an Azure YAML pipeline. When the URL is visited in a browser, a JSON file is downloaded.

URL -

What I have done so far --> once the browser opens the URL, the file should be downloaded automatically. However, when the pipeline runs, I cannot find the file anywhere on the agent machine, and I get no error either.

I am using a windows-2019 Microsoft-hosted agent.

How do I find the path of the downloaded file on the agent machine? Or is there another way to download a file from a URL without having to open a browser?

Please try the following Python script:
steps:
- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
      import urllib.request

      url = 'https://www.some_url.com/downloads'
      path = r"$(Build.ArtifactStagingDirectory)/filename.xx"
      urllib.request.urlretrieve(url, path)
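To confirm where the file actually lands on the agent, it can help to print the resolved path and check that the file exists and is non-empty before later steps depend on it. A minimal sketch (the staging directory and filename here are stand-ins; on a hosted agent `$(Build.ArtifactStagingDirectory)` resolves to a path like `C:\agent\_work\1\a`):

```python
import json
import os
import tempfile

# Stand-in for $(Build.ArtifactStagingDirectory) so the sketch runs anywhere.
staging_dir = tempfile.mkdtemp()
path = os.path.join(staging_dir, "filename.json")

# Simulate the result of urllib.request.urlretrieve(url, path)
# so the checks below are self-contained.
with open(path, "w") as f:
    json.dump({"ok": True}, f)

# After the real download, verify the file before using it.
print("saved to:", os.path.abspath(path))
print("exists:", os.path.exists(path))
print("size (bytes):", os.path.getsize(path))
```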
Or

The file will then be downloaded by the Python script to the target path.

Here is a document about

Update:
steps:
- script: |
    pip install bs4
    pip install lxml
  workingDirectory: '$(build.sourcesdirectory)'
  displayName: 'Command Line Script'
- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
      from bs4 import BeautifulSoup
      from urllib.request import Request, urlopen
      import urllib.request

      req = Request("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519", headers={'User-Agent': 'Mozilla/5.0'})
      html_page = urlopen(req).read()
      soup = BeautifulSoup(html_page, "lxml")
      a = ""
      for link in soup.find_all('a', id="c50ef285-c6ea-c240-3cc4-6c9d27067d6c"):
          a = link.get('href')
      print(a)
      path = r"$(Build.sourcesdirectory)\agent.json"
      urllib.request.urlretrieve(a, path)
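The scraping step above (find the `<a>` tag with the known id and read its `href`) can also be reproduced with only the standard library, which avoids the `pip install` step. A sketch using `html.parser` on a stub of the confirmation page (the HTML and URL below are made up for illustration; the element id is the one used in the script above):

```python
from html.parser import HTMLParser

# Stub standing in for the real confirmation page's HTML.
HTML_STUB = (
    '<html><body>'
    '<a id="c50ef285-c6ea-c240-3cc4-6c9d27067d6c" '
    'href="https://download.microsoft.com/download/example/ServiceTags_Public.json">'
    'click here to download manually</a>'
    '</body></html>'
)

class LinkExtractor(HTMLParser):
    """Collect the href of the <a> tag that has the target id."""
    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("id") == self.target_id:
            self.href = attrs.get("href")

parser = LinkExtractor("c50ef285-c6ea-c240-3cc4-6c9d27067d6c")
parser.feed(HTML_STUB)
print(parser.href)
```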
The page url: microsoft.com/en-us/download/confirmation.aspx?id=56519 has to be opened in a browser before the file downloads automatically, so when you use wget or urllib.request directly against it you get a 403 error.

Instead, you can take the direct download url from the site and fetch the json file with that.

For example: url: https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json
import urllib.request
url = 'https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json'
path = r"$(Build.ArtifactStagingDirectory)\agent.json"
urllib.request.urlretrieve(url, path)
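The 403 typically comes from the server rejecting urllib's default user agent, which is also why the script above passes `headers={'User-Agent': 'Mozilla/5.0'}`. As a sketch, the same header can be attached to a plain download (the `download` helper and the URL below are hypothetical; no network call is made here):

```python
import urllib.request

def download(url, path, user_agent="Mozilla/5.0"):
    """Download url to path with an explicit User-Agent,
    so the request is not rejected with HTTP 403."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp, open(path, "wb") as f:
        f.write(resp.read())

# Building the Request is enough to see the header is set;
# note urllib normalizes header names to capitalized form.
req = urllib.request.Request(
    "https://download.microsoft.com/download/example.json",
    headers={"User-Agent": "Mozilla/5.0"},
)
print(req.get_header("User-agent"))  # -> Mozilla/5.0
```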
Update 2:
You can download the file from the website with a Python script.

Example: the same BeautifulSoup script as in the update above.
Result: the script prints the resolved download URL and saves the file as agent.json in $(Build.sourcesdirectory).
Update 3:
Another way to get the download URL:
steps:
- script: 'pip install requests'
  displayName: 'Command Line Script'
- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
      import requests
      import re
      import urllib.request

      rq = requests.get("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519")
      t = re.search(r"https://download\.microsoft\.com/download/.*?\.json", rq.text)
      a = t.group()
      print(a)
      path = r"$(Build.sourcesdirectory)\agent.json"
      urllib.request.urlretrieve(a, path)
Comment: Could you briefly explain what you did in update 3?

Answer: In the third update, the page source (html) is fetched directly with requests, and a regular expression then pulls out the matching url. That may be the simpler approach.

Comment: And what about this line --> "a = t.group()". What does that do?

Answer: It returns the text of the matched object and passes the value to a.
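To make the `t.group()` step concrete: `re.search` returns a Match object (or `None` if nothing matched), and `.group()` with no arguments returns the full text that matched the pattern. A small self-contained example (the snippet of page text below is made up to resemble the confirmation page):

```python
import re

# Stub standing in for a fragment of the downloaded page source.
text = 'window.open("https://download.microsoft.com/download/x/ServiceTags_Public_20210329.json")'

# Non-greedy .*? stops at the first ".json", giving just the URL.
m = re.search(r"https://download\.microsoft\.com/download/.*?\.json", text)
print(m.group())  # the matched URL
```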