
Python: How to download a file from a website

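This is my first question, so please be kind if I have done anything wrong.

I am using the requests module in Python 3.3 to automate downloading files from a few sites, but fetching a CSV file from one of them is giving me particular trouble. I have a working level of Python ability, but when it comes to interacting with websites I am not familiar with HTML or JavaScript.

Here is the relevant code: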

import requests
import datetime

now = datetime.datetime.now().strftime("%Y%m%d")

folder = 'some path'

url = 'https://gats.pjm-eis.com/gats2/PublicReports/RenewableGeneratorsRegisteredInGATS/'#ExportTo'
payload = {'exportType' : 'CSV',
           'tabNumber' : ''}
doc = requests.post(url, data=payload, stream=True)

output = open(folder+now+'_GATSRegistered.csv','wb')
output.write(doc.content)
output.close()
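I get no errors, but the file I end up writing contains an error page instead of the CSV data. I have done this successfully for a site where the URL points directly at the file (http://www.place.com/path/file.xlsx), so I know what to do with the file once it has been retrieved. That case only needed a GET request, though.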

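So, my questions are:

  • What is the correct request to send?
  • Is POST the right approach?
  • Is this a special case, or a general situation I should know how to handle?
  • Is there anything else I should change?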

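I looked at the page in Chrome with the developer console open on the Network tab. There you can see that clicking the "CSV" button sends a POST request with a large amount of form data: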

exportType:CSV
tabNumber:
CSV_CH:1
PRN_CH:0
GridView$DXFREditorcol0:
GridView$DXFREditorcol1:
GridView$DXFREditorcol2:
GridView$DXFREditorcol3:
GridView$DXFREditorcol4:
GridView$DXFREditorcol5:
GridView$DXFREditorcol6:
GridView$DXFREditorcol7:
GridView$DXFREditorcol8:
GridView$DXFREditorcol9:
GridView$DXFREditorcol10:
GridView$DXFREditorcol11:
GridView$DXFREditorcol12:
GridView$DXFREditorcol13:
GridView$DXFREditorcol14:
GridView$DXFREditorcol15:
GridView$DXFREditorcol16:
GridView$DXFREditorcol17:
GridView$DXFREditorcol18:
GridView$DXFREditorcol19:
GridView$DXFREditorcol20:
GridView$DXFREditorcol21:
GridView$DXFREditorcol22:
GridView$DXFREditorcol23:
GridView$DXFREditorcol24:
GridView$DXFREditorcol25:
GridView$DXFREditorcol26:
GridView_custwindowWS:0:0:-1:-10000:-10000:0:1px:-10000:1:0:0:0
GridView_DXHFPWS:0:0:-1:-10000:-10000:0:180px:100px:1:0:0:0
GridView_DXPagerBottom_PSPSI:2
GridView$DXSelInput:
GridView$DXKVInput:[]
GridView$CallbackState:BwMHAQIFU3RhdGUGEAEHGwcAAgEHAQIBBwICAQcDAgEHBAIBBwUCAQcGAgEHBwIBBwgCAQcJAgEHCgIBBwsCAQcMAgEHDQIBBw4CAQcPAgEHEAIBBxECAQcSAgEHEwIBBxQCAQcVAgEHFgIBBxcCAQcYAgEHGQIBBxoCAQcABxsHAAcABwEHAAcCBwAHAwcABwQHAAcFBwAHBgcABwcHAAcIBwAHCQcABwoHAAcLBwAHDAcABw0HAAcOBwAHDwcABxAHAAcRBwAHEgcABxMHAAcUBwAHFQcABxYHAAcXBwAHGAcABxkHAAcaBwAHAAcAAgAFAAAAgAkCCUVudGl0eUtleQkCAAIAAwcEAgAHAAIBBTaVAAAHAAIBBwAHAAIQRmlsdGVyRXhwcmVzc2lvbgcCAAIIUGFnZVNpemUDBzI=
GridView$DXSyncInput:
GridView_DXFilterRowMenuCI:
DXScript:1_142,1_80,1_135,1_91,14_0,1_90,1_113,14_23,14_10,1_98,1_105,1_77,1_128,1_126,1_124,1_133,1_119,1_127,1_104,1_101,1_84,1_109,1_92,14_1,1_94,1_97,1_95,1_96,1_106,14_4,1_100,1_117,1_103,14_12,14_13,1_102,1_129,1_107,1_137,1_114,14_16,10_2,10_1,10_3,10_4,14_3
DXMVCEditorsValues:{"GridView_DXFREditorcol0":null,"GridView_DXFREditorcol1":null,"GridView_DXFREditorcol2":null,"GridView_DXFREditorcol3":null,"GridView_DXFREditorcol4":null,"GridView_DXFREditorcol5":null,"GridView_DXFREditorcol6":null,"GridView_DXFREditorcol7":null,"GridView_DXFREditorcol8":null,"GridView_DXFREditorcol9":null,"GridView_DXFREditorcol10":null,"GridView_DXFREditorcol11":null,"GridView_DXFREditorcol12":null,"GridView_DXFREditorcol13":null,"GridView_DXFREditorcol14":null,"GridView_DXFREditorcol15":null,"GridView_DXFREditorcol16":null,"GridView_DXFREditorcol17":null,"GridView_DXFREditorcol18":null,"GridView_DXFREditorcol19":null,"GridView_DXFREditorcol20":null,"GridView_DXFREditorcol21":null,"GridView_DXFREditorcol22":null,"GridView_DXFREditorcol23":null,"GridView_DXFREditorcol24":null,"GridView_DXFREditorcol25":null,"GridView_DXFREditorcol26":null}
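You can work out which of the fields above actually have to be sent to the server. I suspect all of them are required (but I've been wrong plenty of times :)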
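One way to carry the form data shown above into Python is sketched here. Every form field is sent as text, so a plain dict of strings is enough for `requests.post(url, data=payload)`. The helper name `parse_form_data` is made up for illustration, and the sample input is only a shortened excerpt of the fields listed above.

```python
def parse_form_data(raw: str) -> dict:
    """Turn 'name:value' lines copied from the DevTools form-data view into a dict of strings."""
    payload = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only: values such as
        # GridView_custwindowWS contain further colons.
        name, _, value = line.partition(":")
        payload[name] = value.strip()
    return payload


# Shortened excerpt of the captured form data.
raw = """\
exportType:CSV
tabNumber:
CSV_CH:1
PRN_CH:0
GridView$DXKVInput:[]
GridView_DXPagerBottom_PSPSI:2
"""

payload = parse_form_data(raw)
print(payload)
```

Empty fields such as `tabNumber` become empty strings, which is how a browser submits them as well.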

That said, when you use stream=True you should use iter_content. So your code would look like this:

payload = {
    # Form contents
}
r = requests.post(url, data=payload, stream=True)
with open(filename, 'wb') as output:
    # iter_content defaults to 1-byte chunks; ask for larger ones
    for chunk in r.iter_content(chunk_size=8192):
        output.write(chunk)

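The for loop writes each chunk of data to the file as it becomes available. When the stream ends, you don't have to worry about it hanging on you.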
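Since the original problem was an error page being saved as if it were the CSV, it may help to check the response before writing it to disk. The following is a rough heuristic, not something the site documents. The function name `looks_like_csv` and the sample bytes are hypothetical.

```python
def looks_like_csv(status_code: int, content_type: str, head: bytes) -> bool:
    """Heuristic: True if the response plausibly carries CSV, not an HTML error page."""
    if status_code != 200:
        return False
    if "html" in content_type.lower():
        return False
    # Some servers send error pages with a generic Content-Type,
    # so also look at the first bytes of the body.
    start = head.lstrip().lower()
    return not start.startswith((b"<!doctype", b"<html"))


# With requests this might be called as (r is the Response object):
#   looks_like_csv(r.status_code, r.headers.get("Content-Type", ""), first_chunk)
print(looks_like_csv(200, "text/csv", b"Name,State\n"))            # made-up sample data
print(looks_like_csv(200, "text/html; charset=utf-8", b"<html>"))
```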

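POST is for sending data to the server; GET is for retrieving data (it can also send data, but that is not secure).

OK, one thing is to look at Python's with keyword for opening files. It closes them automatically.

@user2387370 When I use GET, I only get the original page back, as if there were no payload. My thinking is that I need to send some data to let the server know I "clicked" the CSV button, but are you saying this can be done with GET?

@user41790 GET can be used to send data, although I don't know the details of requests. GET sends data by appending it to the URL; if you see a URL ending in &foo=7&bar=8 and so on, that is a GET request.

This is helpful. I also don't think all of the form data is strictly necessary, but as a proof of concept I am trying all of it first. How do I translate the Chrome console view into Python/requests? I don't really know what should be an array, list, dict, string, and so on.

It looks like the site is using a unique identifier to prevent pass-through fraud, which has the side effect of preventing POST interaction at this level (I think). Marking this answer as correct, though.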