Python - Download files from an ASP form

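I am completely new to Python scripting. I have been trying to come up with some code that could help me, but I have had no success. The problem is that I have to download files from this link. When I select a date and time and then click the download button, the URL changes and the required file is displayed. I want to write a Python script that iterates over the dates and times and downloads the data from the resulting web page into a text file. I would be very grateful if someone could help me.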

Update: This is what I have done so far, but I cannot get any further. Any help? Currently it gives me this error: "IOError: [Errno 22] invalid mode ('w') or filename: '*5/9/2016*0000.txt'"
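
The file name in that error contains `/` and `*`, which are likely what the operating system rejects: `/` is a path separator, and `*` is not allowed in Windows file names. The asterisks appear to come from converting the selected form items to strings. As a minimal sketch (written for Python 3, and not part of the original script), the hypothetical helper `safe_filename` below builds a name from the option labels using digits only:

```python
import re
from datetime import datetime


def safe_filename(date_label, time_label, suffix=".txt"):
    """Build a file name that contains no '/' or '*' characters.

    date_label is an option label such as '*5/9/2016' and time_label
    one such as '*0000'. Both names are illustrative assumptions.
    """
    # Keep only digits and slashes so the date can be parsed.
    cleaned_date = re.sub(r"[^0-9/]", "", date_label)
    parsed = datetime.strptime(cleaned_date, "%m/%d/%Y")
    # Keep only the digits of the time label.
    cleaned_time = re.sub(r"[^0-9]", "", time_label)
    # Zero-padded year, month, and day sort correctly and are unambiguous.
    return parsed.strftime("%Y%m%d") + "_" + cleaned_time + suffix


print(safe_filename("*5/9/2016", "*0000"))
```

Parsing the date before formatting it also catches labels that are not dates at all, because `strptime` raises `ValueError` for them.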

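Update 2: I have made some improvements to the code, but it still does not work properly. It downloads the same file again and again. Can anyone help?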

import datetime
#adil ="5-9-2016"
#dt = datetime.datetime.strptime(adil, '%m-%d-%Y')
#print '{0}{1}{2:02}'.format(dt.year, dt.month, dt.day % 100)

from mechanize import Browser
br = Browser()
br.set_handle_robots(False)
br.open("http://www.pmd.gov.pk/cp/display.asp")
br.select_form(nr=0)
form = br.form #
controlDate = br.form.find_control("dat")
controlTime = br.form.find_control("Tim")

for Date in controlDate.items:
    dtt = str(Date)
    try:
        dt = datetime.datetime.strptime(dtt, '%m/%d/%Y')
    except:
        pass
    try:
        dt = datetime.datetime.strptime(dtt, '*%m/%d/%Y')
    except:
        pass
    #if Date.name == str(dt.strftime('%m'))+'/'+str(dt.strftime('%d'))+'/'+str(dt.strftime('%Y')):
    if Date.name == str(Date):
        Date.selected = True
        for Time in controlTime.items:
            tt = str(Time)
            try:
                ttt = datetime.datetime.strptime(tt, '%H%M')
            except:
                pass
            try:
                ttt = datetime.datetime.strptime(tt, '*%H%M')
            except:
                pass
            #if Time.name == str(Time):
            #Time.selected = Time.name
            #form['Tim'] = str(Time)
            if Time.name == str(ttt.strftime('%H'))+str(ttt.strftime('%M')):
                Time.selected = True
                synoptic = (br.submit()).read()
                #timeName= "%s%s" %(ttt.hour, ttt.minute)
                textFile = open(str(dt.strftime('%Y'))+str(dt.strftime('%m'))+str(dt.strftime('%d'))+str(ttt.strftime('%H'))+str(ttt.strftime('%M'))+"syn.txt", 'w')
                textFile.write(synoptic)
                textFile.close()
            else:
                break
print "***FINISHED***"
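Update 2 [Solved]: Here is the solution to my own problem. I wanted to download the required files after selecting the date and time from the form. Here is my code: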

# -*- coding: utf-8 -*-
"""
Created on Tue May 10 14:15:41 2016

@author: MuhammadAdilJaved
"""
import re
import time
from mechanize import Browser
br = Browser()
br.set_handle_robots(False)

br.open("http://www.pmd.gov.pk/cp/display.asp")
br.select_form(nr=0)
form = br.form #
controlDate = br.form.find_control("dat")
controlTime = br.form.find_control("Tim")
backdays = 1            #for how many days in back you want to download data, max. 15
start_time = time.time()
dateList = []
dateListPMD = []

timeList = ['0000', '0300', '0600', '0900', '1200', '1500', '1800', '2100']

#Visit "http://www.pmd.gov.pk/cp/display.asp" and put the available dates in List below
#dateList = ['5/10/2016','5/9/2016',\
#           '5/8/2016','5/7/2016',\
#           '5/6/2016','5/5/2016',\
#           '5/4/2016','5/3/2016',\
#           '5/2/2016','5/1/2016',\
#           '4/30/2016','4/29/2016',\
#           '4/28/2016','4/27/2016',\
#           '4/26/2016']

for item in controlDate.items:
        dateListPMD.append(item.name)

for date in dateListPMD[0:int(backdays)]:
    dateList.append(date)

print "**** Downloading data for following dates: ****"
print dateList
#short dateList for debugging
#dateList = ['5/10/2016','5/9/2016']
#dateList = ['5/10/2016']

i = 0   #for DATE Loop
ii = 0  #for TIME Loop
count = 0 #to calculate no of files Downloaded
for dt in dateList:
    br.open("http://www.pmd.gov.pk/cp/display.asp")
    br.select_form(nr=0)
    form = br.form #
    controlDate = br.form.find_control("dat")
    controlTime = br.form.find_control("Tim")
    for item in controlDate.items:
        if item.name == dt:
            item.selected = True
            dtFileName = dt.translate(None, '!@#$/')
            print "Required DATE found i.e "
            print item.name, dt
            print "DATE loop # ", i
            i = i+1
            for dti in timeList:
                br.open("http://www.pmd.gov.pk/cp/display.asp")
                br.select_form(nr=0)
                form = br.form #
                controlDate = br.form.find_control("dat")
                controlTime = br.form.find_control("Tim")

                for item in controlDate.items:  #
                    if item.name == dt:         #
                        for item2 in controlTime.items:
                            if item2.name == dti:
                                print "Require TIME found & Downloading File i.e "
                                print item2.name, dti
                                print "TIME loop # ", ii
                                ii = ii + 1
                                item.selected = True                        
                                item2.selected = True
                                synoptic = (br.submit()).read()
                                soup = re.sub('<[^>]*>', '', synoptic)            
                                textFile = open(str(dtFileName)+str(dti)+'.txt', 'wb')
                                textFile.write(soup)
                                textFile.close()
                                count = count + 1
                            else:
                                #print "Required TIME NOT found"
                                print item2.name, dti
        else:
            #print "Required DATE NOT found"
            print item.name, dt


elapsed_time = (time.time() - start_time)/60
print "*****************************"
print "Total Elapsed Time: ", round(elapsed_time,2), " Mins."
print "Total Files Downloaded: ", str(count)
print "*****  F I N I S H E D  *****"
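
The script above is Python 2: it uses `print` statements and `dt.translate(None, '!@#$/')`, a form of `translate` that Python 3 no longer accepts. As a hedged sketch of how the non-network parts could look in Python 3, the helpers below reproduce the tag stripping, the file naming, and the date-by-time iteration. The function names `strip_tags`, `output_name`, and `planned_downloads` are illustrative and do not appear in the original.

```python
import re
from itertools import product

# The eight time slots hard-coded in the script above.
TIME_SLOTS = ['0000', '0300', '0600', '0900', '1200', '1500', '1800', '2100']


def strip_tags(html):
    """Remove anything that looks like an HTML tag (same regex as the script)."""
    return re.sub(r'<[^>]*>', '', html)


def output_name(date_label, time_label):
    """Python 3 equivalent of dt.translate(None, '!@#$/') plus the time and '.txt'."""
    cleaned = date_label.translate(str.maketrans('', '', '!@#$/'))
    return cleaned + time_label + '.txt'


def planned_downloads(date_labels, time_labels=TIME_SLOTS):
    """List every (date, time, file name) combination the loops would visit."""
    return [(d, t, output_name(d, t)) for d, t in product(date_labels, time_labels)]


for date_label, time_label, name in planned_downloads(['5/10/2016']):
    print(date_label, time_label, name)
```

One design caveat: names built by simply deleting the slashes are ambiguous, since `1/11/2016` and `11/1/2016` both become `1112016`. Zero-padding the month and day would avoid that.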

So your first programming problem is how to download a web page given a known URL. Here is where you could start. This is not the only way; I use Python requests, so there are many options. Find an example to start coding from, and when you run into a problem, bring some code back here and ask a specific coding question.

Thanks for your prompt reply. Could you look at the code above and see why it does not work the way I want? Thanks :)
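
To illustrate the starting point suggested in the first comment, here is a minimal sketch of downloading a page from a known URL and saving it to a file. It is written for Python 3 and uses the standard library's `urllib.request` in place of the requests package the commenter mentions, so that it needs no extra installation. The function name `download_page` and its `opener` parameter are assumptions made for this example.

```python
from urllib.request import urlopen


def download_page(url, dest_path, opener=urlopen, timeout=30):
    """Fetch url and write the raw response body to dest_path.

    opener defaults to urllib.request.urlopen; it is a parameter only so
    that the function can be exercised without network access.
    Returns the number of bytes written.
    """
    with opener(url, timeout=timeout) as response:
        body = response.read()
    # Write bytes as received, without guessing the page encoding.
    with open(dest_path, 'wb') as handle:
        handle.write(body)
    return len(body)


# Example call (needs network access, so it is left commented out):
# download_page("http://www.pmd.gov.pk/cp/display.asp", "display.html")
```

This only covers a plain GET of a fixed URL. Submitting the date and time form, as the mechanize script does, requires sending the form fields as well.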