PYTHON: submitting a query to an ASPX page and scraping the results from the aspx page
python, asp.net, http
I want to scrape person information from www.ratsit.se, and I am writing the following code:
import urllib
from bs4 import BeautifulSoup

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Origin': 'http://www.ratsit.se',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'http://www.ratsit.se/',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}

class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()
url = 'http://www.ratsit.se/BC/SearchPerson.aspx'

# first HTTP request without form data
f = myopener.open(url)
soup = BeautifulSoup(f)

# parse and retrieve two vital form values
viewstate = soup.select("#__VIEWSTATE")[0]['value']
#eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']

formData = (
    ('__LASTFOCUS', ''),
    ('__EVENTTARGET', ''),
    ('__EVENTARGUMENT', ''),
    #('__EVENTVALIDATION', eventvalidation),
    ('__VIEWSTATE', viewstate),
    ('ctl00$cphMain$txtFirstName', 'name'),
    ('ctl00$cphMain$txtLastName', ''),
    ('ctl00$cphMain$txtBirthDate', ''),  # etc. (not all listed)
    ('ctl00$cphMain$txtAddress', ''),
    ('ctl00$cphMain$txtZipCode', ''),
    ('ctl00$cphMain$txtCity', ''),
    ('ctl00$cphMain$txtKommun', ''),
    #('btnSearchAjax','Sök'),
)

encodedFields = urllib.urlencode(formData)

# second HTTP request with form data
f = myopener.open(url, encodedFields)

try:
    # actually we'd better use BeautifulSoup once again to
    # retrieve the results (instead of writing out the whole HTML file);
    # besides, since the result is split into multiple pages,
    # we need to send more HTTP requests
    fout = open('tmp.html', 'w')
except:
    print('Could not open output file\n')
fout.writelines(f.readlines())
fout.close()
The response I get from the server says my IP is blocked, but that cannot be right, because it works when I use a browser... Can anyone tell me where I went wrong? Thanks.
Your code doesn't work:
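One thing worth checking in the code above: the `__EVENTVALIDATION` field is commented out, and ASP.NET pages that render that hidden input usually reject postbacks that omit it. A minimal sketch of collecting both hidden fields from the first response, using only the standard-library `html.parser` so it runs stand-alone (the HTML snippet below is a stand-in for the real page, not Ratsit's actual markup):

```python
from html.parser import HTMLParser

class HiddenFieldParser(HTMLParser):
    """Collect id -> value for hidden <input> elements."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            a = dict(attrs)
            if a.get('type') == 'hidden' and 'id' in a:
                self.fields[a['id']] = a.get('value', '')

# Stand-in for the HTML of the first GET response.
html = '''
<form>
  <input type="hidden" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" id="__EVENTVALIDATION" value="def456" />
</form>
'''

parser = HiddenFieldParser()
parser.feed(html)
print(parser.fields['__VIEWSTATE'])        # -> abc123
print(parser.fields['__EVENTVALIDATION'])  # -> def456
```

Both values would then be put into the form tuple before the POST, exactly like the existing `__VIEWSTATE` entry.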
File "/Users/florianoswald/git/webscraper/scrape2.py", line 16
version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'
^
IndentationError: expected an indented block
Is this supposed to be a class definition? Why do we need the MyOpener class at all? This works just as well:
myopener = urllib.FancyURLopener()
myopener.open("http://www.google.com")
<addinfourl at 4411860752 whose fp = <socket._fileobject object at 0x106ed1c50>>
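To answer the comment's question: the subclass exists only to override `FancyURLopener`'s `version` class attribute, which is the User-Agent string it sends; a plain `FancyURLopener()` works too, it just identifies itself as Python's default. Note also that the `headers` dict in the question is built but never passed anywhere, so `Origin`, `Referer`, etc. are never actually sent. A sketch of attaching them explicitly with Python 3's `urllib.request.Request` (no subclass needed; the URL and field names are taken from the question, and `'dummy'` stands in for the parsed viewstate — whether the site accepts the request is untested):

```python
import urllib.parse
import urllib.request

url = 'http://www.ratsit.se/BC/SearchPerson.aspx'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 '
                  '(KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17',
    'Referer': 'http://www.ratsit.se/',
    'Content-Type': 'application/x-www-form-urlencoded',
}
form_data = (
    ('__VIEWSTATE', 'dummy'),  # in practice, parsed from the first GET
    ('ctl00$cphMain$txtFirstName', 'name'),
)

# Encode the form body; urlencode percent-escapes the '$' in the field names.
body = urllib.parse.urlencode(form_data).encode('ascii')
req = urllib.request.Request(url, data=body, headers=headers)

# Request infers the method from the presence of a body:
print(req.get_method())  # -> POST
# urllib.request.urlopen(req) would perform the actual request; not executed here.
```

This sends every header in the dict on the one request, instead of relying on a class-wide `version` override.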
Does the response message literally say "my IP is blocked"? Why can't people post the actual error message? The number of searches on Ratsit is now limited per hour, day, week and month. Searches from that IP address have exceeded those limits, and to keep searching you need a user agreement with Ratsit.
But that cannot be right, because when I use a browser the results display without any problem.
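If that comment is right and the block is a rate limit rather than a hard ban, the practical fix is to pace the requests (the browser presumably made far fewer of them). A generic sketch of a throttled fetch loop; the delay value is arbitrary, since Ratsit's actual limits are not stated in this thread:

```python
import time

def fetch_throttled(fetch, urls, delay=1.0):
    """Call fetch(url) for each url, sleeping `delay` seconds between calls."""
    results = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay)  # pause before every request after the first
        results.append(fetch(url))
    return results

# Usage with a stand-in fetch function (a real one would call urlopen):
seen = fetch_throttled(lambda u: u.upper(), ['a', 'b'], delay=0.01)
print(seen)  # -> ['A', 'B']
```

The same wrapper would also cover the multi-page result loop the question's comments mention.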