JavaScript does not work in Python
I am trying to scrape this website. The goal is to iterate over the two drop-down menus, but first I tried to submit just one combination from Python code, and that does not work. I am using requests and BeautifulSoup4.
from bs4 import BeautifulSoup
import requests

url = 'http://courier.correos.cl/Tarificador/aspx/Cep.aspx?s=1&lsrv=20&tipo=1'

with requests.Session() as session:
    session.headers = {
        'User-Agent': 'Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30',
        'X-Requested-With': 'XMLHttpRequest'
    }
    response = session.get(url)
    soup = BeautifulSoup(response.content)

    # build an options mapping
    OriginenOptions = {option.get_text(strip=True): option['value'] for option in soup.select("select#ctl00_ContentPlaceHolder1_ddlOrigenComuna option")[1:]}
    DestinoOptions = {option.get_text(strip=True): option['value'] for option in soup.select("select#ctl00_ContentPlaceHolder1_ddlDestino option")[1:]}

    form = soup.find("form", id="aspnetForm")

    Origen = 'ALGARROBO'
    Destino = 'ACHAO'
    Peso = 1000
    Largo = 10
    Alto = 10
    Ancho = 10
    Tipo = 1

    params = {
        'ctl00$ContentPlaceHolder1$ddlOrigenComuna': OriginenOptions.get(Origen),
        'ctl00$ContentPlaceHolder1$ddlDestino': DestinoOptions.get(Destino),
        '__ASYNCPOST': 'true',
        'ctl00$ContentPlaceHolder1$ScriptManager1': 'tctl00$ContentPlaceHolder1$UpdatePanel1|tctl00$ContentPlaceHolder1$updTarifas',
        'ctl00$ContentPlaceHolder1$txtPeso': Peso,
        'ctl00$ContentPlaceHolder1$txtLargo': Largo,
        'ctl00$ContentPlaceHolder1$txtAncho': Ancho,
        'ctl00$ContentPlaceHolder1$txtAlto': Alto,
        'ctl00$ContentPlaceHolder1$rbtlTipoEnvio': Tipo,
        '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$btnCotizar',
        '__EVENTARGUMENT': form.find('input', {'name': '__EVENTARGUMENT'})['value'],
        '__LASTFOCUS': '',
        '__VIEWSTATE': form.find('input', {'name': '__VIEWSTATE'})['value'],
        '__VIEWSTATEGENERATOR': form.find('input', {'name': '__VIEWSTATEGENERATOR'})['value'],
        '__VIEWSTATEENCRYPTED': '',
        '__EVENTVALIDATION': form.find('input', {'name': '__EVENTVALIDATION'})['value']
    }
    response = session.post(url, data=params)

    # parse the results
    soup = BeautifulSoup(response.content)
    for row in soup.select("table#ctl00_ContentPlaceHolder1_GridView1 tr")[1:]:
        print(row.find_all("td")[1].text)
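The payload above copies each hidden ASP.NET field (__VIEWSTATE, __EVENTVALIDATION and so on) by hand. A minimal sketch of collecting every hidden input in one pass is shown here. It uses only the standard library's html.parser so that it runs without network access, and SAMPLE_HTML, HiddenFieldCollector and collect_hidden_fields are hypothetical names with made-up values, not the real page's markup. Whether posting these fields is enough to fix the error on this site is not confirmed.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the page's form; the real field values are much longer.
SAMPLE_HTML = """
<form id="aspnetForm">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
  <input type="text" name="ctl00$ContentPlaceHolder1$txtPeso" value="" />
</form>
"""


class HiddenFieldCollector(HTMLParser):
    """Collects the name/value pair of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            # A missing or empty value attribute is stored as an empty string.
            self.fields[attributes["name"]] = attributes.get("value") or ""


def collect_hidden_fields(html):
    """Return a dict of all hidden input fields found in the given HTML."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    return collector.fields


hidden = collect_hidden_fields(SAMPLE_HTML)
print(sorted(hidden))
```

The resulting dict could then be merged with the visible form values before posting, for example `params = {**hidden, 'ctl00$ContentPlaceHolder1$txtPeso': 1000}`, so that a hidden field added by the server is not missed.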
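What do you have in response.text or response.content?
On the first call (before session.post) I get the normal web page. After that, I get a different page with a "Microsoft error". It looks like the parameters are wrong. I do not know how to use params in this case.
What is "build an options mapping"? Have you tried reducing the POST request parameters to only the __VIEWSTATE-type fields?
The options mapping maps each display string to the value of the corresponding drop-down entry. I saw this code in another post and tried to adapt it to my case.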
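To illustrate what such an options mapping looks like, here is a small sketch that runs offline. The markup in SAMPLE_HTML and the values 101 and 102 are invented for illustration, and OptionCollector and build_options_mapping are hypothetical helpers built on the standard library's html.parser rather than the BeautifulSoup selector used in the question.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real page's option values are not known here.
SAMPLE_HTML = """
<select id="ctl00_ContentPlaceHolder1_ddlDestino">
  <option value="0">Seleccione</option>
  <option value="101">ACHAO</option>
  <option value="102">ALGARROBO</option>
</select>
"""


class OptionCollector(HTMLParser):
    """Collects [label, value] pairs for the options of one <select> element."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.in_option = False
        self.options = []  # [label, value] pairs in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "select":
            self.in_select = attributes.get("id") == self.select_id
        elif tag == "option" and self.in_select:
            self.in_option = True
            self.options.append(["", attributes.get("value") or ""])

    def handle_data(self, data):
        if self.in_option:
            # Accumulate the visible text of the current option.
            self.options[-1][0] += data

    def handle_endtag(self, tag):
        if tag == "option":
            self.in_option = False
        elif tag == "select":
            self.in_select = False


def build_options_mapping(html, select_id):
    """Map each visible label to its value, skipping the first (placeholder) option."""
    collector = OptionCollector(select_id)
    collector.feed(html)
    return {label.strip(): value for label, value in collector.options[1:]}


destino_options = build_options_mapping(SAMPLE_HTML, "ctl00_ContentPlaceHolder1_ddlDestino")
print(destino_options.get("ACHAO"))
```

Note that `.get()` returns None for a label that is not in the mapping, so a misspelled comuna name would go unnoticed until the POST fails. Checking the looked-up value before building the payload is one way to rule that out.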