Scraping data with Ruby Mechanize
I am scraping data from the page at the URL shown in the code. Here is the code I tried:
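require 'mechanize'

uri = "http://www.mca.gov.in/DCAPortalWeb/dca/MyMCALogin.do?method=setDefaultProperty&mode=53"
#html, html_content = @mobj.get_data(uri)
agent = Mechanize.new
html_page = agent.get uri
# Take the first form on the page and select the radio button with value '2'
html_form = html_page.form
html_form.radiobuttons_with(:name => 'search', :value => '2')[0].check
# submit returns the response page; html_page still holds the original page
result_page = html_form.submit
puts result_page.content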
Error:
var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:308:in `fetch': 500 => Net::HTTPInternalServerError for http://www.mca.gov.in/DCAPortalWeb/dca/ProsecutionDetailsSRAction.do -- unhandled response (Mechanize::ResponseCodeError)
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize.rb:1281:in `post_form'
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize.rb:548:in `submit'
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/form.rb:223:in `submit'
from ministry_corp_aff.rb:32:in `start'
from ministry_corp_aff.rb:52:in `<main>'
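If I manually click the third radio button and then submit the form, I get a .zip file. I am trying to extract the data from the .xls file inside that zip.
The radio buttons have an onclick event handler that triggers some JavaScript. Clicking the Submit element also runs some JavaScript. That JavaScript probably sets some form values that are sent with the submission and checked by the server, which would explain the 500 response when they are missing.
Mechanize cannot execute JavaScript. You need Selenium WebDriver for that.
When we click the link, a .zip file is downloaded, and it contains the .xls file.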
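If driving a real browser is not practical, one possible alternative is to find out what the page's JavaScript adds to the request (for example by watching the submission in the browser's network tab) and to send those fields yourself. The following is only a sketch of that idea using Ruby's standard library. The field name hiddenAction and its value are hypothetical placeholders, not taken from the actual site; only the search radio button comes from the question.

```ruby
require 'uri'

# Field visible in the static HTML form (the radio button from the question).
STATIC_FIELDS = { 'search' => '2' }.freeze

# Hypothetical field that the page's JavaScript might set before submitting.
# The real names and values have to be read from the browser's network tab.
JS_SET_FIELDS = { 'hiddenAction' => 'download' }.freeze

# Merge both sets of fields and encode them as an
# application/x-www-form-urlencoded request body.
def build_form_body(static_fields, js_fields)
  URI.encode_www_form(static_fields.merge(js_fields))
end

body = build_form_body(STATIC_FIELDS, JS_SET_FIELDS)
puts body
```

Once the real field names are known, the same merged hash could be passed to Mechanize's agent.post(url, fields) instead of submitting the parsed form. Whether the server accepts such a request depends on what else it checks (cookies, referrer, and so on), so this may still fail where Selenium succeeds.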