Ruby Net::HTTP::Post submit form
I'm working on building a website scraper. There is a form that is used to change the current page. This is how I'm submitting the form with a POST request, but it seems to fetch the same page over and over again. Here is some sample code:
require 'net/http'
require 'nokogiri'

pages = {
  "total_pages" => 19,
  "p1" => '1234/1456/78990/123324345/12143343214345/231432143/12432412/435435/',
  "p2" => '1432424/123421421/345/435/6/65/5/34/3/2/21/1243',
  # ..
  # ..
  # ..
}
idx = 1
p_count = pages["total_pages"]
# Set up the HTTP request to change pages to get all the auction results
uri = URI.parse("http://somerandomwebsite.com?listings")
http = Net::HTTP.new(uri.host, uri.port)
req = Net::HTTP::Post.new(uri.request_uri)
p_count.times do
  puts "On loop sequence: #{idx}"
  pg_num = "p#{idx}"
  pg_content = pages["#{pg_num}"]
  req.set_form_data({"page" => "#{pg_num}", "#{pg_num}" => "#{pg_content}"})
  response = http.request(req)
  page = Nokogiri::HTML(response.body)
  idx = idx + 1
end
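It looks like page never changes. Is there a way to see what the full request looks like each time, so I can make sure the correct parameters are being passed? It seems almost impossible to determine anything about req.

A good way to debug HTTP is to take advantage of httpbin.org, which echoes back the request it receives. POSTing the form fields q=ruby and max=50 to http://httpbin.org/post with Net::HTTP returns: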
# >> {
# >> "args": {},
# >> "data": "",
# >> "files": {},
# >> "form": {
# >> "max": "50",
# >> "q": "ruby"
# >> },
# >> "headers": {
# >> "Accept": "*/*",
# >> "Accept-Encoding": "gzip;q=1.0,deflate;q=0.6,identity;q=0.3",
# >> "Content-Length": "13",
# >> "Content-Type": "application/x-www-form-urlencoded",
# >> "Host": "httpbin.org",
# >> "User-Agent": "Ruby"
# >> },
# >> "json": null,
# >> "origin": "216.69.191.1",
# >> "url": "http://httpbin.org/post"
# >> }
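The request can also be checked locally before anything is sent. What follows is a minimal sketch rather than a definitive recipe: it assumes the URL and field names from the question, and build_page_request is a hypothetical helper introduced here for illustration, not part of Net::HTTP. It only builds the request object and prints the parts that set_form_data fills in.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): builds the POST request for one
# page and returns it without sending anything over the network.
def build_page_request(uri, pg_num, pg_content)
  req = Net::HTTP::Post.new(uri.request_uri)
  # set_form_data URL-encodes the hash into the body and sets Content-Type.
  req.set_form_data('page' => pg_num, pg_num => pg_content)
  req
end

# URL and sample values are assumptions taken from the question.
uri = URI.parse('http://somerandomwebsite.com?listings')
req = build_page_request(uri, 'p1', '1234/1456/')

puts req.method           # HTTP verb
puts req.path             # request target that will be sent
puts req['Content-Type']  # header set by set_form_data
puts req.body             # URL-encoded form body
```

Building a fresh request per page, as the helper does, makes it easy to print each body inside the loop and confirm the fields change. To see the traffic on the wire as well, Net::HTTP#set_debug_output($stderr) can be called on the http object before the connection is opened.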
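That said, I'd recommend not using Net::HTTP. Ruby has many great HTTP clients that make code easier to write. For instance, the same request made with HTTPClient is a single clnt.post call; that listing, with its httpbin.org output, appears after the page-loop code below.
This is untested code because you didn't tell us enough, but this is where I'd start for what you're doing: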
require 'httpclient'
require 'nokogiri'

BASE_URL = 'http://somerandomwebsite.com?listings'
PAGES = [
  '1234/1456/78990/123324345/12143343214345/231432143/12432412/435435/',
  '1432424/123421421/345/435/6/65/5/34/3/2/21/1243',
]
clnt = HTTPClient.new
PAGES.each.with_index(1) do |page, idx|
  puts "On loop sequence: #{idx}"
  # Field names mirror the question: 'page' => 'p1', 'p1' => <page content>
  pg_num = "p#{idx}"
  response = clnt.post(BASE_URL, 'page' => pg_num, pg_num => page)
  doc = Nokogiri::HTML(response.body)
  # ...
end
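Your code won't run as posted, so we'd have to change it to test it and identify the problem. That wastes our time.
I'd recommend not using Net::HTTP, and instead using one of the many HTTP clients that exist for Ruby. Net::HTTP is great if you're inventing a new type of service, but for normal HTTP work it's very low-level, especially when you're just requesting pages. As far as seeing the request goes, httpbin.org could be very useful.
The httpbin.org request made with HTTPClient, followed by the response it printed: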
require 'httpclient'
clnt = HTTPClient.new
res = clnt.post('http://httpbin.org/post', 'q' => 'ruby', 'max' => '50')
puts res.body
# >> {
# >> "args": {},
# >> "data": "",
# >> "files": {},
# >> "form": {
# >> "max": "50",
# >> "q": "ruby"
# >> },
# >> "headers": {
# >> "Accept": "*/*",
# >> "Content-Length": "13",
# >> "Content-Type": "application/x-www-form-urlencoded",
# >> "Date": "Thu, 09 Feb 2017 20:03:57 GMT",
# >> "Host": "httpbin.org",
# >> "User-Agent": "HTTPClient/1.0 (2.8.3, ruby 2.4.0 (2016-12-24))"
# >> },
# >> "json": null,
# >> "origin": "216.69.191.1",
# >> "url": "http://httpbin.org/post"
# >> }