Ruby on Rails, Nokogiri: can't screen-scrape a page (Taobao)
Tags: ruby-on-rails, screen-scraping, nokogiri

I am using Nokogiri to fetch an image from a Chinese website (Taobao).


I can get the title and doc.css("img")[0]['src'], but I can't get img#J_ImgBooth. What is the problem? Is it being blocked?

Look at the HTML source: img#J_ImgBooth has a data-src attribute, not src.

<img id="J_ImgBooth" data-src="http://img03.taobaocdn.com/bao/uploaded/i3/18513032853503639/T1z1ojXdNhXXXXXXXX_!!2-item_pic.png_310x310.jpg"  data-hasZoom="700" />
Reading data-src instead of src works fine.

This works for me:

doc.at_css("#J_ImgBooth")["data-src"]

You can check that the attribute name is data-src:

#(Element:0x3ffb5d3d9df0 {
  name = "img",
  attributes = [
    #(Attr:0x3ffb5d3d9b84 { name = "id", value = "J_ImgBooth" }),
    #(Attr:0x3ffb5d3d9b70 {
      name = "data-src",
      value = "http://img03.taobaocdn.com/bao/uploaded/i3/18513032853503639/T1z1ojXdNhXXXXXXXX_!!2-item_pic.png_310x310.jpg"
      }),
    #(Attr:0x3ffb5d3d9b5c { name = "data-haszoom", value = "700" })]
  })

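That is what I see. There is a src attribute.

Did you see that in Chrome's element inspector? JavaScript may modify the element when the page loads, so what the inspector shows may not be the actual HTML that Nokogiri sees.

That is what Firebug shows. So how can I get the original HTML? Or the content as it is after the JS has run?

Use wget to fetch the original HTML file, or right-click the page and choose the "View Page Source" menu item.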
doc.css("img#J_ImgBooth")[0]['data-src']