How to parse JavaScript with Nokogiri and Ruby

I need to parse an array out of a website. The part of the JavaScript I want to parse looks like this:

_arPic[0] = "http://example.org/image1.jpg";
_arPic[1] = "http://example.org/image2.jpg";
_arPic[2] = "http://example.org/image3.jpg";
_arPic[3] = "http://example.org/image4.jpg";
_arPic[4] = "http://example.org/image5.jpg";
_arPic[5] = "http://example.org/image6.jpg";

I get the whole JavaScript like this:

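require "nokogiri"
require "open-uri" # on Ruby 3.0+, Kernel#open no longer fetches URLs; use URI.open
product_page = Nokogiri::HTML(URI.open(full_url))
product_page.css("div#main_column script")[0]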

Is there an easy way to parse all of the variables?

If I'm reading you correctly, you're trying to parse the JavaScript and end up with a Ruby array of the image URLs, right?

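Nokogiri only parses HTML/XML, so you'll need a different library for the JavaScript itself. A cursory search turns up a library with a parse function that takes a JavaScript string and returns a parse tree.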

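Once you have a parse tree, you'll need to traverse it, find the nodes of interest by name (e.g. _arPic), and then grab the string content on the other side of the assignment.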

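Alternatively, if it doesn't have to be too robust (and it wouldn't be), you can just search the JavaScript with a regular expression, if the formatting allows it: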

/^\s*_arPic\[\d+\]\s*=\s*"([^"]+)";\s*$/

might be a good starter regex.
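To make the regex route concrete, here is a minimal sketch using String#scan. The js string is made-up sample data standing in for the script text you pulled out with Nokogiri, and the pattern assumes one assignment per line with a double-quoted value. If the real page is formatted differently, the pattern would need adjusting.

```ruby
# Hypothetical sample standing in for the <script> text extracted with Nokogiri.
js = <<~JS
  var _arPic = new Array();
  _arPic[0] = "http://example.org/image1.jpg";
  _arPic[1] = "http://example.org/image2.jpg";
  _arPic[10] = "http://example.org/image11.jpg";
JS

# Assumed format: one assignment per line, value in double quotes.
# Group 1 captures the array index, group 2 the quoted string.
pattern = /^\s*_arPic\[(\d+)\]\s*=\s*"([^"]+)";\s*$/

# scan returns one [index, url] pair per matching line.
pairs = js.scan(pattern)

# Order by the numeric JavaScript index, then keep only the URLs.
image_urls = pairs.sort_by { |index, _url| index.to_i }.map { |_index, url| url }

puts image_urls
```

Capturing the index as well lets you keep the original ordering even if the assignments appear out of order in the page.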

The simple way:

_arPic = URI.extract product_page.css("div#main_column script")[0].text

which can be shortened to:

_arPic = URI.extract product_page.at("div#main_column script").text
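For anyone who wants to try this without hitting a live page, here is a small sketch of the same idea on a plain string. The script_text value is made-up sample data standing in for the .text of the script node, and passing a list of schemes is my own addition to keep unrelated scheme-like tokens out of the result. Note that recent Ruby versions mark URI.extract as obsolete and print a warning when run in verbose mode, so treat this as a quick-and-dirty option.

```ruby
require "uri"

# Hypothetical sample standing in for product_page.at("div#main_column script").text
script_text = <<~JS
  _arPic[0] = "http://example.org/image1.jpg";
  _arPic[1] = "http://example.org/image2.jpg";
  _arPic[2] = "http://example.org/image3.jpg";
JS

# URI.extract scans free text and returns every URI-looking substring.
# The second argument limits matches to the given schemes.
image_urls = URI.extract(script_text, %w[http https])

puts image_urls
```

This grabs every http/https URL in the script, not just the ones assigned to _arPic, so it only works when the script block contains nothing else you'd want to exclude.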