Capturing elements as a JavaScript array using XPath in CasperJS

Tags: javascript, xpath, casperjs, getelementsbytagname

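I've been working with CasperJS on a web-scraping project, but I've had some trouble getting it to work perfectly.

getElementsAttribute has worked well as a way of capturing href and title information from the tables, but in some cases the table entries are not hyperlinked and still need to be scraped. Here is the beginning of the code: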

// Load utilities

var utils = require('utils');
var client = require('clientutils');
var fs = require('fs');
var x = require('casper').selectXPath;
var casper = require('casper').create({
    pageSettings: {
        loadImages:  false,
        loadPlugins: false
    },
    clientScripts: ['C:/casperjs/lib/jquery.min.js', 'C:/casperjs/lib/jquery.csv-0.71.min.js']
});

// Choose Main URL and Target Links

var mainURL = "http://en.wikipedia.org/wiki/Identification_badges_of_the_United_States_military";
var mainAttribute = '//*[@id="mw-content-text"]/ul/li/div/div/p/a';
var mainElement = '//*[@id="mw-content-text"]/ul/li/div/div/p';

casper.start();

casper.open(mainURL).then(function () {

    // Choose Links from Main URL

    mainLinks = this.getElementsAttribute(x(mainAttribute), 'href');
    mainTitle = this.getElementsAttribute(x(mainAttribute), 'title');
    mainFetch = document.getElementsByTagName(x(mainElement));

    utils.dump(mainFetch);

});

casper.run();
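
getElementsAttribute gives me the correct information, but getElementsByTagName only gives me "undefined" or an empty result, even when I try to work with the inner content. (this.getElementsByTagName does not seem to work either.)

Basically, I want to grab the text in the cases where the hyperlink is missing and, using a single XPath selector, push it into an array of the same size and order as mainLinks and mainTitle. It seems like there should be a simple way to do this, but I haven't been able to find it. Can anyone point me in the right direction?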

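You can try using

mainFetch = this.getElementsInfo(x(mainElement));

As you can see in the output snippet below, it captures all of the child elements, and the html field can be used to filter out the ones whose content does not start with an "a" tag.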
{
    "attributes": {},
    "height": 54,
    "html": "<a href=\"/wiki/CPO_Command_Identification_Badge\" title=\"CPO Command Identification Badge\" class=\"mw-redirect\">Chief Petty Officer Command Identification Badges</a>",
    "nodeName": "p",
    "tag": "<p><a href=\"/wiki/CPO_Command_Identification_Badge\" title=\"CPO Command Identification Badge\" class=\"mw-redirect\">Chief Petty Officer Command Identification Badges</a></p>",
    "text": "Chief Petty Officer Command Identification Badges",
    "visible": true,
    "width": 147,
    "x": 185,
    "y": 9302
},
{
    "attributes": {},
    "height": 18,
    "html": "Law Enforcement Badges",
    "nodeName": "p",
    "tag": "<p>Law Enforcement Badges</p>",
    "text": "Law Enforcement Badges",
    "visible": true,
    "width": 147,
    "x": 185,
    "y": 9526
}
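
As a rough illustration of that filtering idea, here is a minimal sketch that works on plain objects shaped like the output above. The names sampleInfo, startsWithAnchor and toRows are hypothetical and not part of CasperJS; in a real script the list would come from this.getElementsInfo(x(mainElement)) rather than from a hard-coded sample.

```javascript
// Hypothetical sample, shaped like two entries of the getElementsInfo output above
// (only the fields used below are included).
var sampleInfo = [
    {
        html: '<a href="/wiki/CPO_Command_Identification_Badge" title="CPO Command Identification Badge" class="mw-redirect">Chief Petty Officer Command Identification Badges</a>',
        text: 'Chief Petty Officer Command Identification Badges'
    },
    {
        html: 'Law Enforcement Badges',
        text: 'Law Enforcement Badges'
    }
];

// Hypothetical helper: true when the element's inner HTML starts with an <a> tag.
function startsWithAnchor(html) {
    return /^\s*<a[\s>]/i.test(html);
}

// Build one entry per <p>, so the result keeps the page order and has the
// same length as the list returned by getElementsInfo.
function toRows(infoList) {
    return infoList.map(function (info) {
        return {
            text: info.text,
            linked: startsWithAnchor(info.html)
        };
    });
}

var rows = toRows(sampleInfo);
```

Because every paragraph yields exactly one row, linked and unlinked entries stay aligned by index, which is roughly what the question asks for.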
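Thanks, I think this might work. I'll get back to you after I've tried it!

It's working much more smoothly now. What would you recommend, though, as the best way to filter the html data to get the href attribute?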
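
One possible approach to that follow-up, offered only as a sketch: pull the attribute out of the html string with a regular expression. The helper name leadingAnchorAttribute is hypothetical, and the pattern assumes double-quoted attributes on an "a" tag at the very start of the string, as in the output shown earlier. Regex parsing of HTML is brittle, so reading the attribute inside the page context (for example through casper.evaluate) may be more robust for markup that varies.

```javascript
// Hypothetical helper: read one attribute from an <a> tag at the start of an
// HTML string. Returns null when the string does not start with an <a> tag
// or when the tag lacks the attribute. Assumes double-quoted attribute values
// and a plain alphabetic attribute name.
function leadingAnchorAttribute(html, name) {
    var pattern = new RegExp('^\\s*<a\\b[^>]*\\s' + name + '="([^"]*)"', 'i');
    var match = pattern.exec(html);
    return match ? match[1] : null;
}

// Sample strings copied from the "html" fields in the output above.
var linkedHtml = '<a href="/wiki/CPO_Command_Identification_Badge" title="CPO Command Identification Badge" class="mw-redirect">Chief Petty Officer Command Identification Badges</a>';
var plainHtml = 'Law Enforcement Badges';

var href = leadingAnchorAttribute(linkedHtml, 'href');
var title = leadingAnchorAttribute(linkedHtml, 'title');
var missing = leadingAnchorAttribute(plainHtml, 'href');
```

Returning null for unlinked entries keeps one value per paragraph, so the resulting array can stay the same length and order as the getElementsInfo list.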