How to access the contents of HTML tags in Bash using XMLStarlet

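I am trying to learn how to use XMLStarlet to access the contents of HTML tags in Bash. As an example, I am trying to extract some text from the page www.wisdomofchopra.com/iframe.php. I am having trouble specifying the "address" of the HTML content for XMLStarlet and would appreciate some help. My attempt is as follows: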

URL="http://www.wisdomofchopra.com/iframe.php"
webPage="$(curl -s "${URL}")"
echo "${webPage}" | xmlstarlet sel -T -t -c "//html/body//table/tr/td[@id='quote']/header/h2/"
This produces the following output:

-:29.12: Opening and ending tag mismatch: meta line 5 and head
    </head>
           ^
-:35.100: Entity 'nbsp' not defined
te"><header><h2>&quot;Emotional intelligence is beyond total reality&quot;&nbsp;
                                                                               ^
-:35.106: Entity 'nbsp' not defined
eader><h2>&quot;Emotional intelligence is beyond total reality&quot;&nbsp;&nbsp;
                                                                               ^
-:41.119: EntityRef: expecting ';'
witter.com/intent/tweet?original_referer=http%3A%2F%2Fwww.wisdomofchopra.com&via
                                                                               ^
-:41.139: EntityRef: expecting ';'
eet?original_referer=http%3A%2F%2Fwww.wisdomofchopra.com&via=WisdomOfChopra&text
                                                                               ^
-:41.196: EntityRef: expecting ';'
via=WisdomOfChopra&text=%27Emotional+intelligence+is+beyond+total+reality%27&url
                                                                               ^
-:52.169: EntityRef: expecting ';'
));document.write(' src="http://ads.adbrite.com/mb/text_group.php?sid=2171164&zs
                                                                               ^
-:52.186: EntityRef: expecting ';'
(' src="http://ads.adbrite.com/mb/text_group.php?sid=2171164&zs=3436385f3630&ifr
                                                                               ^
-:52.209: EntityRef: expecting ';'
ite.com/mb/text_group.php?sid=2171164&zs=3436385f3630&ifr='+AdBrite_Iframe+'&ref
                                                                               ^
-:53.99: EntityRef: expecting ';'
p" href="http://www.adbrite.com/mb/commerce/purchase_form.php?opid=2171164&afsid
                                                                               ^
-:57.9: Opening and ending tag mismatch: head line 3 and html
</html>
       ^
-:58.1: Premature end of data in tag html line 2

Edit: For convenience, here is some HTML that is roughly equivalent to the web page:

<!DOCTYPE html>
<html>
    <head>
    </head>
    <body>
        <h3>Your random fictional Deepak Chopra quote:</h3>
        <table border="0" cellspacing="0" cellpadding="0">
            <tr>
                <td width="128" align="left" valign="top"><img src="img/imageSmall2.png" width="80" height="80" /></td>
                <td id="quote"><header><h2>&quot;Perceptual reality serves total truth&quot;&nbsp;&nbsp;</h2></header></td>
            </tr>
        </table>
    </body>
</html>
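
The errors above come from the XML parser rejecting input that is HTML rather than well-formed XML (undefined `&nbsp;`, unclosed `meta`, bare `&` in URLs). One commonly used workaround, sketched here as an assumption rather than a confirmed fix for this page, is to let `xmlstarlet fo --html --recover` repair the markup first and then pipe the result to `xmlstarlet sel`. The helper name `extract_quote` and the inline sample are hypothetical, and the trailing `/` after `h2` is dropped because an XPath location path cannot end with a slash.

```shell
#!/usr/bin/env bash
# Sketch: repair the HTML with `xmlstarlet fo` before querying it with `sel`.
# extract_quote is a hypothetical helper name; it reads HTML on stdin.
extract_quote() {
    # --html: parse the input as HTML; --recover: keep whatever is parsable.
    # Parser complaints from the first stage go to stderr and are discarded.
    xmlstarlet fo --html --recover 2>/dev/null |
        xmlstarlet sel -T -t -v "//td[@id='quote']/header/h2" -n
}

# Local stand-in for the page, based on the simplified HTML in the question
# (no network access needed).
sample_html='<!DOCTYPE html>
<html><head></head><body><table><tr>
<td id="quote"><header><h2>&quot;Perceptual reality serves total truth&quot;&nbsp;&nbsp;</h2></header></td>
</tr></table></body></html>'

if command -v xmlstarlet >/dev/null 2>&1; then
    printf '%s\n' "$sample_html" | extract_quote
else
    echo "xmlstarlet is not installed" >&2
fi
```

Against the live page, the same helper could be fed with `curl -s "${URL}" | extract_quote`. The extracted text may still carry the surrounding quotation marks and trailing non-breaking spaces, so some trimming may be needed afterwards.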

I could not get XMLStarlet to handle the HTML, so I simply used grep and AWK:

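printDeepakChopraAdvice(){
    local URL webPage text
    URL="http://www.wisdomofchopra.com/iframe.php"
    # Fetch the page quietly (-s suppresses curl's progress output).
    webPage="$(curl -s "${URL}")"
    # Keep the line containing id="quote", then take the text between the two &quot; entities.
    text="$(echo "${webPage}" | grep "id=\"quote\"" | awk -F"&quot;" '{print $2}')"
    echo "${text}"
}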
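
To see why the `awk -F"&quot;"` step isolates the quote, the pipeline can be tried offline on a single line taken from the simplified HTML in the question. This is only an illustration; the variable names `line` and `quote` are made up for the example.

```shell
#!/usr/bin/env bash
# One line of the page, as in the simplified HTML shown in the question.
line='                <td id="quote"><header><h2>&quot;Perceptual reality serves total truth&quot;&nbsp;&nbsp;</h2></header></td>'

# With "&quot;" as the field separator, awk sees three fields:
#   $1 = everything before the first &quot;
#   $2 = the quote text
#   $3 = everything after the second &quot;
quote="$(printf '%s\n' "$line" | grep 'id="quote"' | awk -F'&quot;' '{print $2}')"

printf '%s\n' "$quote"   # prints: Perceptual reality serves total truth
```

The approach depends on the quote sitting on one line and being wrapped in exactly two `&quot;` entities, so it would need adjusting if the page layout changes.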

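Thanks for the suggestion. I don't think that is the problem. I created a stripped-down version of the code that has a similar hierarchy to the example web page, and I still run into similar problems. I have added this stripped-down version to the text of the post.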