XQuery: How do I extract text that contains HTML links?

I am trying to parse an HTML page with BaseX. From this part of the markup:

 <td colspan="2" rowspan="1" class="light comment2 last2">
  <img class="textalign10" src="templates/comment10.png" 
       alt="*" width="10" height="10" border="0"/>
  <a shape="rect" href="mypage.php?userid=26682">user</a>
  : the text I'd like to keep [<a shape="rect" 
  href="http://alink" rel="nofollow">Link</a>] . with that part too.
 </td>
and with this function:

declare
 function gkm:node_message_from_comment($comment as item()*) {
  if ($comment) then
    copy $c := $comment
    modify (
      delete node $c/img[1],
      delete node $c/a[1],
      delete node $c/@*,
      rename node $c as 'message'
    )
    return $c
  else ()
};
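I can extract the text, but I have not managed to remove the leading ": " from the beginning. That is, I get:

<message>
: the text I'd like to keep [<a shape="rect" href="http://alink" rel="nofollow">Link</a>] . with that part too.
</message>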

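Using XQuery Update and a transform statement seems overly complicated to me. You could instead select the nodes that follow the mypage.php link; with more knowledge of the input, there may be even better ways to select the nodes you need.

To cut off the ": " prefix, use substring-after. The pattern "cut ': ' from the first result node and return all other nodes unchanged" also applies if you insist on using the transform statement.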

let $comment :=<td colspan="2" rowspan="1" class="light comment2 last2">
  <img class="textalign10" src="templates/comment10.png" alt="*" width="10" height="10" border="0"/>
  <a shape="rect" href="mypage.php?userid=26682">user</a>
  : the text I'd like to keep [<a shape="rect" href="http://alink" rel="nofollow">Link</a>] . with that part too.
 </td>
let $result := $comment/a[starts-with(@href, 'mypage.php')]/following-sibling::node()
return <message>{
  $result[1]/substring-after(., ': '),
  $result[position() > 1]
}</message>
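
One caveat with this pattern: substring-after returns the empty string when the separator does not occur, so a comment without ": " would lose its first text node entirely. A minimal sketch of a guard, assuming you would rather keep such text unchanged; local:strip-prefix is a hypothetical helper and the shortened td is made-up sample input, neither is part of the query above:

```xquery
(: Hypothetical helper: strip everything up to and including $sep,
   but keep the text unchanged when $sep does not occur. :)
declare function local:strip-prefix($text as xs:string?, $sep as xs:string) as xs:string {
  if (contains($text, $sep)) then substring-after($text, $sep) else string($text)
};

(: Made-up sample input without the ": " separator. :)
let $comment :=
  <td>
    <a href="mypage.php?userid=26682">user</a>
    a comment without the separator [<a href="http://alink">Link</a>] and more.
  </td>
let $result := $comment/a[starts-with(@href, 'mypage.php')]/following-sibling::node()
return <message>{
  local:strip-prefix($result[1], ': '),
  $result[position() > 1]
}</message>
```

Whether keeping the text or dropping it is the right fallback depends on your data, so treat the else branch as a choice rather than a rule.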
In XQuery 3.0, which BaseX supports, the same return clause can also be written with the head and tail functions, which return the first item of a sequence and all remaining items:

return <message>{
  head($result)/substring-after(., ': '),
  tail($result)
}</message>
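
If you do want to keep the transform statement, the same "trim the first text node, leave the rest alone" idea could look roughly like the sketch below. It rests on several assumptions: the namespace URI bound to gkm is a placeholder (the question does not show the real one), the parameter type is narrowed to element()?, and the text to trim is taken to be the first text node after the first a element.

```xquery
(: Placeholder namespace URI; the question does not show the real one. :)
declare namespace gkm = "http://example.org/gkm";

declare function gkm:node_message_from_comment($comment as element()?) as element()? {
  if ($comment) then
    copy $c := $comment
    modify (
      (: Trim everything up to and including ": " in the first text node
         after the user link. The for clause makes this a no-op when
         there is no such text node. :)
      (for $t in $c/a[1]/following-sibling::text()[1]
       return replace value of node $t with substring-after($t, ': ')),
      delete node $c/img[1],
      delete node $c/a[1],
      delete node $c/@*,
      rename node $c as 'message'
    )
    return $c
  else ()
};

let $comment := <td colspan="2" rowspan="1" class="light comment2 last2">
  <img class="textalign10" src="templates/comment10.png" alt="*" width="10" height="10" border="0"/>
  <a shape="rect" href="mypage.php?userid=26682">user</a>
  : the text I'd like to keep [<a shape="rect" href="http://alink" rel="nofollow">Link</a>] . with that part too.
 </td>
return gkm:node_message_from_comment($comment)
```

All target expressions in the modify clause are evaluated before any update is applied, so $c/a[1] still refers to the user link even though that link is deleted in the same clause. The result should be a message element whose text starts with "the text I'd like to keep [".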