R: XPath to exclude the text of the last element if it meets a condition


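The goal is to exclude the text contained in an `<em>` tag in the last `<p>` element.

So if the text is in the last `<p>` tag, and there is no other text between the `<p>` and `<em>` tags, nor between `</em>` and `</p>`, I need to scrape everything except that text, as shown in the example.

I'm using

//p[normalize-space()]
but it returns everything, including the text of the last tag, whereas the last sentence should be excluded.

Thanks for any hints.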

UPD

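Example 1. If the following text is in the last `<p>`, it should be returned (because not all of the text is inside `<em>`):

<p>I was once on a travelling sanitation carnival in Uttar Pradesh when someone rushed up to me. “Rose! Rose! There’s a real ‘no loo no I do’!” If that’s the story of <em>Ek Prem Katha</em>, then I’m all in favour.  But because bringing a toilet into the world, when there are still 2.4 billion people without one, is by any reckoning a very happy ending.</p>

Example 2. If the following text is in the last `<p>`, it should not be returned (because all of the text is inside `<em>`):

<p><em>Sophie Elmhirst is an assistant editor of the NS</em></p>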


It's not entirely clear what the exact rule is, but here is a suggestion that you can comment on:

//p[normalize-space() and not(position() = last() and em)]
which means

//p                           find all `p` elements anywhere in the document
[normalize-space()            but only if they contain at least one non-whitespace character
and not(position() = last()   and exclude a `p` element that is the last `p` child of its parent
and em)]                      and that also has a child element named `em`
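For the sample input, this returns the first three `p` elements as separate results (separated by `-----` in the result listing) and leaves out the final `p`.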
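For readers who want to check the logic without an XPath engine, the predicate above can be emulated in plain Python. This is only a sketch: Python's standard `xml.etree.ElementTree` supports just a limited subset of XPath, so the three tests are written out by hand, and the `SAMPLE` markup and the function name `select_paragraphs` are hypothetical, not taken from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an empty paragraph, and a last paragraph wrapped in <em>.
SAMPLE = """<div>
<p>First paragraph.</p>
<p></p>
<p>Second paragraph with <em>emphasis</em> inside.</p>
<p><em>Author byline</em></p>
</div>"""


def select_paragraphs(root):
    """Emulate //p[normalize-space() and not(position() = last() and em)]."""
    selected = []
    # position() and last() are evaluated among the p children of each parent.
    # (A p that is itself the root element is not handled in this sketch.)
    for parent in root.iter():
        paragraphs = parent.findall("p")
        for index, p in enumerate(paragraphs, start=1):
            # normalize-space(): true if any non-whitespace text exists
            has_text = "".join(p.itertext()).strip(" \t\r\n") != ""
            # position() = last(): p is the last p child of its parent
            is_last = index == len(paragraphs)
            # em: p has at least one child element named em
            has_em = p.find("em") is not None
            if has_text and not (is_last and has_em):
                selected.append(p)
    return selected


root = ET.fromstring(SAMPLE)
for p in select_paragraphs(root):
    print(ET.tostring(p, encoding="unicode").strip())
```

In R, the XPath expression itself could presumably be passed straight to a function such as `xml2::xml_find_all()`, so no emulation of this kind would be needed there.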


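Sorry, I might have misunderstood you until now. The expression

//p[normalize-space() and not(position() = last() and em and not(text()))]
adds `not(text())`, which takes into account whether there is text outside `em` that is a child of `p` itself.

Since you have the R tag: is there R code you need help with?

`(/p)[not(position()=last() and em]`

Please show a complete HTML document that contains all the variations you have to deal with. Apparently, some `p` elements are empty (that is the only reason to use the `normalize-space()` predicate). For example, are all `p` elements siblings, that is, is it a flat structure? And what about the last `p` element if it does not contain `em`?

@MathiasMüller, the structure is exactly the same as the one given here. The last `p` element does not always contain `em`. I need to exclude the text of the last `p` element only if the whole text is enclosed in `em`. Maybe `normalize-space` is not necessary, but I always use it.

@splash58, haven't you missed `following::`? Maybe you meant smth like: `(//p)[not(position()=last() and following::em)]`?

You are right about the dot (`.`) in the last node of my example; it is outside `em`. I tried the xpath `//p[normalize-space() and not(position()=last() and em)]`, and if the dot (.) is outside, it looks like it does not return the text. Is this the correct behaviour for this xpath?

@Alex Yes, that is correct. So what exactly is the rule? a) Never return the `p` element in last position if it has an `em` child element; or b) exclude it only if all of its text content is inside `em`, and return it if part of the text is outside `em`. Which one is correct for you?

Yes, option b) is correct. However, symbols like `[?!,.:“]` etc. may be located outside `em`. I just found that the xpath does not return the text if there is text outside `em` in the last `p`. I have updated the examples.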
"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."
Liberals were mostly delighted by what the Washington Post called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".
This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.
Felicity Spector is a deputy programme editor for Channel 4 News
<p>I was once on a travelling sanitation carnival in Uttar Pradesh when someone rushed up to me. “Rose! Rose! There’s a real ‘no loo no I do’!” If that’s the story of <em>Ek Prem Katha</em>, then I’m all in favour.  But because bringing a toilet into the world, when there are still 2.4 billion people without one, is by any reckoning a very happy ending.</p>
<p><em>Sophie Elmhirst is an assistant editor of the NS</em></p>
<p>"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."</p>
-----------------------
<p>Liberals were mostly delighted by what the <em>Washington Post</em> called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".</p>
-----------------------
<p>This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.</p>
//p[normalize-space() and not(position() = last() and em and not(text()))]
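
To see how the extra `not(text())` test changes the outcome, the expression above can again be emulated in plain Python with the standard library. This is a sketch under assumptions: the two input strings are hypothetical and only modelled on the examples in the question, and the helper names `has_own_text` and `select_unless_all_in_em` are made up for illustration. In `xml.etree.ElementTree`, the text nodes that are direct children of `p` correspond to `p.text` plus the `tail` of each child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs: a byline fully inside <em>, and a byline followed by
# a dot that sits outside <em>.
ALL_IN_EM = "<div><p>Body text.</p><p><em>Author byline</em></p></div>"
DOT_OUTSIDE = "<div><p>Body text.</p><p><em>Author byline</em>.</p></div>"


def has_own_text(p):
    """True if p has at least one child text node, like XPath text()."""
    # Text directly inside p is p.text plus the tail of each child element.
    return bool(p.text) or any(child.tail for child in p)


def select_unless_all_in_em(root):
    """Emulate
    //p[normalize-space() and not(position() = last() and em and not(text()))].
    """
    selected = []
    for parent in root.iter():
        paragraphs = parent.findall("p")
        for index, p in enumerate(paragraphs, start=1):
            has_text = "".join(p.itertext()).strip(" \t\r\n") != ""
            is_last = index == len(paragraphs)
            has_em = p.find("em") is not None
            # Drop only a last p that has em and no text of its own.
            if has_text and not (is_last and has_em and not has_own_text(p)):
                selected.append(p)
    return selected


for label, markup in (("all in em", ALL_IN_EM), ("dot outside em", DOT_OUTSIDE)):
    kept = select_unless_all_in_em(ET.fromstring(markup))
    print(label, len(kept))
```

One caveat: XPath `text()` also matches whitespace-only text nodes, so a last `p` with stray whitespace around its `em` may still be returned, depending on how the parser handles whitespace. The sketch mirrors this by treating any non-empty string as a text node.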