rvest troubleshooting

I have the following HTML string:

html <- '<head>
    <dd>
        "Line 1 : abc
        Line 2 : def
        Line 3 : ghi
        Line 4 : jkl
        Line 5 : mno"
    </dd>
</head>'
But I only want to extract the text from line 4, so I updated the XPath and kept the rest of the code unchanged:

xpath <- 'substring-after(substring-after(substring-before(//dd, "Line 5"), "Line 3"), "\n")'

html %>% 
    XML::htmlParse(., asText=TRUE) %>% 
    XML::xpathSApply(., path = xpath, xmlValue)


html %>% 
    xml2::read_html() %>%
    rvest::html_nodes(xpath=xpath) %>%
    rvest::html_text()

How can I get the expected result, "Line 4 : jkl\n", using rvest?

I suggest you bring in the entire text/corpus first and convert the HTML table to a tibble. Only then would I worry about pulling out line 4.
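The "read everything first, then pick out line 4" idea can be sketched roughly as follows. This is a minimal illustration rather than the answerer's exact code: it assumes rvest and xml2 are installed, the names `full_text`, `lines`, and `line4` are made up for this example, and it skips the tibble step because the sample input contains no table.

```r
library(rvest)  # also makes the %>% pipe available

html <- '<head>
    <dd>
        "Line 1 : abc
        Line 2 : def
        Line 3 : ghi
        Line 4 : jkl
        Line 5 : mno"
    </dd>
</head>'

# Step 1: pull the complete text of the <dd> node with a plain node XPath
full_text <- html %>%
    xml2::read_html() %>%
    rvest::html_node(xpath = "//dd") %>%
    rvest::html_text()

# Step 2: split on newlines, strip the indentation, keep the "Line 4" entry
lines <- trimws(strsplit(full_text, "\n", fixed = TRUE)[[1]])
line4 <- lines[grepl("^Line 4", lines)]
line4
```

If the trailing newline from the question's expected result matters, `paste0(line4, "\n")` adds it back.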
For reference, the full text of the <dd> node is:
'\n        "Line 1 : abc\n        Line 2 : def\n        Line 3 : ghi\n        Line 4 : jkl\n        Line 5 : mno"\n    '
With the XPath above, the XML version returns:
'        Line 4 : jkl\n        '
while the rvest version fails with:
Error in nodes_duplicated(nodes): Expecting an external pointer: [type=character].
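
The error most likely arises because `rvest::html_nodes()` expects the XPath to select a set of nodes, whereas the `substring-after(...)` expression evaluates to a string. One possible workaround, sketched here on the assumption that the xml2 package is installed, is `xml2::xml_find_chr()`, which is intended for XPath expressions that return a string. The names `doc` and `result` are illustrative only.

```r
library(xml2)

html <- '<head>
    <dd>
        "Line 1 : abc
        Line 2 : def
        Line 3 : ghi
        Line 4 : jkl
        Line 5 : mno"
    </dd>
</head>'

# Same XPath as in the question; R turns "\n" into a real newline character
xpath <- 'substring-after(substring-after(substring-before(//dd, "Line 5"), "Line 3"), "\n")'

doc <- read_html(html)

# xml_find_chr() evaluates an XPath whose result is a string, not a node set
result <- xml_find_chr(doc, xpath)

# The raw result should still carry the surrounding indentation,
# so trim it to get just the line itself
trimws(result)
```

Since `read_html()` is the same parser entry point rvest uses, this can sit at the end of an existing rvest pipeline in place of `html_nodes()` and `html_text()`.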