rvest troubleshooting
I have the following HTML string:
html <- '<head>
<dd>
"Line 1 : abc
Line 2 : def
Line 3 : ghi
Line 4 : jkl
Line 5 : mno"
</dd>
</head>'
But I only want to extract the text from line 4. So I updated the XPath and kept the rest of the code unchanged:
library(magrittr)  # provides the %>% pipe

xpath <- 'substring-after(substring-after(substring-before(//dd, "Line 5"), "Line 3"), "\n")'

# XML package
html %>%
  XML::htmlParse(., asText = TRUE) %>%
  XML::xpathSApply(., path = xpath, XML::xmlValue)

# rvest
html %>%
  xml2::read_html() %>%
  rvest::html_nodes(xpath = xpath) %>%
  rvest::html_text()
How can I get the expected result "Line 4 : jkl\n" using rvest?

I suggest you bring in the whole text/corpus first and convert the HTML table to a tibble. Only then would I worry about pulling out line 4.
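A minimal sketch of that suggestion, assuming the `<dd>` node is selected as a whole with the XPath `//dd` and that the lines inside it are separated by newline characters. The variable names `full_text`, `lines`, and `line4` are only illustrative.

library(rvest)  # also re-exports %>% and read_html()

html <- '<head>
<dd>
"Line 1 : abc
Line 2 : def
Line 3 : ghi
Line 4 : jkl
Line 5 : mno"
</dd>
</head>'

# Step 1: pull the whole text of the <dd> node with a plain node-selecting XPath
full_text <- html %>%
  read_html() %>%
  html_nodes(xpath = "//dd") %>%
  html_text()

# Step 2: split on newlines, trim whitespace, keep the line that starts with "Line 4"
lines <- trimws(strsplit(full_text, "\n", fixed = TRUE)[[1]])
line4 <- grep("^Line 4", lines, value = TRUE)
line4

This keeps the XPath simple and does the string handling in R, at the cost of losing the trailing newline that the XPath-only version preserves.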
For reference, with the original XPath both pipelines returned the full text of the <dd> node:
'\n "Line 1 : abc\n Line 2 : def\n Line 3 : ghi\n Line 4 : jkl\n Line 5 : mno"\n '
With the updated XPath, the XML pipeline returns:
' Line 4 : jkl\n '
whereas the rvest pipeline fails with:
Error in nodes_duplicated(nodes): Expecting an external pointer: [type=character].
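The error most likely arises because `rvest::html_nodes()` expects the XPath to return a node set, while `substring-after(...)` returns a plain string. One possible workaround, sketched here rather than offered as the definitive fix, is to evaluate the same expression with `xml2::xml_find_chr()`, which is meant for XPath expressions that yield a string. The exact surrounding whitespace in the result depends on how the HTML string is indented.

library(xml2)

html <- '<head>
<dd>
"Line 1 : abc
Line 2 : def
Line 3 : ghi
Line 4 : jkl
Line 5 : mno"
</dd>
</head>'

# Same XPath as in the question; it evaluates to a string, not a node set
xpath <- 'substring-after(substring-after(substring-before(//dd, "Line 5"), "Line 3"), "\n")'

doc <- read_html(html)

# xml_find_chr() returns the string result of the XPath expression directly
line4 <- xml_find_chr(doc, xpath)
line4

Since rvest builds on xml2, the parsed document can still be passed to rvest functions for any node-based selection elsewhere in the same script.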