Extract CDATA tag values from a .kml file in R
I want to use R to extract the description values from a .kml file. The file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
xmlns:gx="http://www.google.com/kml/ext/2.2"
xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<open>1</open>
<visibility>1</visibility>
<name><![CDATA[2013-07-06 4:18pm]]></name>
...
<Placemark>
<name><![CDATA[2013-07-06 4:18pm (Start)]]></name>
<description><![CDATA[]]></description>
<TimeStamp><when>2013-07-06T20:18:56.000Z</when></TimeStamp>
<styleUrl>#start</styleUrl>
<Point>
<coordinates>-78.353348,45.020615,340.29998779296875</coordinates>
</Point>
</Placemark>
<Placemark id="tour">
<name><![CDATA[2013-07-06 4:18pm]]></name>
<description><![CDATA[]]></description>
...
<gx:Track>
<when>2013-07-06T20:18:56.000Z</when>
<gx:coord>-78.353348 45.020615 340.29998779296875</gx:coord>
<when>2013-07-06T20:19:12.000Z</when>
<gx:coord>-78.353315 45.020644 340.29998779296875</gx:coord>
<when>2013-07-06T22:12:23.000Z</when>
<gx:coord>-78.353108 45.020736 342.29998779296875</gx:coord>
<ExtendedData>
...
<Placemark>
<name><![CDATA[2013-07-06 4:18pm (End)]]></name>
<description><![CDATA[Created by Google My Tracks on Android.
Name: 2013-07-06 4:18pm
Activity type: cycling
Description: -
Total distance: 49.62 km (30.8 mi)
Total time: 1:53:28
Moving time: 1:50:17
Average speed: 26.24 km/h (16.3 mi/h)
Average moving speed: 27.00 km/h (16.8 mi/h)
Max speed: 61.20 km/h (38.0 mi/h)
Average pace: 2.29 min/km (3.7 min/mi)
Average moving pace: 2.22 min/km (3.6 min/mi)
Fastest pace: 0.98 min/km (1.6 min/mi)
Max elevation: 406 m (1333 ft)
Min elevation: 265 m (868 ft)
Elevation gain: 690 m (2263 ft)
Max grade: 12 %
Min grade: -11 %
Recorded: 2013-07-06 4:18pm
]]></description>
...
</Placemark>
</Document>
</kml>
xmlToList gives me NULL, which I think is because the CDATA tag means the parser does not process its contents:
xml <- xmlTreeParse("test1.kml", useInternalNodes=TRUE)
xmllist <- xmlToList(xml)
xmllist$Document$Placemark$description
[[1]]
NULL
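A likely contributor to the NULL (an assumption, not something confirmed in the thread): xmlToList() turns every child of Document into a list element named after its tag, so there are several elements called Placemark, and $ on a list returns only the first match. In this file the first Placemark is the (Start) one, whose description is an empty CDATA section. The sketch below uses a hand-built, hypothetical list that mimics that shape to show the effect in base R:

```r
# Hypothetical list shaped like xmlToList() output for <Document>:
# several siblings share the name "Placemark".
doc_list <- list(
  name = "2013-07-06 4:18pm",
  Placemark = list(name = "2013-07-06 4:18pm (Start)", description = NULL),
  Placemark = list(name = "2013-07-06 4:18pm (End)",
                   description = "Created by Google My Tracks on Android.")
)

# `$` stops at the first element named "Placemark" (the Start placemark)
first_name <- doc_list$Placemark$name
print(first_name)

# Keep every element named "Placemark", then pull each description
placemarks <- doc_list[names(doc_list) == "Placemark"]
descriptions <- lapply(placemarks, `[[`, "description")
print(descriptions)
```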
Jake Burkhead answered this in the comments, and his solution did the trick. I am very grateful. Here is how to extract the text from the .kml file:
> xml1 <- xmlTreeParse("2013-07-06 4-18pm.kml", useInternalNodes=TRUE)
> root <-xmlRoot(xml1)
> names(root[["Document"]])
open visibility name author Style Style Style Style
"open" "visibility" "name" "author" "Style" "Style" "Style" "Style"
Style Schema Placemark Placemark Placemark
"Style" "Schema" "Placemark" "Placemark" "Placemark"
> # note that I want the text in the third "Placemark" which is in position [13] so:
> xmlValue(root[["Document"]][[13]][["description"]])
[1] "Created by Google My Tracks on Android.\n\nName: 2013-07-06 4:18pm\nActivity type: cycling\nDescription: -\nTotal distance: 49.62 km (30.8 mi)\nTotal time: 1:53:28\nMoving time: 1:50:17\nAverage speed: 26.24 km/h (16.3 mi/h)\nAverage moving speed: 27.00 km/h (16.8 mi/h)\nMax speed: 61.20 km/h (38.0 mi/h)\nAverage pace: 2.29 min/km (3.7 min/mi)\nAverage moving pace: 2.22 min/km (3.6 min/mi)\nFastest pace: 0.98 min/km (1.6 min/mi)\nMax elevation: 406 m (1333 ft)\nMin elevation: 265 m (868 ft)\nElevation gain: 690 m (2263 ft)\nMax grade: 12 %\nMin grade: -11 %\nRecorded: 2013-07-06 4:18pm\n"
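Indexing with [[13]] works for this file, but the position depends on how many Style and Schema elements come before the placemarks. Below is a sketch of a position-independent variant with the XML package, under stated assumptions: the inline KML string is a trimmed, hypothetical stand-in for the real export, the end-of-track placemark is assumed to be the one whose name contains "(End)", and the kml prefix is an arbitrary binding for the document's default namespace. The last step splits the "Key: value" lines of the description into a named vector.

```r
library(XML)

# Trimmed, hypothetical stand-in for the real .kml file
kml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<name><![CDATA[2013-07-06 4:18pm]]></name>
<Placemark>
<name><![CDATA[2013-07-06 4:18pm (Start)]]></name>
<description><![CDATA[]]></description>
</Placemark>
<Placemark>
<name><![CDATA[2013-07-06 4:18pm (End)]]></name>
<description><![CDATA[Created by Google My Tracks on Android.

Name: 2013-07-06 4:18pm
Activity type: cycling
Total distance: 49.62 km (30.8 mi)
Total time: 1:53:28
]]></description>
</Placemark>
</Document>
</kml>'

doc <- xmlParse(kml_text, asText = TRUE)

# KML declares a default namespace, so XPath needs a prefix bound to it
ns <- c(kml = "http://www.opengis.net/kml/2.2")

# Select the placemark by its name instead of by its position
end_desc <- xpathSApply(
  doc,
  "//kml:Placemark[contains(kml:name, '(End)')]/kml:description",
  xmlValue,
  namespaces = ns
)

# Keep only "Key: value" lines and turn them into a named character vector
lines <- strsplit(end_desc, "\n", fixed = TRUE)[[1]]
lines <- lines[grepl(": ", lines, fixed = TRUE)]
stats <- setNames(sub("^[^:]+: ", "", lines), sub(": .*$", "", lines))
print(stats)
```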
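I have accepted the answer, but I thought I would post the complete solution here in case it helps others.
Many thanks for your persistence, Jake. Thanks also to Ricardo and agstudy.
A nice way to solve this is to use the xml2 package to read in the data: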
# Instead of xmlTreeParse
read_xml("test1.kml", options = "NOCDATA")
Then you can simply use xml_text() to retrieve the CDATA.
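A sketch of the same idea with xml2, assuming a reasonably recent xml2 release and a trimmed, hypothetical inline sample in place of test1.kml. Because KML declares a default namespace, a plain XPath query such as //Placemark only matches after xml_ns_strip() has removed it (the alternative is to keep the namespace and use the d1: prefix that xml_ns() assigns to it).

```r
library(xml2)

# Trimmed, hypothetical stand-in for test1.kml
kml_text <- '<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark><name><![CDATA[2013-07-06 4:18pm (Start)]]></name><description><![CDATA[]]></description></Placemark>
<Placemark><name><![CDATA[2013-07-06 4:18pm (End)]]></name><description><![CDATA[Created by Google My Tracks on Android.
Activity type: cycling]]></description></Placemark>
</Document>
</kml>'

# NOCDATA merges CDATA sections into ordinary text nodes while parsing
doc <- read_xml(kml_text, options = "NOCDATA")

# Remove the default namespace in place so unprefixed XPath names match
xml_ns_strip(doc)

# One string per Placemark description, in document order
descriptions <- xml_text(xml_find_all(doc, "//Placemark/description"))
print(descriptions)
```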
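You should provide well-formed XML.
Link to the .kml as suggested. This looks promising, but I am not sure of the syntax needed to get the CDATA tag data. xmlValue(root[["Document"]][["description"]]) gives NA, and so does xmlValue(root[["Document"]][["CDATA"]]).
@user172665 Can you show the output of root[["Document"]][["description"]]? Or post the whole well-formed XML. Right now it sounds like you are looking up the nodes incorrectly. Also, look at names(root) and names(root[["description"]]), etc. The second "description" should be lowercase.
Jake: root[["Document"]][["description"]] gives NA, while names(root[["description"]]) gives NULL.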
z1 <- xpathApply(xml, "//description", xmlValue)
z1
list()
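The empty list() is consistent with how XPath treats KML's default namespace: the document declares xmlns="http://www.opengis.net/kml/2.2", so an unprefixed step like //description only matches elements that are in no namespace. A minimal sketch, assuming the XML package and a trimmed, hypothetical inline sample instead of test1.kml, binds a prefix (kml, chosen arbitrarily) to that namespace through the namespaces argument of xpathApply():

```r
library(XML)

# Trimmed, hypothetical stand-in for test1.kml (default KML namespace kept)
kml_text <- '<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
<name><![CDATA[2013-07-06 4:18pm (End)]]></name>
<description><![CDATA[Created by Google My Tracks on Android.]]></description>
</Placemark>
</Document>
</kml>'

xml <- xmlParse(kml_text, asText = TRUE)

# Unprefixed path: <description> lives in the KML namespace, so nothing matches
z0 <- xpathApply(xml, "//description", xmlValue)
print(length(z0))

# Bind a prefix to the KML namespace and use it in the path
z1 <- xpathApply(xml, "//kml:description", xmlValue,
                 namespaces = c(kml = "http://www.opengis.net/kml/2.2"))
print(z1[[1]])
```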
doc <- xmlTreeParse("test1.kml", useInternalNodes = TRUE)
root <-xmlRoot(doc)
xmlValue(root[["Document"]][["name"]])
R> xmlValue(root[["Document"]][["name"]])
[1] "2013-07-06 4:18pm"
library(xml2)
library(rvest)  # xml_nodes() and the %>% pipe come from rvest
# Instead of xmllist$Document$Placemark$description
read_xml("test1.kml", options = "NOCDATA") %>%
xml_nodes("Placemark") %>%
xml_nodes("description") %>%
xml_text()