Using R to download and read a zipped xml file


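Following Dirk Eddelbuettel's answer, I am trying to read an xml file from a zip archive for further processing. Apart from the URL and the file name, the only change I made to the referenced code is replacing read.table with xmlInternalTreeParse.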

library(XML)
temp <- tempfile()
download.file("http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data%2Fnrg_105a.sdmx.zip", temp)
# This is the call that raises the error
doc <- xmlInternalTreeParse(unz(temp, "nrg_105a.dsd.xml"))
unlink(temp)
closeAllConnections()

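traceback() shows that the error is raised by a function call inside the parser, so temp seems to be an inappropriate reference in this context. Is there any way to make this work?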

You can try:

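library(XML)

# Make a temporary file (tf) inside the session's temporary folder (tdir)
tf <- tempfile(tmpdir = tdir <- tempdir())

# Download the zip file (mode = "wb" keeps the binary archive intact on Windows)
download.file("http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data%2Fnrg_105a.sdmx.zip", tf, mode = "wb")

# Unzip it into the temp folder; unzip() returns the extracted file paths
xml_files <- unzip(tf, exdir = tdir)

# Parse the first file
doc <- xmlInternalTreeParse(xml_files[1])

# Delete only the files created here; unlink(tdir, TRUE, TRUE) would remove
# the whole session temp directory that other R code may still rely on
unlink(c(tf, xml_files))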
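
The answer parses whichever extracted file comes first, while the question asks for one specific archive member. The following is a minimal sketch of selecting the member by name, assuming that name is known in advance; parse_zip_member is a hypothetical helper name for illustration, not a function of the XML package.

```r
library(XML)

# Hypothetical helper: extract one named member of a zip archive into a
# private temporary folder, parse it, and remove the extracted copy.
parse_zip_member <- function(zip_path, member) {
  # List the archive contents first so a wrong name fails with a clear message
  members <- unzip(zip_path, list = TRUE)$Name
  if (!member %in% members) {
    stop("'", member, "' not found; archive contains: ",
         paste(members, collapse = ", "))
  }
  # Extract into a dedicated subfolder so cleanup cannot touch other temp files
  exdir <- tempfile("unz_")
  dir.create(exdir)
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  path <- unzip(zip_path, files = member, exdir = exdir)
  # The parser takes a file name; the tree is built in memory, so the
  # extracted file can be deleted once this call returns
  xmlInternalTreeParse(path)
}
```

With the archive downloaded as in the answer, and before the cleanup step, the call would be parse_zip_member(tf, "nrg_105a.dsd.xml").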

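xmlInternalTreeParse seems to work differently from read.table. While read.table accepts a connection object, the documentation says xmlInternalTreeParse needs a file name (as a character string).

Hmm, I never really understood what a connection is. So I might need to convert the connection into a character vector with readLines or something similar.

On closer inspection, I see that the two pieces of code do basically the same thing, except that you use unzip instead of unz. Using the former also makes the original script run.

The problem is that xmlInternalTreeParse needs a file name, not a connection (which is what unz returns).

Yes, you are right, but used that way it saves the extracted xml in the current directory.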
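
The readLines idea raised in the comments can be sketched as follows. This is only an illustration under stated assumptions: parse_unz is a hypothetical helper name, and the approach relies on the asText argument of xmlInternalTreeParse, which marks its first argument as XML content instead of a file name. Nothing is written to disk, so the current directory stays clean.

```r
library(XML)

# Hypothetical helper: parse one member of a zip archive without
# extracting it to disk.
parse_unz <- function(zip_path, member) {
  con <- unz(zip_path, member)   # a connection object, not a file name
  on.exit(close(con), add = TRUE)
  # readLines() opens the connection, reads every line, and closes it again
  txt <- paste(readLines(con, warn = FALSE), collapse = "\n")
  # asText = TRUE tells the parser that txt holds XML content, not a path
  xmlInternalTreeParse(txt, asText = TRUE)
}
```

For large files, extracting with unzip and passing the resulting path to the parser is probably the simpler route, since it avoids holding the raw text and the parsed tree in memory at the same time.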
