Running a loop over multiple XML files in a folder in R

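I'm trying to run this script as a loop, because I have more than 200 files in one folder and I want to produce a single CSV file at the end listing all the data I need to extract.

I have tried various ways of running this in a loop.

Whenever I try these different options, I get an error like this:

Error: XML content does not seem to be XML or permission denied

However, when I run the code on a single file, it works fine. The errors only appear when I try to turn it into a loop over multiple files in a single folder.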
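The attempted loops are not shown, so the cause can only be guessed. One common cause of this error inside a loop is that `list.files()` returns bare file names by default. Those names are resolved against the working directory, and when `XML::xmlParse()` cannot find a file of that name it tries to parse the string itself as XML. Non-XML files sitting in the same folder are another candidate. The sketch below uses a hypothetical temporary folder (`bat_xml_demo`) and hypothetical file contents purely so that it runs on its own.

```r
# Hypothetical folder and files, created only so the example is self-contained
xml_dir <- file.path(tempdir(), "bat_xml_demo")
dir.create(xml_dir, showWarnings = FALSE)
writeLines("<Records><BatRecord><Filename>a.wav</Filename></BatRecord></Records>",
           file.path(xml_dir, "32460004.xml"))
writeLines("not an XML file", file.path(xml_dir, "notes.txt"))

# Default behaviour: bare names such as "32460004.xml", which are resolved
# against getwd(), and every file in the folder, not just the XML ones
bare_names <- list.files(xml_dir)

# Safer: complete paths, restricted to names ending in .xml (any case)
xml_files <- list.files(xml_dir, pattern = "\\.xml$",
                        full.names = TRUE, ignore.case = TRUE)

print(bare_names)
print(xml_files)
print(file.exists(xml_files))
```

Passing the paths in `xml_files` (rather than `bare_names`) to the parser avoids both the missing-file and the non-XML-file problems.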

Here is the original code for a single file:

library(XML)    # xmlParse(), getNodeSet(), xmlToDataFrame()
library(tidyr)  # separate()

doc <- xmlParse("//file/path/32460004.xml")
# One row per <BatRecord> node, one column per child element
xmldf <- xmlToDataFrame(nodes = getNodeSet(doc, "//BatRecord"))

# Split the DateTime column into separate Date and Time columns
df1 <- separate(xmldf, DateTime, into = c("Date", "Time"), sep = " ")
# Latitude and longitude are fixed-position substrings of the GPS field
df1$Lat  <- substr(df1$GPS, 4, 12)
df1$Long <- substr(df1$GPS, 13, 25)

df_final <- df1[, c("Filename", "Date", "Time", "Duration", "Temperature", "Lat", "Long")]
colnames(df_final) <- c("Filename", "Date", "Time", "Call Duration", "Temperature", "Lat", "Long")

write.csv(df_final, "//file/path/test_file.csv")
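If you prefer to stay with the XML package, a minimal sketch of the loop is to wrap the parsing steps above in a function and apply it to every path. The helper name `read_bat_file()`, the extra `SourceFile` column and the two demo files are hypothetical and exist only for illustration. The element names `Filename` and `Duration` are taken from the question's code.

```r
library(XML)  # xmlParse(), getNodeSet(), xmlToDataFrame()

# Hypothetical helper: the single-file parsing steps, wrapped in a function
read_bat_file <- function(path) {
  doc <- xmlParse(path)
  df <- xmlToDataFrame(nodes = getNodeSet(doc, "//BatRecord"),
                       stringsAsFactors = FALSE)
  df$SourceFile <- basename(path)  # remember which file each row came from
  df
}

# Self-contained demo data: two small files in a temporary folder
demo_dir <- file.path(tempdir(), "bat_xml_loop")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("<Records><BatRecord><Filename>a.wav</Filename><Duration>1.2</Duration></BatRecord></Records>",
           file.path(demo_dir, "file1.xml"))
writeLines("<Records><BatRecord><Filename>b.wav</Filename><Duration>0.8</Duration></BatRecord></Records>",
           file.path(demo_dir, "file2.xml"))

# Full paths, XML files only
files <- list.files(demo_dir, pattern = "\\.xml$", full.names = TRUE)

# The loop: one data frame per file, then stack them row-wise
all_records <- do.call(rbind, lapply(files, read_bat_file))
print(all_records)
```

The combined `all_records` can then go through the `separate()` and `substr()` steps once, followed by a single `write.csv()` call, instead of writing one CSV per file.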

This should work using tidyverse and xml2:

library(tidyverse)  # map_df(), separate(), mutate(), bind_rows(), write_csv()
library(xml2)       # read_xml(), xml_find_first(), xml_children(), xml_name(), xml_text()

### Put all your xml files in a vector (full paths, .xml files only)
my_files <- list.files("path/to/your/xml/files", pattern = "\\.xml$", full.names = TRUE)

### Read function to transform them to a tibble (similar to a data.frame)
read_my_xml <- function(x, path = "//BatRecord") {
  tmp <- read_xml(x) # read the xml file
  tmp <- tmp %>% 
    xml_find_first(path) %>% # select the first //BatRecord node
    xml_children() # select all children of that node

  # this extracts the text of all children,
  # i.e. the TEXT between the <NAME>TEXT</NAME> tags
  out <- tmp %>% xml_text()
  # Takes the names of the tags <NAME> ... </NAME>
  names(out) <- tmp %>% xml_name()
  # Turns the named vector into a one-row tibble - see https://stackoverflow.com/q/40036207/3301344
  bind_rows(out)
}

### Read the files as data

dat <- map_df(my_files, read_my_xml) # map_df is similar to a loop + binding the results into one tibble

### Do the transformation

dat %>% 
  separate(DateTime, into = c("Date", "Time"), sep = " ") %>% 
  mutate(Lat = substr(GPS, 4, 12), Long = substr(GPS, 13, 25)) %>% 
  write_csv("wherever/you/want/file.csv")
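Because `read_my_xml()` uses `xml_find_first()`, it reads only the first `<BatRecord>` in each file. The question's single-file code collects every `//BatRecord` node, so if a file can hold several records, a variant built on `xml_find_all()` may be needed. The sketch below is one way to do that. `read_all_records()` is a hypothetical name and the XML string is made-up demo data; `read_xml()` accepts literal XML as well as a path, which keeps the example self-contained.

```r
library(xml2)    # read_xml(), xml_find_all(), xml_find_first(), xml_children(), xml_name(), xml_text()
library(tibble)  # as_tibble_row()
library(dplyr)   # bind_rows()

# Hypothetical variant of read_my_xml(): one row per <BatRecord> in the file
read_all_records <- function(x, path = "//BatRecord") {
  records <- xml_find_all(read_xml(x), path)
  rows <- lapply(records, function(rec) {
    kids <- xml_children(rec)       # child elements of this record
    vals <- xml_text(kids)          # their text content
    names(vals) <- xml_name(kids)   # named after the tags
    as_tibble_row(vals)             # one-row tibble per record
  })
  bind_rows(rows)
}

# Made-up file content with two records
demo <- "<Records>
  <BatRecord><Filename>a.wav</Filename><Duration>1.2</Duration></BatRecord>
  <BatRecord><Filename>b.wav</Filename><Duration>0.8</Duration></BatRecord>
</Records>"

# xml_find_first() stops at the first match, so only a.wav is seen here
first_name <- xml_text(xml_find_first(read_xml(demo), "//BatRecord/Filename"))

all_rows <- read_all_records(demo)
print(first_name)
print(all_rows)
```

Swapping `read_all_records` for `read_my_xml` in the `map_df()` call would then give one row per record across all files rather than one row per file.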

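This is great. It works. Thanks a lot! Could you explain what the code in the function that converts the files to tibbles is actually doing?

Added a step-by-step explanation, see the comments in the code above.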
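The `map_df()` call in the answer is the loop the question asked for: it calls the read function once per file and row-binds the results. The sketch below checks that equivalence with a hypothetical stand-in, `read_one()`, that returns a one-row tibble for each input string, so it runs without any XML files.

```r
library(purrr)   # map_df()
library(dplyr)   # bind_rows()
library(tibble)  # tibble()

# Hypothetical stand-in for read_my_xml(): one row per input
read_one <- function(x) tibble(input = x, n_chars = nchar(x))
inputs <- c("a.xml", "bb.xml")

# Version 1: map_df() applies the function and binds the rows
via_map <- map_df(inputs, read_one)

# Version 2: the same thing written as an explicit for loop
via_loop <- vector("list", length(inputs))
for (i in seq_along(inputs)) {
  via_loop[[i]] <- read_one(inputs[i])
}
via_loop <- bind_rows(via_loop)

print(via_map)
print(all.equal(via_map, via_loop))
```

Note that in purrr 1.0 and later `map_df()` is marked as superseded but still works; `list_rbind(map(inputs, read_one))` is the suggested replacement.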