R: Error handling when using pdftools in a loop

r error-handling

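I am trying to extract a particular table from several PDF files, but not all of the files contain that table. How can I use tryCatch or something similar to skip a file that lacks the table, even when it is the first file, and continue with the next one?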

library(pdftools)
library(tidyverse)

url <- c("https://www.computershare.com/News/Annual%20Report%202019.pdf?2",
         "https://www.annualreports.com/HostedData/AnnualReportArchive/a/LSE_ASOS_2018.PDF")

raw_text <- map(url, pdf_text)

clean_table1 <- function(raw) {
  
  raw <- map(raw, ~ str_split(.x, "\\n") %>% unlist())
  raw <- reduce(raw, c)
  
  table_start <- stringr::str_which(tolower(raw), "twenty largest shareholders")
  table_end <- stringr::str_which(tolower(raw), "total")
  table_end <- table_end[min(which(table_end > table_start))]
  
  table <- raw[(table_start + 3 ):(table_start + 25)]
  table <- str_replace_all(table, "\\s{2,}", "|")
  text_con <- textConnection(table)
  data_table <- read.csv(text_con, sep = "|")
  #colnames(data_table) <- c("Name", "Number of Shares", "Percentage")
  data_table
}

shares <- map_df(raw_text, clean_table1)

You can check the length of table_start and return NULL if it is 0. When you then use map_df, those NULL results are dropped automatically and you get a single combined data frame.

library(tidyverse)

clean_table1 <- function(raw) {
  
  raw <- map(raw, ~ str_split(.x, "\\n") %>% unlist())
  raw <- reduce(raw, c)
  
  table_start <- stringr::str_which(tolower(raw), "twenty largest shareholders")
  if(!length(table_start)) return(NULL)
  table_end <- stringr::str_which(tolower(raw), "total")
  table_end <- table_end[min(which(table_end > table_start))]
  
  table <- raw[(table_start + 3 ):(table_start + 25)]
  table <- str_replace_all(table, "\\s{2,}", "|")
  text_con <- textConnection(table)
  data_table <- read.csv(text_con, sep = "|")
  #colnames(data_table) <- c("Name", "Number of Shares", "Percentage")
  data_table
}

shares <- map_df(raw_text, clean_table1)
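
Since the question asks about tryCatch specifically, here is a minimal sketch of that alternative in base R. It is an illustration under stated assumptions, not the method from the answer above: parse_table and safely_parse are hypothetical names, and the in-memory character vectors stand in for the output of pdf_text so the example runs without downloading anything.

```r
# Hypothetical stand-in for clean_table1(): it raises an error when the
# marker line is missing, which is what we want to survive in the loop.
parse_table <- function(lines) {
  start <- grep("twenty largest shareholders", tolower(lines), fixed = TRUE)
  if (!length(start)) stop("table not found")
  rows <- lines[(start[1] + 1):length(lines)]
  read.csv(text = rows, sep = "|", header = FALSE,
           col.names = c("name", "shares", "percentage"),
           stringsAsFactors = FALSE)
}

# Hypothetical wrapper: run any parser and turn an error into NULL,
# so one bad input does not abort the whole iteration.
safely_parse <- function(lines, parser) {
  tryCatch(
    parser(lines),
    error = function(e) {
      message("Skipping input: ", conditionMessage(e))
      NULL
    }
  )
}

# Toy inputs (assumed data): the first has no table, the second does.
docs <- list(
  c("Intro", "no table here"),
  c("Twenty Largest Shareholders", "Alice|100|10.0", "Bob|50|5.0")
)

results <- lapply(docs, safely_parse, parser = parse_table)

# rbind() ignores NULL elements, so the skipped input simply disappears.
shares <- do.call(rbind, results)
```

The explicit length check in the answer is cheaper and more precise when you know exactly which condition to guard against. A tryCatch wrapper is broader: it also skips files that fail for reasons you did not anticipate, at the cost of possibly hiding real bugs, which is why the sketch logs a message for every skipped input.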

Where does the error occur?

@r2evans During the table extraction: Error in (table_start + 3):(table_start + 25) : argument of length 0. In addition: Warning message: In min(which(table_end > table_start)) : no non-missing arguments to min; returning Inf

Add if (!length(table_start) && !length(table_end)) return() immediately before the table <- line.

This seems to work, but when I try it on other files I get this error: Error in read.table(file = file, header = header, sep = sep, quote = quote, : more columns than column names. I assume some files have tables with more columns. How can I handle this error?

It is hard to tell without looking at the data, but I have two guesses we could consider. Read only 3 columns: data_table … 3) return(NULL).
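
The "more columns than column names" error typically means that some rows contain more fields than the header line. One way to guard against it, sketched below, is to count the fields per row before parsing and return NULL when the shape is not what you expect. This is an assumption about what the last comment was suggesting, not a reconstruction of it: read_if_n_columns is a hypothetical helper, the sample rows are made up, and the expected width of 3 comes from the commented-out colnames line in the question.

```r
# Hypothetical helper: parse pipe-delimited lines only when every row has
# exactly `n_cols` fields; otherwise return NULL so the file is skipped.
read_if_n_columns <- function(lines, n_cols = 3) {
  con <- textConnection(lines)
  on.exit(close(con))

  # count.fields() returns the number of fields on each non-blank line
  # (NA when a line has unbalanced quotes).
  fields <- count.fields(con, sep = "|")
  if (!length(fields) || anyNA(fields) || any(fields != n_cols)) {
    return(NULL)
  }

  read.csv(text = lines, sep = "|", header = FALSE,
           stringsAsFactors = FALSE)
}

# Assumed sample rows: `good` is uniformly 3 columns wide, `bad` is not.
good <- c("Alice|100|10.0", "Bob|50|5.0")
bad  <- c("Alice|100|10.0|extra", "Bob|50|5.0")

parsed_good <- read_if_n_columns(good)  # a 2 x 3 data frame
parsed_bad  <- read_if_n_columns(bad)   # NULL, so map_df would drop it
```

Returning NULL keeps the behaviour consistent with the table_start check: files whose table does not have the expected shape are silently left out of the combined data frame. If you would rather keep the first three columns of wider tables, subset the parsed data frame instead of returning NULL.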