How can I search a column of links for string matches in R? (tags: r, paste, grepl)

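I have a data table with a list of .txt links in a single column. I am looking for a way to have R search within each link to see whether the file contains the string "discount rate" or "discounted cash flow". I would then like R to create two columns next to each link (one for discount rate and one for discounted cash flow), containing 1 if the string is present and 0 if it is not.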

Here is a small sample of the links I want to screen:

http://www.sec.gov/Archives/edgar/data/1015328/0000913849-04-000510.txt
http://www.sec.gov/Archives/edgar/data/1460306/0001460306-09-000001.txt
http://www.sec.gov/Archives/edgar/data/1063761/0001047469-04-028294.txt
http://www.sec.gov/Archives/edgar/data/1230588/0001178913-09-000260.txt
http://www.sec.gov/Archives/edgar/data/1288246/0001193125-04-155851.txt
http://www.sec.gov/Archives/edgar/data/1436866/0001172661-09-000349.txt
http://www.sec.gov/Archives/edgar/data/1089044/0001047469-04-026535.txt
http://www.sec.gov/Archives/edgar/data/1274057/0001047469-04-023386.txt
http://www.sec.gov/Archives/edgar/data/1300379/0001047469-04-026642.txt
http://www.sec.gov/Archives/edgar/data/1402440/0001225208-09-007496.txt
http://www.sec.gov/Archives/edgar/data/35527/0001193125-04-161618.txt

Maybe something like this:

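# Return 1 if `text` occurs anywhere in `file` (case-insensitive), else 0
checktext <- function(file, text) {
  # readLines() accepts a local path or a URL
  filecontents <- readLines(file)
  return(as.numeric(any(grepl(text, filecontents, ignore.case = TRUE))))
}

# One 0/1 indicator column per search string
df$DR <- sapply(df$file_name, checktext, "discount rate")
df$DCF <- sapply(df$file_name, checktext, "discounted cash flow")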

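Connecting to and reading the files will be very slow, but grep will be fast. It would be more efficient to read each file once and run grep on it twice. Make text a vector in the checktext function and use something like sapply(text, function(x) as.numeric(any(grepl(x, filecontents, ignore.case = TRUE)))).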
@Gregor Yes, this would be much faster. Thanks very much. I've added it to the main answer.
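
The suggestion in the comments (read each file once, then test every pattern against the same lines) could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name check_patterns and the named vector patterns are hypothetical, df$file_name is the column name used in the answer, and the demo uses two local temporary files in place of the SEC URLs so it runs without a network connection.

```r
# Hypothetical helper: read a file (or URL) once, then check every pattern
# against the same lines. Returns a named 0/1 vector, one entry per pattern.
check_patterns <- function(file, patterns) {
  filecontents <- readLines(file, warn = FALSE)
  vapply(
    patterns,
    function(p) as.numeric(any(grepl(p, filecontents, ignore.case = TRUE))),
    numeric(1)
  )
}

# Names become the new column names (assumed to match the answer's DR / DCF)
patterns <- c(DR = "discount rate", DCF = "discounted cash flow")

# Demo data: two local temp files standing in for the real links
tmp1 <- tempfile(fileext = ".txt")
tmp2 <- tempfile(fileext = ".txt")
writeLines(c("We applied a Discount Rate of 10%.", "Nothing else."), tmp1)
writeLines("Valuation used a discounted cash flow model.", tmp2)
df <- data.frame(file_name = c(tmp1, tmp2), stringsAsFactors = FALSE)

# sapply() returns a patterns-by-files matrix; transpose to files-by-patterns
results <- t(sapply(as.character(df$file_name), check_patterns,
                    patterns = patterns, USE.NAMES = FALSE))
df$DR <- results[, "DR"]
df$DCF <- results[, "DCF"]
```

With real links, df$file_name would hold the URLs instead of temp paths. Each file is then fetched once regardless of how many patterns are checked, which is where most of the time goes.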