For loop with readHTMLTable from the XML package

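I am trying to use a for loop to extract data from multiple URLs. The problem is that the data I need sits in a different table depending on the page. This follows on from my original question. The initial data I have: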

         Code Issuer         ISIN           Type                                          URL
1 NTK007_1915   NBRK KZW1KD079153 discount notes http://www.kase.kz/en/gsecs/show/NTK007_1915
2 NTK007_1917   NBRK KZW1KD079179 discount notes http://www.kase.kz/en/gsecs/show/NTK007_1917
3 NTK007_1918   NBRK KZW1KD079187 discount notes http://www.kase.kz/en/gsecs/show/NTK007_1918
4 NTK028_1896   NBRK KZW1KD288960 discount notes http://www.kase.kz/en/gsecs/show/NTK028_1896
5 NTK028_1903   NBRK KZW1KD289034 discount notes http://www.kase.kz/en/gsecs/show/NTK028_1903
6 NTK028_1909   NBRK KZW1KD289091 discount notes http://www.kase.kz/en/gsecs/show/NTK028_1909

I have been trying the following code:

wanted <- c("Nominal value in issue's currency" = "Nominal Value",
            "Number of bonds outstanding" = "# of Bonds Issue")

# function returns a data frame of wanted columns for given URL
getValues1 <- function (name, url) {
  # get the table and rename columns
  sp = readHTMLTable(url, stringsAsFactors = FALSE)
  df <- sp[[4]]
  names(df) <- c("full_name", "value")

  # filter and remap wanted columns
  result <- df[df$full_name %in% names(wanted),]
  result$column_name <- sapply(result$full_name, function(x) {wanted[[x]]})

  # add the identifier to every row
  result$name <- name
  return (result[,c("name", "column_name", "value")])
}

getValues2 <- function (name, url) {
  # get the table and rename columns
  sp = readHTMLTable(url, stringsAsFactors = FALSE)
  df <- sp[[7]]
  names(df) <- c("full_name", "value")

  # filter and remap wanted columns
  result <- df[df$full_name %in% names(wanted),]
  result$column_name <- sapply(result$full_name, function(x) {wanted[[x]]})

  # add the identifier to every row
  result$name <- name
  return (result[,c("name", "column_name", "value")])
}

# invoke function for each name/URL pair - returns list of data frames
for (i in 1:length(newd$URL)) {
    sp = readHTMLTable(newd$URL[[i]])
    if (dim(sp[[4]])[[2]] == 2) {
        columns = getValues1(x[["name"]], x[["URL"]])
    } else {
        columns = getValues2(x[["name"]], x[["URL"]])
    }
print (columns)
}

Please help.

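It is better to deliberately select the table you need with a CSS or XPath selector. Otherwise you are stuck with control-flow gymnastics and still stand a good chance of accidentally pulling in something you do not want. Also, changing the for loop to lapply will more naturally give you a tidy list.

I would appreciate it if you could take a look at these two pages: I want to get the data under the "Characteristics" tab. What would the CSS and/or XPath be for these two pages? Still struggling with selectorgadget...

Ah, I think I finally found it:
html_nodes("main > div.right > div.float-wrapper-right > div > table.content table.top") %>% html_table()

You can use
h %>% html_nodes('table.top') %>% html_table()
where h is the parsed page. It does not handle the header rows in the middle of the table, though, so you will have to split it afterwards.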
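
The "split it afterwards" step could look roughly like the sketch below. It works on a hand-made data frame rather than a live page, so the section titles ("Issue parameters", "Trade parameters"), the "Lot size" row, and the rule for spotting a header row are all assumptions for illustration, not something taken from the actual site.

```r
# Hypothetical stand-in for what html_table() might return for one page:
# two columns, with section-title rows mixed in among the data rows.
raw <- data.frame(
  full_name = c("Issue parameters",
                "Nominal value in issue's currency",
                "Number of bonds outstanding",
                "Trade parameters",
                "Lot size"),
  value = c("", "100", "5000", "", "1"),
  stringsAsFactors = FALSE
)

# Assumption: a section-title row has an empty or missing value cell,
# or simply repeats its label in the value column.
is_header <- is.na(raw$value) | raw$value == "" | raw$value == raw$full_name

# cumsum() tags every row with the index of the most recent header above it;
# the factor turns that index into the header's text.
section <- factor(cumsum(is_header),
                  levels = seq_len(sum(is_header)),
                  labels = raw$full_name[is_header])

# Drop the header rows themselves and split the rest by section.
sections <- split(raw[!is_header, ], section[!is_header])

print(sections[["Issue parameters"]])
```

Each element of sections is then an ordinary two-column data frame, so the filtering on names(wanted) from the question can be applied to whichever section holds the values of interest.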
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
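
The error above means readHTMLTable received NULL instead of a URL or document. In the posted loop, x is never set by the for statement, so x[["URL"]] is the likely source of that NULL. Below is a minimal sketch of the lapply approach suggested above, which also picks the table by its content rather than by position 4 or 7. It runs offline: fake_fetch, the two-row newd, and the use of the Code column as the identifier are hypothetical stand-ins.

```r
wanted <- c("Nominal value in issue's currency" = "Nominal Value",
            "Number of bonds outstanding" = "# of Bonds Issue")

# Return the first two-column table containing any wanted label,
# skipping NULL entries and tables of a different shape.
pick_table <- function(tables) {
  for (tb in tables) {
    if (is.data.frame(tb) && ncol(tb) == 2 &&
        any(tb[[1]] %in% names(wanted))) {
      return(tb)
    }
  }
  NULL
}

# fetch_tables is any function mapping a URL to a list of data frames.
get_values <- function(name, url, fetch_tables) {
  # Fail early with a clear message if the URL is NULL or NA.
  stopifnot(is.character(url), length(url) == 1, !is.na(url))
  df <- pick_table(fetch_tables(url))
  if (is.null(df)) return(NULL)
  names(df) <- c("full_name", "value")
  result <- df[df$full_name %in% names(wanted), ]
  result$column_name <- unname(wanted[as.character(result$full_name)])
  result$name <- name
  result[, c("name", "column_name", "value")]
}

# Hypothetical data and fetcher so the sketch runs without network access.
newd <- data.frame(Code = c("A_1", "B_2"), URL = c("page-a", "page-b"),
                   stringsAsFactors = FALSE)
fake_fetch <- function(url) {
  target <- data.frame(V1 = c(names(wanted), "Other"),
                       V2 = c("100", "5000", "x"),
                       stringsAsFactors = FALSE)
  filler <- data.frame(V1 = "menu", stringsAsFactors = FALSE)
  # The wanted table sits at a different position on each fake page.
  if (url == "page-a") list(filler, NULL, target)
  else list(filler, filler, NULL, target)
}

# Index the rows of newd directly instead of relying on an outside x.
results <- lapply(seq_len(nrow(newd)), function(i) {
  get_values(newd$Code[i], newd$URL[i], fake_fetch)
})
combined <- do.call(rbind, results)
print(combined)
```

Against the real pages, fake_fetch would be replaced by something like function(u) readHTMLTable(u, stringsAsFactors = FALSE), the same call the question already uses.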