
rvest: scraping a table from a web page

I am trying to retrieve the following table:

found on …

I managed to retrieve the quotes (odds) with the following code:

    library('rvest')

    url.2 <- "https://www.wettportal.com/Fussball/Champions_League/Champions_League/Paris_Saint-Germain_-_Real_Madrid_2448367.html"
    webpage.2 <- read_html(url.2)

    # All odds cells as one flat character vector: home, draw, away, home, draw, away, ...
    oddscell.html <- html_nodes(webpage.2, ".oddscell")
    oddscell.data <- html_text(oddscell.html)

    # Take every third value to split the flat vector into the three outcomes
    home <- oddscell.data[seq(1, length(oddscell.data), 3)]
    draw <- oddscell.data[seq(2, length(oddscell.data), 3)]
    away <- oddscell.data[seq(3, length(oddscell.data), 3)]

    my.quotes <- cbind(home, draw, away)

I did something similar to retrieve the names of the bookmakers, using html_nodes(webpage.2, ".bookie").
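The every-third-element slicing above can also be written as a single matrix() reshape that fills three columns row by row. A minimal sketch, assuming every bookmaker row has exactly three odds cells; the inline HTML below is a hypothetical stand-in for the wettportal page, reusing only the .oddscell and .bookie class names from the question:

```r
library(rvest)

# Hypothetical stand-in for the bookmaker page: two bookies, three odds each
# (home, draw, away). Only the class names come from the question.
html <- '
<table>
  <tr><td class="bookie">Bookie A</td>
      <td class="oddscell">1.50</td><td class="oddscell">4.20</td><td class="oddscell">6.00</td></tr>
  <tr><td class="bookie">Bookie B</td>
      <td class="oddscell">1.55</td><td class="oddscell">4.10</td><td class="oddscell">5.80</td></tr>
</table>'

page <- read_html(html)  # a string containing "<" is parsed as literal HTML

# One flat character vector: home, draw, away, home, draw, away, ...
odds <- html_text(html_nodes(page, ".oddscell"))
bookies <- html_text(html_nodes(page, ".bookie"))

# Fill a 3-column matrix row by row instead of three seq(..., 3) slices;
# this only lines up if each bookie really has exactly three cells
my.quotes <- matrix(odds, ncol = 3, byrow = TRUE,
                    dimnames = list(bookies, c("home", "draw", "away")))
my.quotes
```

The values are still character strings; wrap them in as.numeric() when you need to calculate with them.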


My question is: is there a way to scrape the whole table in one go?

That site is blocked for me! I can't see anything, but I can show you roughly how it should be done.

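The html_nodes function selects every HTML element that matches a CSS selector and returns them as a node set, one entry per matching tag; html_table is the function that turns a table into an R data frame.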
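To make the distinction concrete, here is a small sketch on a hypothetical inline list: html_nodes() gives back an xml_nodeset with one entry per match, and it only becomes a data frame (one row per node) once you extract the text and build one yourself. In rvest 1.0 and later, html_elements() is the newer name for html_nodes(); the older name used in this thread still works.

```r
library(rvest)

# Small inline page used instead of a live site (hypothetical markup)
page <- read_html('<ul><li>alpha</li><li>beta</li><li>gamma</li></ul>')

items <- html_nodes(page, "li")  # an xml_nodeset, one entry per matching <li>
class(items)
length(items)

# Extracting the text gives a plain character vector, one value per node
texts <- html_text(items)

# Only now, by building it explicitly, does each tag become a data frame row
df <- data.frame(item = texts)
df
```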

    library(rvest)

## Loading required package: xml2

    # Define the URL once.
    URL <- "https://scistarter.com/finder?phrase=&lat=&lng=&activity=At%20the%20beach&topic=&search_filters=&search_audience=&page=1#view-projects"

    scistarter_html <- read_html(URL)
    scistarter_html

## {xml_document}
## <html class="no-js" lang="en">
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset= ...
## [2] <body>\n    \n    \n    <svg style="position: absolute; width: 0; he ...
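
We were able to retrieve the same HTML code that we see in the browser. That isn't useful yet, but it does show that the download works. Now we'll start filtering the HTML to find the data we are looking for.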
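Finding the right selector usually takes several attempts. One option (a suggestion, not part of the original answer) is to parse the page once and keep a local copy with xml2::write_html(), so later experiments don't re-download the live site. A minimal sketch, with a hypothetical literal HTML string standing in for read_html(URL):

```r
library(rvest)

# Stand-in for read_html(URL): a literal HTML string (hypothetical content)
page <- read_html('<html><body><h1>Project Finder</h1></body></html>')

# Save a local copy so later experiments don't hit the live page again
cache_file <- tempfile(fileext = ".html")
xml2::write_html(page, cache_file)

# Re-parse from disk; a string without "<" is treated as a file path
cached <- read_html(cache_file)
heading <- html_text(html_node(cached, "h1"))
heading
```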

The data we need is stored in a table, which we can tell by looking at the browser's "Inspect Element" window.

This gets all the link nodes (the <a> elements):

    scistarter_html %>%
      html_nodes("a") %>%
      head()

## {xml_nodeset (6)}
## [1] <a href="/index.html" class="site-header__branding" title="go to the ...
## [2] <a href="/dashboard">My Account</a>
## [3] <a href="/finder" class="is-active">Project Finder</a>
## [4] <a href="/events">Event Finder</a>
## [5] <a href="/people-finder">People Finder</a>
## [6] <a href="#dialog-login" rel="modal:open">log in</a>

In a more complex example we could use these links to "crawl" the site, but that's a topic for another day.
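The first step of such a crawl is usually pulling the href attribute out of each link and resolving relative paths against the site root. A minimal sketch on hypothetical markup modelled on the navigation links printed above; the base URL "https://scistarter.com/" is an assumption taken from the example URL:

```r
library(rvest)

# Hypothetical fragment modelled on the navigation links shown above
page <- read_html('
  <div class="nav">
    <a href="/finder">Project Finder</a>
    <a href="/events">Event Finder</a>
    <a href="#dialog-login">log in</a>
  </div>')

links <- html_nodes(page, "a")
hrefs <- html_attr(links, "href")  # paths exactly as written in the page

# Drop in-page anchors such as "#dialog-login"; they are not separate pages
hrefs <- hrefs[!startsWith(hrefs, "#")]

# Resolve the relative paths against the (assumed) site root
abs_links <- xml2::url_absolute(hrefs, "https://scistarter.com/")
abs_links
```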

Every div on the page:

    scistarter_html %>%
      html_nodes("div") %>%
      head()

## {xml_nodeset (6)}
## [1] <div class="site-header__nav js-hamburger b-utility">\n        <butt ...
## [2] <div class="site-header__nav__body js-hamburger__body">\n          < ...
## [3] <div class="nav-tools">\n            <div class="nav-tools__search"> ...
## [4] <div class="nav-tools__search">\n              <div class="field">\n ...
## [5] <div class="field">\n                <form method="get" action="/fin ...
## [6] <div class="input-group input-group--flush">\n                    <d ...
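Note in the output above that nested divs are listed as separate entries: the "div" selector matches every <div> at any depth, in document order. A small sketch on hypothetical markup shaped like the nav-tools divs:

```r
library(rvest)

# Hypothetical nested markup, similar to the nav-tools divs in the output above
page <- read_html('
  <div class="nav-tools">
    <div class="nav-tools__search">
      <div class="field">search box</div>
    </div>
  </div>')

# "div" matches every <div> at any depth, so each nested div is its own entry
all_divs <- html_nodes(page, "div")
length(all_divs)
html_attr(all_divs, "class")
```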

…the nav-tools div. It is selected with a CSS selector that matches class="nav-tools":

    scistarter_html %>%
      html_nodes("div.nav-tools") %>%
      head()

## {xml_nodeset (1)}
## [1] <div class="nav-tools">\n            <div class="nav-tools__search"> ...
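A class selector matches whole class names only, which is why "div.nav-tools" returns one node above and not the nested "nav-tools__search" div. An element that carries several classes, like the tables with class="table-project-2-col u-mb-0", matches any one of them or several chained together. A sketch on hypothetical markup:

```r
library(rvest)

# Hypothetical markup reusing class names from the outputs above
page <- read_html('
  <div class="nav-tools">
    <div class="nav-tools__search">search</div>
  </div>
  <table class="table-project-2-col u-mb-0"><tr><td>x</td></tr></table>')

# Whole-word match: "nav-tools__search" does not count as "nav-tools"
n_exact <- length(html_nodes(page, "div.nav-tools"))

# A multi-class element matches one of its classes, or several chained
n_one <- length(html_nodes(page, "table.u-mb-0"))
n_both <- length(html_nodes(page, "table.table-project-2-col.u-mb-0"))

c(n_exact, n_one, n_both)
```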

We can select a node by its id as follows:

    scistarter_html %>%
      html_nodes("div#project-listing") %>%
      head()

## {xml_nodeset (1)}
## [1] <div id="project-listing" class="subtabContent">\n          \n       ...
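Since an id should be unique on a page, it makes a good anchor for narrowing the search: select the container first, then look for elements only inside it. A sketch on hypothetical markup; only the id "project-listing" comes from the output above, the headings are made up:

```r
library(rvest)

# Hypothetical markup: two headings inside the listing, one outside it
page <- read_html('
  <div id="project-listing" class="subtabContent">
    <h3>Beach clean-up</h3>
    <h3>Tide-pool survey</h3>
  </div>
  <h3>Unrelated heading</h3>')

# Pick the container by id, then search only within it
listing <- html_node(page, "div#project-listing")
inside <- html_text(html_nodes(listing, "h3"))

# The same result with a single descendant selector
direct <- html_text(html_nodes(page, "#project-listing h3"))
inside
```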

And all the tables:

    scistarter_html %>%
      html_nodes("table") %>%
      head()

## {xml_nodeset (6)}
## [1] <table class="table-project-2-col u-mb-0">\n<legend class="u-visuall ...
## [2] <table class="table-project-2-col u-mb-0">\n<legend class="u-visuall ...
## [3] <table class="table-project-2-col u-mb-0">\n<legend class="u-visuall ...
## [4] <table class="table-project-2-col u-mb-0">\n<legend class="u-visuall ...
## [5] <table class="table-project-2-col u-mb-0">\n<legend class="u-visuall ...
## [6] <table class="table-project-2-col u-mb-0">\n<legend class="u-visuall ...
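Passing a whole node set of tables to html_table() returns a list with one data frame per <table>; when the tables share the same columns they can be stacked. A minimal sketch on two hypothetical tables that reuse the "table-project-2-col" class from the output above:

```r
library(rvest)

# Two hypothetical tables sharing one class, like the repeated tables above
page <- read_html('
  <table class="table-project-2-col"><tr><th>Field</th><th>Value</th></tr>
    <tr><td>Topic</td><td>Ocean</td></tr></table>
  <table class="table-project-2-col"><tr><th>Field</th><th>Value</th></tr>
    <tr><td>Topic</td><td>Birds</td></tr></table>')

# A list: one data frame per <table>; the <th> row becomes the header
tables <- html_table(html_nodes(page, "table.table-project-2-col"))
length(tables)

# Stack them into a single data frame, assuming the columns match
all_rows <- do.call(rbind, tables)
all_rows
```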

For more information, see the related links below.


Something like html_table(html_nodes(webpage.2, ".table-type-odds-1"))?

@MartinSchmelzer Exactly what I wanted, thank you very much!
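Applied to the question, html_table() replaces the whole .oddscell slicing: select the odds table by its class and let html_table() build the data frame. A sketch on a hypothetical odds table; the class "odds-table" and the column layout are assumptions, so substitute the selector that matches the real page. Using html_node() (singular) returns just the first match, so the result is a single data frame rather than a list:

```r
library(rvest)

# Hypothetical odds table: a header row plus one row per bookmaker
page <- read_html('
  <table class="odds-table">
    <tr><th>Bookie</th><th>1</th><th>X</th><th>2</th></tr>
    <tr><td>Bookie A</td><td>1.50</td><td>4.20</td><td>6.00</td></tr>
    <tr><td>Bookie B</td><td>1.55</td><td>4.10</td><td>5.80</td></tr>
  </table>')

# html_node() takes the first matching table, so html_table() returns one data frame
odds <- html_table(html_node(page, "table.odds-table"))

# Rename to the home/draw/away columns used in the question
names(odds) <- c("bookie", "home", "draw", "away")
odds
```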