Scraping transfermarkt with the R package rvest
I'm learning to scrape data and am practising on transfermarkt, but today I ran into two problems. I used SelectorGadget to pick the CSS selectors. My code is:
library(rvest)
url <- "https://www.transfermarkt.es/fc-granada/startseite/verein/16795"
webpage <- read_html(url)
players_html <- html_nodes(webpage,"#yw1 .tooltipstered")
players <- html_text(players_html)
players
valores_html <- html_nodes(webpage,'.rechts.hauptlink')
valores <- html_text(valores_html)
valores
valores <- gsub(" miles €","000", valores)
valores <- gsub(" mill. €","0000", valores)
valores
valores <- gsub(",","",valores)
valores <- gsub(" ","", valores)
valores
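Console output:
> valores_html <- html_nodes(webpage,'.rechts.hauptlink')
> valores <- html_text(valores_html)
> valores
[1] "700 miles € " "300 miles € " "800 miles € " "500 miles € " "300 miles € "
[6] "300 miles € " "1,00 mill. € " "300 miles € " "1,20 mill. € " "500 miles € "
[11] "1,70 mill. € " "1,50 mill. € " "1,00 mill. € " "800 miles € " "800 miles € "
[16] "300 miles € " "2,00 mill. € " "800 miles € " "700 miles € " "400 miles € "
[21] "700 miles € " "1,00 mill. € " "800 miles € "
> valores <- gsub(" miles €","000", valores)
> valores <- gsub(" mill. €","0000", valores)
> valores
[1] "700000 " "300000 " "800000 " "500000 " "300000 " "300000 " "1,000000 "
[8] "300000 " "1,200000 " "500000 " "1,700000 " "1,500000 " "1,000000 " "800000 "
[15] "800000 " "300000 " "2,000000 " "800000 " "700000 " "400000 " "700000 "
[22] "1,000000 " "800000 "
> valores <- gsub(",","",valores)
> valores <- gsub(" ","", valores)
> valores
[1] "700000 " "300000 " "800000 " "500000 " "300000 " "300000 " "1000000 " "300000 "
[9] "1200000 " "500000 " "1700000 " "1500000 " "1000000 " "800000 " "800000 " "300000 "
[17] "2000000 " "800000 " "700000 " "400000 " "700000 " "1000000 " "800000 "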
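Basically, the last gsub, which is meant to remove the trailing space, has no effect here. Can anyone help me with these two problems?
PS: I'm using the Spanish version of transfermarkt.

As for gsub, we can use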
valores <- html_text(valores_html)
valores <- gsub(" miles €", "000", valores)
valores <- gsub(" mill. €", "0000", valores)
valores <- gsub("\\D", "", valores)
valores
# [1] "700000" "300000" "800000" "500000" "300000" "300000" "1000000" "300000" "1200000"
# [10] "500000" "1700000" "1500000" "1000000" "800000" "800000" "300000" "2000000" "800000"
# [19] "700000" "400000" "700000" "1000000" "800000"
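
The thread does not explain why the asker's gsub(" ", "", valores) left the trailing character in place. One plausible cause, which is an assumption here, is that the character is a non-breaking space (U+00A0) rather than an ordinary space. The sketch below uses made-up sample strings to show how that could be checked and why "\\D" removes it anyway. The euro sign and the non-breaking space are written as \u escapes so the snippet does not depend on the file encoding.

```r
# Hypothetical sample values. "\u20ac" is the euro sign; "\u00a0" is a
# non-breaking space (an assumption about what the page returns).
valores <- c("700 miles \u20ac\u00a0", "1,20 mill. \u20ac\u00a0")

# Inspect the code point of the last character:
# 160 means U+00A0, 32 would be an ordinary space.
last_char <- substring(valores, nchar(valores))
code_point <- utf8ToInt(last_char[1])
code_point

# A pattern with an ordinary space leaves the non-breaking space in place.
solo_espacios <- gsub(" ", "", valores)

# Same steps as in the answer: "\\D" drops every non-digit character,
# whatever kind of space it is.
limpios <- gsub(" miles \u20ac", "000", valores)
limpios <- gsub(" mill. \u20ac", "0000", limpios)
limpios <- gsub("\\D", "", limpios)

# Convert to numbers once only digits remain.
numeros <- as.numeric(limpios)
numeros
```

If the code point turns out to be 32 after all, the cause lies elsewhere, but the "\\D" replacement still applies because it does not depend on which whitespace character is present.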
For the players, the following selector returns only one set of names (the full names):
players_html <- html_nodes(webpage,"#yw1 span.hide-for-small a.spielprofil_tooltip")
players <- html_text(players_html)
players
# [1] "Rui Silva" "Aarón Escandell" "Bernardo Cruz"
# [4] "José Antonio Martínez" "Germán Sánchez" "Pablo Vázquez"
# [7] "Álex Martínez" "Adrián Castellano" "Víctor Díaz"
# [10] "Quini" "Nicolás Aguirre" "Fede San Emeterio"
# [13] "Ángel Montoro" "Fran Rico" "Alberto Martín"
# [16] "José Antonio González" "Alejandro Pozo" "Antonio Puertas"
# [19] "Fede Vico" "Daniel Ojeda" "Álvaro Vadillo"
# [22] "Adrián Ramos" "Rodri"
Using "#yw1 a.spielprofil_tooltip", for example, would also return their short versions.

Thanks. Why doesn't SelectorGadget show me the correct CSS selector? I suppose I have to look through the page source, but that isn't efficient. So how did you find the right selector?

I'm no expert on this either. I just tried SelectorGadget and also couldn't find a way to get a good CSS selector from it. For this answer, I used Safari Web Inspector to locate the right place in the source and then read the markup myself. Sorry for not having a better answer, but a good tool does seem to help!
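
The difference between the two selectors mentioned in the answer can be tried offline on a small piece of markup. The HTML below is a hypothetical simplification, not the real transfermarkt page: it assumes each player is linked twice, once inside span.hide-for-small with the full name and once inside a second span (the class show-for-small is invented for this example) with a short name. As a side note, the class in the asker's original selector, .tooltipstered, may be added by JavaScript after the page loads, in which case it would be absent from the HTML that read_html downloads; that is a guess, not something the thread confirms.

```r
library(rvest)

# Hypothetical markup: the same player linked twice, full name and short name.
html <- '<table id="yw1"><tr><td>
  <span class="hide-for-small"><a class="spielprofil_tooltip">Rui Silva</a></span>
  <span class="show-for-small"><a class="spielprofil_tooltip">R. Silva</a></span>
</td></tr></table>'
page <- read_html(html)

# Broad selector: matches both links, so every player appears twice.
todos <- html_text(html_nodes(page, "#yw1 a.spielprofil_tooltip"))
todos

# Narrow selector: only the link inside span.hide-for-small (full name).
completos <- html_text(
  html_nodes(page, "#yw1 span.hide-for-small a.spielprofil_tooltip")
)
completos
```

Comparing the lengths of the two results is a quick way to spot duplicated matches before cleaning the data.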