通过谷歌playstore在R中抓取网页_R_Web Scraping_Rvest_Data Extraction

通过谷歌playstore在R中抓取网页

r web-scraping

通过谷歌playstore在R中抓取网页,r,web-scraping,rvest,data-extraction,R,Web Scraping,Rvest,Data Extraction,我想从google play store中抓取我想要的几个应用程序评论的数据名称字段他们有多少明星他们写的评论后来我试图发现更多我发现的名称\u数据\u html有 > Name_data_html {xml_nodeset (0)} 我是一个新的网页刮可以帮助我与此您应该使用XPath在网页上选择对象： #Loading the rvest package library('rvest') #Specifying the url for desired website

我想从google play store中抓取我想要的几个应用程序评论的数据

名称字段

他们有多少明星

他们写的评论

后来我试图发现更多我发现的名称\u数据\u html有

> Name_data_html
{xml_nodeset (0)}

我是一个新的网页刮可以帮助我与此

您应该使用XPath在网页上选择对象：

#Loading the rvest package
library('rvest')
#Specifying the url for desired website to be scrapped
url <- 'https://play.google.com/store/apps/details?id=com.phonegap.rxpal&hl=en_IN'
#Reading the HTML code from the website
webpage <- read_html(url)
# Using Xpath
Name_data_html <- webpage %>% html_nodes(xpath='/html/body/div[1]/div[4]/c-wiz/div/div[2]/div/div[1]/div/c-wiz[1]/c-wiz[1]/div/div[2]/div/div[1]/c-wiz[1]/h1/span')
#Converting the Name data to text
Name_data <- html_text(Name_data_html)
#Look at the Name
head(Name_data)

#加载rvest包
图书馆（'rvest'）
#指定要废弃的所需网站的url
url在分析了您的代码和您发布的url的源页面后，我认为您无法放弃任何内容的原因是因为内容是动态生成的，所以rvest无法正确获取
以下是我的解决方案：
#Loading the rvest package
library(rvest)
library(magrittr) # for the '%>%' pipe symbols
library(RSelenium) # to get the loaded html of 

#Specifying the url for desired website to be scrapped
url <- 'https://play.google.com/store/apps/details?id=com.phonegap.rxpal&hl=en_IN'

# starting local RSelenium (this is the only way to start RSelenium that is working for me atm)
selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
shell(selCommand, wait = FALSE, minimized = TRUE)
remDr <- remoteDriver(port = 4567L, browserName = "chrome")
remDr$open()

# go to website
remDr$navigate(url)

# get page source and save it as an html object with rvest
html_obj <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()

# 1) name field (assuming that with 'name' you refer to the name of the reviewer)
names <- html_obj %>% html_nodes(".kx8XBd .X43Kjb") %>% html_text()

# 2) How much star they got 
stars <- html_obj %>% html_nodes(".kx8XBd .nt2C1d [role='img']") %>% html_attr("aria-label")

# 3) review they wrote
reviews <- html_obj %>% html_nodes(".UD7Dzf") %>% html_text()

# create the df with all the info
review_data <- data.frame(names = names, stars = stars, reviews = reviews, stringsAsFactors = F)

#加载rvest包
图书馆（rvest）
库（magrittr）#用于“>%”管道符号
库（RSelenium）#获取加载的html
#指定要废弃的所需网站的url
url%html\u text（）
#使用所有信息创建df
我试着按照你的脚本做，但我有一个错误：selCommand我想这应该在另一个问题中提问。您的计算机中是否安装了Java？运行Sys.which（“java”）
，如果您没有找到java的路径，您应该先安装它。@Kardu检查这些链接，了解启动RSelenium：或
#Loading the rvest package
library('rvest')
#Specifying the url for desired website to be scrapped
url <- 'https://play.google.com/store/apps/details?id=com.phonegap.rxpal&hl=en_IN'
#Reading the HTML code from the website
webpage <- read_html(url)
# Using Xpath
Name_data_html <- webpage %>% html_nodes(xpath='/html/body/div[1]/div[4]/c-wiz/div/div[2]/div/div[1]/div/c-wiz[1]/c-wiz[1]/div/div[2]/div/div[1]/c-wiz[1]/h1/span')
#Converting the Name data to text
Name_data <- html_text(Name_data_html)
#Look at the Name
head(Name_data)

#Loading the rvest package
library(rvest)
library(magrittr) # for the '%>%' pipe symbols
library(RSelenium) # to get the loaded html of 

#Specifying the url for desired website to be scrapped
url <- 'https://play.google.com/store/apps/details?id=com.phonegap.rxpal&hl=en_IN'

# starting local RSelenium (this is the only way to start RSelenium that is working for me atm)
selCommand <- wdman::selenium(jvmargs = c("-Dwebdriver.chrome.verboseLogging=true"), retcommand = TRUE)
shell(selCommand, wait = FALSE, minimized = TRUE)
remDr <- remoteDriver(port = 4567L, browserName = "chrome")
remDr$open()

# go to website
remDr$navigate(url)

# get page source and save it as an html object with rvest
html_obj <- remDr$getPageSource(header = TRUE)[[1]] %>% read_html()

# 1) name field (assuming that with 'name' you refer to the name of the reviewer)
names <- html_obj %>% html_nodes(".kx8XBd .X43Kjb") %>% html_text()

# 2) How much star they got 
stars <- html_obj %>% html_nodes(".kx8XBd .nt2C1d [role='img']") %>% html_attr("aria-label")

# 3) review they wrote
reviews <- html_obj %>% html_nodes(".UD7Dzf") %>% html_text()

# create the df with all the info
review_data <- data.frame(names = names, stars = stars, reviews = reviews, stringsAsFactors = F)