Scraping Javascript-rendered content in R from a webpage without a unique URL


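I want to scrape the historical results of South African LOTTO draws (in particular the total pool size, total sales, and so on) from the lottery's website. By default, the page shows links to the results of the ten most recent draws; you can also select a date range to pull up a larger set of draw links (still displayed only ten per page).

Hovering over one of the links in the browser, for example "LOTTO DRAW 2012", shows javascript:void(), so the draw results are evidently rendered with Javascript. Following the advice of an article I read, I realised I needed to open the Google Chrome Developer Tools, switch to the Network tab, and then click the link for the draw "LOTTO DRAW 2012". When I do that, I can see which URL is being called.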

When I right-click the initiator and select "Copy response", I can see the data I need inside a "drawDetails" object, in what appears to be JSON:

{"code":200,"message":"OK","data":{"drawDetails":{"drawNumber":"2012","drawDate":"2020\/04\/11","nextDrawDate":"2020\/04\/15","ball1":"48","ball2":"6","ball3":"43","ball4":"41","ball5":"25","ball6":"45","bonusBall":"38","div1Winners":"1","div1Payout":"10546013.8","div2Winners":"0","div2Payout":"0","div3Winners":"28","div3Payout":"7676.4","div4Winners":"62","div4Payout":"2751.4","div5Winners":"1389","div5Payout":"206.3","div6Winners":"1872","div6Payout":"133","div7Winners":"28003","div7Payout":"50","div8Winners":"20651","div8Payout":"20","rolloverAmount":"0","rolloverNumber":"0","totalPrizePool":"13280236.5","totalSales":"11610950","estimatedJackpot":"2000000","guaranteedJackpot":"0","drawMachine":"RNG2","ballSet":"RNG","status":"published","winners":52006,"millionairs":1,"gpwinners":"52006","wcwinners":"0","ncwinners":"0","ecwinners":"0","mpwinners":"0","lpwinners":"0","fswinners":"0","kznwinners":"0","nwwinners":"0"},"totalWinnerRecord":{"lottoMillionairs":28716702,"lottoWinners":337285646,"ithubaMillionairs":135763,"ithubaWinners":305615802}},"videoData":[{"id":"1049","listid":"1","parentid":"1","videosource":"youtube","videoid":"chHfFxVi9QI","imageurl":"","title":"LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW 2012 (11 APRIL 2020)","description":"","custom_imageurl":"","custom_title":"","custom_description":"","specialparams":"","lastupdate":"0000-00-00 00:00:00","allowupdates":"1","status":"0","isvideo":"1","link":"https:\/\/www.youtube.com\/watch?v=chHfFxVi9QI","ordering":"10001","publisheddate":"2020-04-11 
20:06:17","duration":"182","rating_average":"0","rating_max":"0","rating_min":"0","rating_numRaters":"0","statistics_favoriteCount":"0","statistics_viewCount":"329","keywords":"","startsecond":"0","endsecond":"0","likes":"6","dislikes":"0","commentcount":"0","channel_username":"","channel_title":"","channel_subscribers":"9880","channel_subscribed":"0","channel_location":"","channel_commentcount":"0","channel_viewcount":"0","channel_videocount":"1061","channel_description":"","channel_totaluploadviews":"0","alias":"lotto-lotto-plus-1-and-lotto-plus-2-draw-2012-11-april-2020","rawdata":"","datalink":"https:\/\/www.googleapis.com\/youtube\/v3\/videos?id=chHfFxVi9QI&part=id,snippet,contentDetails,statistics&key=AIzaSyC1Xvk2GUdb_N3UiFtjsgZ-uMviJ_8MFZI"}]}
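This is a request of type POST, so I tried to follow an example for such requests, but I could not find the onclick value that represents the data submitted with the form. In addition, the request URL for "LOTTO DRAW 2012" is the same as the one for "LOTTO DRAW 2011", so the URL itself carries no unique identifier for the specific draw. It is therefore unclear to me how to make a distinct request for the results of a particular draw.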

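So the smaller question is: given a specific draw number or draw date, how do I find the unique identifier used in the POST request for that draw's data?

The bigger question is: once the unique identifiers for all historical draws are available, how do I generate the JSON drawDetails object for each draw in turn, or otherwise complete the scraping?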

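You are right: the content on the page is updated by Javascript through an ajax request. The server returns a JSON string in response to an HTTP POST request. With a POST request, the server's response depends not only on the URL you request but also on the message body you send to the server. In this case, the body is a simple form with three fields: gameName, which is always LOTTO; isAjax, which is always true; and drawNumber, which is the field you want to vary.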

If you are using httr, you can specify these fields as a named list in the body argument of the POST function.
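
As a minimal sketch of that call: the helper names build_body and fetch_draw are hypothetical, and the URL is the one observed for this page in the browser, so it is an assumption that the site still accepts it in this form. Passing encode = "form" asks httr to send the named list as a URL-encoded form.

```r
# URL of the ajax endpoint, copied verbatim from the request seen in the browser
# (assumption: the site still accepts it in this form).
lotto_url <- "https://www.nationallottery.co.za/index.php?task=results.redirectPageURL&amp;Itemid=265&amp;option=com_weaver&amp;controller=lotto-history"

# Build the three form fields; only drawNumber changes between requests.
build_body <- function(draw_number) {
  list(gameName = "LOTTO",
       isAjax = "true",
       drawNumber = as.character(draw_number))
}

# Send one POST request for one draw and return the raw httr response.
# Not executed here: it needs the httr package and network access.
fetch_draw <- function(draw_number, url = lotto_url) {
  httr::POST(url = url, body = build_body(draw_number), encode = "form")
}

# Inspect the body that would be sent for draw 2012.
str(build_body(2012))
```

Calling fetch_draw(2012) would then send the request for that draw; the body is built separately so it can be checked without touching the network.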

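Once you have the response for each draw, you will want to parse the JSON into an R-friendly format such as a list or a data frame, using a library like jsonlite. Given the structure of this particular JSON, it makes most sense to extract the component $data$drawDetails and turn it into a one-row data frame. That lets you bind multiple draws together into a single data frame.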
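
The parsing and binding steps could look roughly like this. The function names details_to_row, parse_draw and bind_draws are hypothetical, and the two hand-built lists only imitate a small subset of the fields that jsonlite::fromJSON would return, so the example runs without network access.

```r
# Turn one parsed response (a nested list, as jsonlite::fromJSON would return)
# into a one-row data frame built from $data$drawDetails.
details_to_row <- function(parsed) {
  as.data.frame(parsed$data$drawDetails, stringsAsFactors = FALSE)
}

# Parse the text of one httr response; needs httr and jsonlite when called.
parse_draw <- function(response) {
  txt <- httr::content(response, as = "text", encoding = "UTF-8")
  details_to_row(jsonlite::fromJSON(txt))
}

# Stack the one-row data frames into a single table.
bind_draws <- function(rows) do.call(rbind, rows)

# Offline illustration with two hand-built lists holding a subset of the fields.
p2011 <- list(data = list(drawDetails = list(drawNumber = "2011", totalSales = "10406895")))
p2012 <- list(data = list(drawDetails = list(drawNumber = "2012", totalSales = "11610950")))
draws <- bind_draws(list(details_to_row(p2011), details_to_row(p2012)))
print(draws)
```

The values arrive as strings, so numeric columns still need as.numeric() afterwards.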

Here is the output of a function, lotto_details, that does all of this for draws 2009 to 2012:

#>   drawNumber   drawDate nextDrawDate ball1 ball2 ball3 ball4 ball5 ball6
#> 1       2009 2020/04/01   2020/04/04    51    15     7    32    42    45
#> 2       2010 2020/04/04   2020/04/08    43     4    21    24    10     3
#> 3       2011 2020/04/08   2020/04/11    42    43     8    18     2    29
#> 4       2012 2020/04/11   2020/04/15    48     6    43    41    25    45
#>   bonusBall div1Winners div1Payout div2Winners div2Payout div3Winners
#> 1         1           0          0           0          0          21
#> 2        22           0          0           0          0          31
#> 3        34           0          0           0          0          21
#> 4        38           1 10546013.8           0          0          28
#>   div3Payout div4Winners div4Payout div5Winners div5Payout div6Winners
#> 1     8455.3          60     2348.7        1252        189        1786
#> 2     6004.3          71     2080.6        1808      137.3        2352
#> 3     8584.5          60     2384.6        1405      171.1        2079
#> 4     7676.4          62     2751.4        1389      206.3        1872
#>   div6Payout div7Winners div7Payout div8Winners div8Payout rolloverAmount
#> 1      115.2       24664         50       19711         20     3809758.17
#> 2       91.7       35790         50       25981         20     5966533.86
#> 3      100.5       27674         50       21895         20     8055430.87
#> 4        133       28003         50       20651         20              0
#>   rolloverNumber totalPrizePool totalSales estimatedJackpot
#> 1              2     6198036.67    9879655          6000000
#> 2              3     9073426.56   11696905          8000000
#> 3              4    10649716.37   10406895         10000000
#> 4              0     13280236.5   11610950          2000000
#>   guaranteedJackpot drawMachine ballSet    status winners millionairs
#> 1                 0        RNG2     RNG published   47494           0
#> 2                 0        RNG2     RNG published   66033           0
#> 3                 0        RNG2     RNG published   53134           0
#> 4                 0        RNG2     RNG published   52006           1
#>   gpwinners wcwinners ncwinners ecwinners mpwinners lpwinners fswinners
#> 1     47494         0         0         0         0         0         0
#> 2     66033         0         0         0         0         0         0
#> 3     53134         0         0         0         0         0         0
#> 4     52006         0         0         0         0         0         0
#>   kznwinners nwwinners
#> 1          0         0
#> 2          0         0
#> 3          0         0
#> 4          0         0

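Created on 2020-04-13 by the reprex package (v0.3.0)

The question already has a satisfactory answer (see above), which I have accepted. I arrived at a nearly identical solution at the same time; I add it here only because it explicitly covers all available draw numbers and automatically detects the latest draw number.
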
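# Page that lists the draws; the POST requests go to the same URL
theurl <- "https://www.nationallottery.co.za/index.php?task=results.redirectPageURL&amp;Itemid=265&amp;option=com_weaver&amp;controller=lotto-history"
# Read the visible text of the page
x <- rvest::html_text(xml2::read_html(theurl))
# Each draw title contains this string, followed by the four-digit draw number
preceding_string <- "LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW "
drawnums <- as.integer(vapply(gregexpr(preceding_string, x)[[1]] + nchar(preceding_string), 
              function(k) substr(x, start = k, stop = k + 3), NA_character_))
# Request every draw from 1506 up to the latest one found on the page
drawnumrange <- 1506:max(drawnums)
response <- lapply(drawnumrange, function(d) httr::POST(url = theurl, 
                body = list(gameName = "LOTTO", drawNumber = as.character(d), isAjax = 
                "true"), encode = "form"))
# Parse the text of each response and keep only the drawDetails component
jsondat <- lapply(response, function(r) jsonlite::parse_json(
                httr::content(r, as = "text", encoding = "UTF-8"))$data$drawDetails)
# Bind the draws into one table, one row per draw
lottotable <- as.data.frame(do.call(rbind, jsondat))
# Convert the numeric columns, which arrive as strings
numericcols <- c(1, 4:32, 36:37)
lottotable[numericcols] <- sapply(lottotable[numericcols], as.numeric)
# Write the first 37 columns to an Excel file
xlsx::write.xlsx2(lottotable[1:37], "lottotable.xlsx", row.names = FALSE)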
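
The substr() call above reads exactly four characters after the prefix, so it assumes four-digit draw numbers. A possible variation, sketched here with a hypothetical helper name and made-up sample text in the style of the page titles, uses a regular expression so that the digits are captured whatever their length:

```r
# Extract every draw number that follows the fixed prefix, whatever its length.
extract_draw_numbers <- function(page_text) {
  # Look-behind on the end of the prefix, then one or more digits.
  pattern <- "(?<=LOTTO PLUS 2 DRAW )[0-9]+"
  hits <- regmatches(page_text, gregexpr(pattern, page_text, perl = TRUE))[[1]]
  as.integer(hits)
}

# Made-up sample text imitating two draw titles.
sample_text <- paste(
  "LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW 2012 (11 APRIL 2020)",
  "LOTTO, LOTTO PLUS 1 AND LOTTO PLUS 2 DRAW 2011 (8 APRIL 2020)"
)
nums <- extract_draw_numbers(sample_text)
print(max(nums))
```

The result could replace drawnums in the code above; if the page contains no matching title, the function returns an empty integer vector, which is worth checking before calling max().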