Problem with URLencode in R


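To access the NIST Chemistry WebBook database from R, I need to pass queries to a URL-encoded web address. Most of the time this conversion works fine with URLencode(), but in some cases it does not. One case where it fails is, for example: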

query="Poligodial + 3-methoxy-4,5-methylenedioxyamphetamine (R,S) adduct, # 1"

I tried to fetch it with

library(XML)
library(RCurl)
url=URLencode(paste0('http://webbook.nist.gov/cgi/cbook.cgi?Name=',query,'&Units=SI'))
doc=htmlParse(getURL(url),encoding="UTF-8")

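However, if you try this URL in a web browser, it reports that the name was not found. Apparently the site expects the following URL-encoded string instead: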

"http://webbook.nist.gov/cgi/cbook.cgi?Name=Poligodial+%2B+3-methoxy-4%2C5-methylenedioxyamphetamine+%28R%2CS%29+adduct%2C+%23+1&Units=SI"

Does anyone know what kind of gsub rules I should use in this case to arrive at the same URL encoding? Or is there some other easy fix?

I tried

url=gsub(" ","+",gsub(",","%2C",gsub("+","%2B",URLencode(paste('http://webbook.nist.gov/cgi/cbook.cgi?Name=',query,'&Units=SI', sep="")),fixed=T),fixed=T),fixed=T)

but this still isn't quite right, and I don't know which rules the site owner might be using.

Is this what you want?

library(httr)
url <- 'http://webbook.nist.gov/cgi/cbook.cgi'
args <- list(Name = "Poligodial + 3-methoxy-4,5-methylenedioxyamphetamine (R,S) adduct, # 1",
         Units = 'SI')
res <- GET(url, query=args)
content(res)$children$html

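URLencode follows RFC 1738 (see section 2.2, page 3), which states:

only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

That is, it does not encode plus signs, commas or parentheses. The URL it generates is therefore correct in theory, but does not work in practice.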
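
The effect of that rule can be seen directly in base R. This is a minimal sketch, assuming a recent R version. The `reserved = TRUE` argument does not appear anywhere in the original code and is shown only for comparison.

```r
query <- "Poligodial + 3-methoxy-4,5-methylenedioxyamphetamine (R,S) adduct, # 1"

# Default (reserved = FALSE): spaces are percent-encoded, but characters
# such as '+', ',', '(' and ')' pass through unchanged.
default_enc <- URLencode(query)
default_enc

# reserved = TRUE (assumption: recent R): those characters are
# percent-encoded as well.
reserved_enc <- URLencode(query, reserved = TRUE)
reserved_enc
```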
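
The GET function in the httr package that Scott mentioned calls curlEscape from RCurl, which does encode these punctuation characters. (GET calls handle_url, which calls modify_url, which calls build_url, which calls curlEscape.)

The URL it generates is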
paste0('http://webbook.nist.gov/cgi/cbook.cgi?Name=', curlEscape(query), '&Units=SI')
## [1] "http://webbook.nist.gov/cgi/cbook.cgi?Name=Poligodial%20%2B%203%2Dmethoxy%2D4%2C5%2Dmethylenedioxyamphetamine%20%28R%2CS%29%20adduct%2C%20%23%201&Units=SI"
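
The URL above writes spaces as %20, while the string quoted in the question writes them as +. If that form-style spelling is wanted, the following is a minimal sketch in base R. It assumes a recent R version in which `URLencode(reserved = TRUE)` percent-encodes reserved characters. The `form_encode` helper is a hypothetical name, not part of any package.

```r
# Hypothetical helper: percent-encode a single parameter value, then
# write spaces as '+' the way HTML form submissions do.
form_encode <- function(x) {
  encoded <- URLencode(x, reserved = TRUE)   # space -> %20, '+' -> %2B, ...
  gsub("%20", "+", encoded, fixed = TRUE)    # %20 -> '+'
}

query <- "Poligodial + 3-methoxy-4,5-methylenedioxyamphetamine (R,S) adduct, # 1"

# Encode only the value and paste the URL together around it, so the
# '?', '=' and '&' separators stay intact.
url <- paste0("http://webbook.nist.gov/cgi/cbook.cgi?Name=",
              form_encode(query), "&Units=SI")
url
```

Under that assumption the result matches the browser URL quoted in the question. The literal + in the query has already become %2B before the gsub step, so it cannot be confused with an encoded space.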

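The httr package has some nice features, and you may want to start using it. The smallest change that makes your code work, though, is to swap URLencode for curlEscape, applied to the query string only, as above.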

@Richie Cotton's solution also handles characters that URLencode() leaves unencoded. Here is a very simple example:
# Useless: `$` is left as it is
URLencode("hi$there")
## [1] "hi$there"

# Escaping the character by hand does not help either: only the backslash gets encoded
URLencode("hi\\$there")
## [1] "hi%5C$there"

# This works without escaping! (curlEscape is provided by RCurl)
library(RCurl)
curlEscape("hi$there")
## [1] "hi%24there"

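Ha, that's much better, yes. Thanks a lot, it does what it should do! Thanks a million!
Haha, many thanks. For me this is an even simpler solution, especially since I prefer getURL() over GET() (for me it seems a bit more robust on some flaky internet connections)! Thanks a million, that's a great explanation!
A question purely out of interest/curiosity: does this make curlEscape() strictly better than URLencode()? It seems to me that URLencode works sometimes; I have been working..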
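
On the question of whether curlEscape() is strictly better: the two suit different inputs. The following is a minimal sketch in base R, assuming a recent R version in which `URLencode(reserved = TRUE)` encodes all reserved characters, which is roughly comparable to what curlEscape does. The shortened query `a b` is made up for illustration.

```r
full_url <- "http://webbook.nist.gov/cgi/cbook.cgi?Name=a b&Units=SI"

# Default URLencode is meant for a complete URL: it keeps the structural
# characters (':', '/', '?', '=', '&') and encodes the space.
whole <- URLencode(full_url)
whole

# Encoding reserved characters across the complete URL also encodes the
# separators, so this form only makes sense for individual components.
broken <- URLencode(full_url, reserved = TRUE)
broken
```

So an escaping function that encodes everything is the right tool for a single parameter value, while the default URLencode is only adequate when the value contains no reserved characters that need encoding.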