HTTP URL address encoding in Java

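My standalone Java application gets a URL (pointing to a file) from the user, and I need to hit it and download the file. The problem I am facing is that I cannot encode the HTTP URL address properly.

Example:

URL:  http://search.barnesandnoble.com/booksearch/first book.pdf

java.net.URLEncoder.encode(url.toString(), "ISO-8859-1");

returns:

http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf

But what I want is:

http://search.barnesandnoble.com/booksearch/first%20book.pdf

(space replaced by %20)

I guess URLEncoder is not designed to encode HTTP URLs. The JavaDoc says "Utility class for HTML form encoding". Is there any other way to do this?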

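Yes, URL encoding is going to encode that string so that it can be passed properly in a URL to its final destination. For example, you could not put a raw URL inside another URL as a parameter value; URL-encoding the parameter fixes that parameter value.

So I have two options for you:

Do you have access to the path separately from the domain? If so, you can simply URL-encode the path. If that is not the case, option 2 may be for you.

Get commons-httpclient-3.1. It has a class URIUtil:

System.out.println(URIUtil.encodePath("y", "ISO-8859-1"));

This will output exactly what you are looking for, as it only encodes the path part of the URI.

FYI, you will need commons-codec and commons-logging on the classpath for this method to work at runtime.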
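If the path is available separately from the domain (option 1 above), one way to encode it with only the JDK is sketched below. This is a minimal sketch, not the answerer's code: the class and method names are made up for illustration, UTF-8 is an assumed charset, and the approach leans on the fact that URLEncoder writes a space as '+' and a literal plus as %2B, so every '+' in its output can be turned into %20. It encodes more characters than a path strictly requires, which is harmless.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, for illustration only.
final class PathEncodingSketch {

    /** Percent-encodes a single path segment; a '/' inside the segment becomes %2F. */
    static String encodeSegment(String segment) {
        try {
            // URLEncoder emits '+' for a space and %2B for a literal '+',
            // so each '+' in its output stands for a space.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Encodes every segment of a path while keeping the '/' separators. */
    static String encodePath(String path) {
        StringBuilder out = new StringBuilder();
        // A limit of -1 keeps trailing empty segments, so a trailing '/' survives.
        String[] segments = path.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(encodeSegment(segments[i]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String base = "http://search.barnesandnoble.com";
        // prints http://search.barnesandnoble.com/booksearch/first%20book.pdf
        System.out.println(base + encodePath("/booksearch/first book.pdf"));
    }
}
```

The domain is left untouched on purpose: only the part after the host goes through the encoder.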

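URLEncoding can encode HTTP URLs just fine, as you have unfortunately discovered. The string you passed in (the URL ending in "book.pdf") was correctly and completely encoded into URL-encoded form. You could pass that entire long string of gobbledygook as a parameter in a URL, and it could be decoded back into exactly the string you passed in.

It sounds like you want to do something a little different from passing the entire URL as a parameter. From what I gather, you are trying to create a search URL that ends with whatever the user passes in. The only thing you need to encode is that "whateverTheUserPassesIn" bit, so perhaps all you need to do is something like this:

String url = "http://search.barnesandnoble.com/booksearch/" + 
       URLEncoder.encode(userInput,"UTF-8");

That should produce something rather more valid for you.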
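To illustrate the point that URLEncoder output is meant to travel as a parameter value and decodes back to the original string, here is a small round-trip sketch. The class name and the wrapper URL on example.com are assumptions made for the example; only URLEncoder and URLDecoder come from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, for illustration only.
final class FormEncodingRoundTrip {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String original = "http://search.barnesandnoble.com/booksearch/first book.pdf";
        String encoded = encode(original);

        // prints http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf
        System.out.println(encoded);

        // The encoded form is safe as a parameter value of some other URL
        // (example.com and the "target" parameter are made up).
        System.out.println("http://example.com/fetch?target=" + encoded);

        // prints true: decoding restores the exact input
        System.out.println(decode(encoded).equals(original));
    }
}
```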

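The URI class can help; the documentation of URL says:

Note, the URI class does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use URI.

Use one of the constructors with more than one argument, like: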

URI uri = new URI(
    "http", 
    "search.barnesandnoble.com", 
    "/booksearch/first book.pdf",
    null);
URL url = uri.toURL();
//or String request = uri.toString();

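(The single-argument constructor of URI does NOT escape illegal characters.)

The code above escapes only illegal characters; it does NOT escape non-ASCII characters (see fatih's comment).

The toASCIIString method can be used to get a String containing only US-ASCII characters: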

URI uri = new URI(
    "http", 
    "search.barnesandnoble.com", 
    "/booksearch/é",
    null);
String request = uri.toASCIIString();

For a URL with a query, such as http://www.google.com/ig/api?weather=São Paulo, use the 5-parameter version of the constructor:

URI uri = new URI(
        "http", 
        "www.google.com", 
        "/ig/api",
        "weather=São Paulo",
        null);
String request = uri.toASCIIString();
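The difference between toString and toASCIIString described above can be checked with a short program. This is only a sketch: the class name and the build helper are invented for the example, and the non-ASCII characters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, for illustration only.
final class UriEscapingSketch {

    /** Wraps the 5-argument constructor (scheme, authority, path, query, fragment). */
    static URI build(String host, String path, String query) {
        try {
            return new URI("http", host, path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Illegal characters such as the space are escaped by the constructor.
        // prints http://search.barnesandnoble.com/booksearch/first%20book.pdf
        System.out.println(build("search.barnesandnoble.com", "/booksearch/first book.pdf", null));

        // Non-ASCII characters are kept by toString() and escaped by toASCIIString().
        URI accented = build("search.barnesandnoble.com", "/booksearch/\u00e9", null);
        // prints http://search.barnesandnoble.com/booksearch/%C3%A9
        System.out.println(accented.toASCIIString());

        // prints http://www.google.com/ig/api?weather=S%C3%A3o%20Paulo
        System.out.println(build("www.google.com", "/ig/api", "weather=S\u00e3o Paulo").toASCIIString());
    }
}
```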

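Please be warned that most of the answers above are INCORRECT.

The URLEncoder class, despite its name, is NOT what is needed here. It is unfortunate that Sun named this class so annoyingly. URLEncoder is meant for passing data as parameters, not for encoding the URL itself.

In other words, "http://search.barnesandnoble.com/booksearch/first book.pdf" is the URL. Parameters would be, for example, "http://search.barnesandnoble.com/booksearch/first book.pdf?parameter1=this&param2=that". The parameters are what you would use URLEncoder on.

The following two examples highlight the differences between the two.

The following produces the wrong parameters, according to the HTTP standard. Note that the ampersand (&) and plus (+) are encoded incorrectly.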

uri = new URI("http", null, "www.google.com", 80, 
"/help/me/book name+me/", "MY CRZY QUERY! +&+ :)", null);

// URI: http://www.google.com:80/help/me/book%20name+me/?MY%20CRZY%20QUERY!%20+&+%20:)

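The following encodes the query with URLEncoder before passing it in. Note the spaces, ampersands, and plus marks. Note also that the multi-argument URI constructors always quote '%', so the percent signs produced by URLEncoder are themselves escaped as %25 in the result: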

uri = new URI("http", null, "www.google.com", 80, "/help/me/book name+me/", URLEncoder.encode("MY CRZY QUERY! +&+ :)", "UTF-8"), null);

// URI: http://www.google.com:80/help/me/book%20name+me/?MY+CRZY+QUERY%2521+%252B%2526%252B+%253A%2529
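One way to combine the two tools without the second round of escaping is to let URI handle only the scheme, host, port and path, and to append a query that URLEncoder has already encoded. The sketch below assumes a single parameter; the class name, the helper names and the parameter name "q" are made up for the example and do not come from the answer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical helper, for illustration only.
final class QueryEncodingSketch {

    /** Form-encodes one parameter name or value. */
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String buildUrl(String host, int port, String path, String name, String value) {
        try {
            // Query and fragment are null here, so the constructor never sees the
            // '%' characters produced by URLEncoder and cannot escape them again.
            String base = new URI("http", null, host, port, path, null, null).toASCIIString();
            return base + "?" + formEncode(name) + "=" + formEncode(value);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://www.google.com:80/help/me/book%20name+me/?q=MY+CRZY+QUERY%21+%2B%26%2B+%3A%29
        System.out.println(buildUrl("www.google.com", 80, "/help/me/book name+me/",
                "q", "MY CRZY QUERY! +&+ :)"));
    }
}
```

With several parameters, each name and value would be encoded separately and the pairs joined with '&'.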

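If you have an encoded "/" (%2F) in your URL, there is still a problem.

RFC 3986, Section 2.2 says: "If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed." (RFC 3986, Section 2.2)

But there is an issue with Tomcat:

Fixed in Apache Tomcat 6.0.10

important: Directory traversal CVE-2007-0450

Tomcat permits '\', '%2F' and '%5C' [...] .

The following Java system properties have been added to Tomcat to provide additional control of the handling of path delimiters in URLs (both options default to false):

org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH: true|false
org.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH: true|false

Due to the impossibility to guarantee that all URLs are handled by Tomcat as they are in proxy servers, Tomcat should always be secured as if no proxy restricting context access was used.

Affects: 6.0.0-6.0.9

So if you have a URL with the %2F character, Tomcat returns: "400 Invalid URI: noSlash"

You can switch off the bugfix in the Tomcat startup script: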

set JAVA_OPTS=%JAVA_OPTS% %LOGGING_CONFIG%   -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true

The solution I developed is much more stable than any of the others:

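import java.nio.charset.StandardCharsets;

public class URLParamEncoder {

    public static String encode(String input) {
        StringBuilder resultStr = new StringBuilder();
        // Work on the UTF-8 bytes rather than the chars: a char above 0xFF
        // does not fit into a single two-digit %XX sequence.
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int ch = b & 0xFF;
            if (isUnsafe(ch)) {
                resultStr.append('%');
                resultStr.append(toHex(ch / 16));
                resultStr.append(toHex(ch % 16));
            } else {
                resultStr.append((char) ch);
            }
        }
        return resultStr.toString();
    }

    // Maps a value in the range 0..15 to its uppercase hex digit.
    private static char toHex(int ch) {
        return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
    }

    // Every non-ASCII byte and every listed delimiter gets percent-encoded.
    private static boolean isUnsafe(int ch) {
        if (ch >= 128)
            return true;
        return " %$&+,/:;=?@<>#".indexOf(ch) >= 0;
    }

}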

I've created a new project to help construct HTTP URLs.

new UrlBuilder("search.barnesandnoble.com", "booksearch/first book.pdf").toString()

public static String encodeURLComponent(final String s)
{
  if (s == null)
  {
    return "";
  }

  final StringBuilder sb = new StringBuilder();

  try
  {
    for (int i = 0; i < s.length(); i++)
    {
      final char c = s.charAt(i);

      if (((c >= 'A') && (c <= 'Z')) || ((c >= 'a') && (c <= 'z')) ||
          ((c >= '0') && (c <= '9')) ||
          (c == '-') ||  (c == '.')  || (c == '_') || (c == '~'))
      {
        sb.append(c);
      }
      else
      {
        final byte[] bytes = ("" + c).getBytes("UTF-8");

        for (byte b : bytes)
        {
          sb.append('%');

          int upper = (((int) b) >> 4) & 0xf;
          sb.append(Integer.toHexString(upper).toUpperCase(Locale.US));

          int lower = ((int) b) & 0xf;
          sb.append(Integer.toHexString(lower).toUpperCase(Locale.US));
        }
      }
    }

    return sb.toString();
  }
  catch (UnsupportedEncodingException uee)
  {
    throw new RuntimeException("UTF-8 unsupported!?", uee);
  }
}

URI uri = new URI(
        "http",
        null, // this is for userInfo
        "www.google.com",
        8080, // port number as int
        "/ig/api",
        "weather=São Paulo",
        null);
String request = uri.toASCIIString();

String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();

public URL convertToURLEscapingIllegalCharacters(String string){
    try {
        String decodedURL = URLDecoder.decode(string, "UTF-8");
        URL url = new URL(decodedURL);
        URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef()); 
        return uri.toURL(); 
    } catch (Exception ex) {
        ex.printStackTrace();
        return null;
    }
}

String retVal = "";

try {
    retVal = URLEncoder.encode(in_, "UTF8");
} catch (UnsupportedEncodingException ex) {
    Log.get().exception(Log.Level.Error, "urlEncode ", ex);
}

return retVal;

UriUtils.encodeUri(input, "UTF-8")

/**
 * Encode URL (except :, /, ?, &, =, ... characters)
 * @param url to encode
 * @param encodingCharset url encoding charset
 * @return encoded URL
 * @throws UnsupportedEncodingException
 */
public static String encodeUrl(String url, String encodingCharset) throws UnsupportedEncodingException {
    return new URLCodec().encode(url, encodingCharset)
            .replace("%3A", ":").replace("%2F", "/").replace("%3F", "?")
            .replace("%3D", "=").replace("%26", "&");
}

String urlToEncode = "http://www.growup.com/folder/intérieur-à_vendre?o=4";
Utils.encodeUrl(urlToEncode, "UTF-8");

public static URL convertToURLEscapingIllegalCharacters(String toEscape) throws MalformedURLException, URISyntaxException {
            URL url = new URL(toEscape);
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            //if a % is included in the toEscape string, it will be re-encoded to %25 and we don't want re-encoding, just encoding
            return new URL(uri.toString().replace("%25", "%"));
}

android.net.Uri.encode(urlString, ":/");

/**
 * Percent-encodes a string so it's suitable for use in a URL Path (not a query string / form encode, which uses + for spaces, etc)
 */
public static String percentEncode(String encodeMe) {
    if (encodeMe == null) {
        return "";
    }
    String encoded = encodeMe.replace("%", "%25");
    encoded = encoded.replace(" ", "%20");
    encoded = encoded.replace("!", "%21");
    encoded = encoded.replace("#", "%23");
    encoded = encoded.replace("$", "%24");
    encoded = encoded.replace("&", "%26");
    encoded = encoded.replace("'", "%27");
    encoded = encoded.replace("(", "%28");
    encoded = encoded.replace(")", "%29");
    encoded = encoded.replace("*", "%2A");
    encoded = encoded.replace("+", "%2B");
    encoded = encoded.replace(",", "%2C");
    encoded = encoded.replace("/", "%2F");
    encoded = encoded.replace(":", "%3A");
    encoded = encoded.replace(";", "%3B");
    encoded = encoded.replace("=", "%3D");
    encoded = encoded.replace("?", "%3F");
    encoded = encoded.replace("@", "%40");
    encoded = encoded.replace("[", "%5B");
    encoded = encoded.replace("]", "%5D");
    return encoded;
}

/**
 * Percent-decodes a string, such as used in a URL Path (not a query string / form encode, which uses + for spaces, etc)
 */
public static String percentDecode(String encodeMe) {
    if (encodeMe == null) {
        return "";
    }
    String decoded = encodeMe.replace("%21", "!");
    decoded = decoded.replace("%20", " ");
    decoded = decoded.replace("%23", "#");
    decoded = decoded.replace("%24", "$");
    decoded = decoded.replace("%26", "&");
    decoded = decoded.replace("%27", "'");
    decoded = decoded.replace("%28", "(");
    decoded = decoded.replace("%29", ")");
    decoded = decoded.replace("%2A", "*");
    decoded = decoded.replace("%2B", "+");
    decoded = decoded.replace("%2C", ",");
    decoded = decoded.replace("%2F", "/");
    decoded = decoded.replace("%3A", ":");
    decoded = decoded.replace("%3B", ";");
    decoded = decoded.replace("%3D", "=");
    decoded = decoded.replace("%3F", "?");
    decoded = decoded.replace("%40", "@");
    decoded = decoded.replace("%5B", "[");
    decoded = decoded.replace("%5D", "]");
    decoded = decoded.replace("%25", "%");
    return decoded;
}

@Test
public void testPercentEncode_Decode() {
    assertEquals("", percentDecode(percentEncode(null)));
    assertEquals("", percentDecode(percentEncode("")));

    assertEquals("!", percentDecode(percentEncode("!")));
    assertEquals("#", percentDecode(percentEncode("#")));
    assertEquals("$", percentDecode(percentEncode("$")));
    assertEquals("@", percentDecode(percentEncode("@")));
    assertEquals("&", percentDecode(percentEncode("&")));
    assertEquals("'", percentDecode(percentEncode("'")));
    assertEquals("(", percentDecode(percentEncode("(")));
    assertEquals(")", percentDecode(percentEncode(")")));
    assertEquals("*", percentDecode(percentEncode("*")));
    assertEquals("+", percentDecode(percentEncode("+")));
    assertEquals(",", percentDecode(percentEncode(",")));
    assertEquals("/", percentDecode(percentEncode("/")));
    assertEquals(":", percentDecode(percentEncode(":")));
    assertEquals(";", percentDecode(percentEncode(";")));

    assertEquals("=", percentDecode(percentEncode("=")));
    assertEquals("?", percentDecode(percentEncode("?")));
    assertEquals("@", percentDecode(percentEncode("@")));
    assertEquals("[", percentDecode(percentEncode("[")));
    assertEquals("]", percentDecode(percentEncode("]")));
    assertEquals(" ", percentDecode(percentEncode(" ")));

    // Get a little complex
    assertEquals("[]]", percentDecode(percentEncode("[]]")));
    assertEquals("a=d%*", percentDecode(percentEncode("a=d%*")));
    assertEquals(")  (", percentDecode(percentEncode(")  (")));
    assertEquals("%21%20%2A%20%27%20%28%20%25%20%29%20%3B%20%3A%20%40%20%26%20%3D%20%2B%20%24%20%2C%20%2F%20%3F%20%23%20%5B%20%5D%20%25",
                    percentEncode("! * ' ( % ) ; : @ & = + $ , / ? # [ ] %"));
    assertEquals("! * ' ( % ) ; : @ & = + $ , / ? # [ ] %", percentDecode(
                    "%21%20%2A%20%27%20%28%20%25%20%29%20%3B%20%3A%20%40%20%26%20%3D%20%2B%20%24%20%2C%20%2F%20%3F%20%23%20%5B%20%5D%20%25"));

    assertEquals("%23456", percentDecode(percentEncode("%23456")));

}

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;

/**
 * Replaces every character that is not unreserved (RFC 3986, section 2.3)
 * with the percent-encoded form of its UTF-8 bytes.
 * @param s the string to encode
 * @return the percent-encoded string
 */
public static String encodeURIcomponent(String s)
{
    StringBuilder o = new StringBuilder();
    // Encode the UTF-8 bytes, not the chars: a char above 0xFF does not fit in two hex digits.
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
        int ch = b & 0xFF;
        if (isSafe((char) ch)) {
            o.append((char) ch);
        }
        else {
            o.append('%');
            o.append(toHex(ch / 16));
            o.append(toHex(ch % 16));
        }
    }
    return o.toString();
}

// Maps a value in the range 0..15 to its uppercase hex digit.
private static char toHex(int ch)
{
    return (char)(ch < 10 ? '0' + ch : 'A' + ch - 10);
}

// https://tools.ietf.org/html/rfc3986#section-2.3
public static final HashSet<Character> UnreservedChars = new HashSet<Character>(Arrays.asList(
        'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
        'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
        '0','1','2','3','4','5','6','7','8','9',
        '-','_','.','~'));
public static boolean isSafe(char ch)
{
    return UnreservedChars.contains(ch);
}

org.apache.commons.text.StringEscapeUtils.escapeHtml4("my text % & < >");

 <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-text</artifactId>
        <version>1.8</version>
    </dependency>