Converting a string to an Android JSONObject loses UTF-8 encoding

I am trying to fetch a (JSON-formatted) string from a URL and use it as a JSON object. When I convert the string to a JSONObject, I lose the UTF-8 encoding.

This is the function I use to connect to the URL and fetch the string:

private static String getUrlContents(String theUrl) {
    StringBuilder content = new StringBuilder();
    try {
        URL url = new URL(theUrl);
        URLConnection urlConnection = url.openConnection();
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));

        String line;
        while ((line = bufferedReader.readLine()) != null) {
            content.append(line + "\n");
        }
        bufferedReader.close();
    } catch(Exception e) {
        e.printStackTrace();
    }

    return content.toString();
}
When fetching data from the server, the following code displays the characters correctly:

String output = getUrlContents(url);
Log.i("message1", output);
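But when I convert the output string to a JSONObject, the Persian characters turn into question marks, like this: ??????. (messages is the name of an array in the JSON.)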


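You are telling Java to convert the string (the one with the key messages) to bytes using ISO-8859-1, and then to create a new String from those bytes, interpreted as UTF-8: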

new String(reader.getString("messages").getBytes("ISO-8859-1"), "UTF-8");
You can simply use:

String messages = reader.getString("messages");
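To see why that round trip produces question marks, here is a minimal sketch in plain Java (no Android classes needed). The class name EncodingRoundTrip, the method brokenRoundTrip, and the sample word are illustrative assumptions, not part of the original code. ISO-8859-1 has no mapping for Persian letters, so getBytes substitutes the byte for '?' for each of them, and the original characters are gone before the UTF-8 decoding step even runs.

```java
import java.nio.charset.StandardCharsets;

final class EncodingRoundTrip {

    // Reproduces the problematic conversion: encode as ISO-8859-1, decode as UTF-8.
    static String brokenRoundTrip(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // unmappable chars become '?'
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "salam" in Persian script, written with escapes to avoid source-encoding issues
        String persian = "\u0633\u0644\u0627\u0645";

        System.out.println(brokenRoundTrip(persian)); // prints "????"
        System.out.println(brokenRoundTrip("hello")); // prints "hello" (ASCII survives)
    }
}
```

ASCII text passes through unchanged, which is why the problem only shows up once non-ASCII characters are involved.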

Not sure whether this helps, but you could probably do something like this:

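JSONObject result = null;
try {
    // output is already a decoded Java String, so it needs no re-encoding;
    // new String(output, "UTF-8") only compiles when output is a byte[].
    result = (JSONObject) new JSONTokener(output).nextValue();
} catch (JSONException | ClassCastException e) {
    e.printStackTrace();
}

// Guard against a failed parse instead of dereferencing null
String messages = (result != null) ? result.optString("messages") : null;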

You can update your code as follows:

    private static String getUrlContents(String theUrl) {
        StringBuilder content = new StringBuilder();
        try {
            URL url = new URL(theUrl);
            URLConnection urlConnection = url.openConnection();
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream(), "utf-8"));

            String line;
            while ((line = bufferedReader.readLine()) != null) {
                content.append(line).append("\n");
            }
            bufferedReader.close();
        } catch(Exception e) {
            e.printStackTrace();
        }

        return content.toString().trim();
    }

You have two encoding problems:

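  • The server sends text encoded in some character set. When you set up the InputStreamReader, you need to pass it the encoding the server used so that the bytes are decoded correctly. The character encoding is usually given in the charset field of the Content-Type header of the HTTP response. JSON is usually UTF-8 encoded, but it can also legitimately be UTF-16 or UTF-32, so you need to check. If no encoding is specified, the platform default is used when converting bytes to a String and vice versa. Basically, you should always specify the charset.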

  • String messages = new String(reader.getString("messages").getBytes("ISO-8859-1"), "UTF-8"); obviously causes problems (as soon as you use non-ASCII characters): it encodes the string as ISO-8859-1 and then tries to decode it as UTF-8.

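Before reading the InputStream, you can use a simple regular expression to extract the charset value from the Content-Type header. I have also included a neat InputStream -> String converter: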

    private static String getUrlContents(String theUrl) {
    
        try {
            URL url = new URL(theUrl);
            URLConnection urlConnection = url.openConnection();
            InputStream is = urlConnection.getInputStream();
    
            // Get charset field from Content-Type header
            String contentType = urlConnection.getContentType();
            // matches value in key / value pair
            Pattern encodingPattern = Pattern.compile(".*charset\\s*=\\s*([\\w-]+).*");
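            // getContentType() returns null when the header is missing; match against "" in that case
            Matcher encodingMatcher = encodingPattern.matcher(contentType == null ? "" : contentType);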
            // set charsetString to match value if charset is given, else default to UTF-8
            String charsetString = encodingMatcher.matches() ? encodingMatcher.group(1) : "UTF-8";
    
            // Quick way to read from InputStream.
            // \A is a boundary match for beginning of the input
            return new Scanner(is, charsetString).useDelimiter("\\A").next();
        } catch(Exception e) {
            e.printStackTrace();
        }
    
        return null;
    }
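As a sketch of how the header parsing above could be made a little more defensive, the helper below handles a missing header, a quoted charset value, and a charset name the runtime does not support. The class ContentTypeCharset and the method charsetOf are hypothetical names, and falling back to UTF-8 is an assumption that suits JSON rather than a general rule.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContentTypeCharset {

    // Matches charset=value or charset="value", case-insensitively
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([\\w.:+-]+)\"?", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a Content-Type header value, or UTF-8 as a fallback.
    static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8; // header missing
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return StandardCharsets.UTF_8; // no charset parameter
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/json; charset=ISO-8859-1")); // prints "ISO-8859-1"
        System.out.println(charsetOf(null));                                   // prints "UTF-8"
    }
}
```

The returned Charset can be passed straight to new InputStreamReader(is, charset), or to new Scanner(is, charset.name()) in the converter above.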
    

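    This is because the bytes you received over the wire were already interpreted correctly in getUrlContents and are stored internally as a UTF-16 string.

    getUrlContents works only when the server's character encoding matches the client's default character encoding.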
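The dependence on matching encodings can be illustrated with a small sketch: the same bytes decode to the original text only when the reader uses the charset the sender used. The class DecodeMismatch, the method decode, and the sample word are assumptions made for illustration, with ISO-8859-1 standing in for a mismatched client default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeMismatch {

    // Decodes raw bytes from the wire using the given charset.
    static String decode(byte[] wire, Charset charset) {
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        String original = "\u0633\u0644\u0627\u0645"; // four Persian letters
        byte[] wire = original.getBytes(StandardCharsets.UTF_8); // what a UTF-8 server would send

        String asUtf8 = decode(wire, StandardCharsets.UTF_8);
        String asLatin1 = decode(wire, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals(original));   // prints "true"
        System.out.println(asLatin1.equals(original)); // prints "false"
        System.out.println(asLatin1.length());         // prints "8": each byte became one char
    }
}
```

Once the bytes have been decoded with the right charset, the resulting String needs no further conversion before it is handed to JSONObject.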