Java: converting "php unicode" characters

Tags: java, php

How can I convert so-called "php unicode" to normal characters in Java? Example: \xEF\xBC\xA1 -> A. Is there a built-in method in the JDK, or should I use a regex for this conversion?

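You first need to get the bytes out of the string and into a byte array without changing them, then decode the byte array as a UTF-8 string.

The simplest way to get the string into a byte array is to encode it with ISO-8859-1, which maps every character with a Unicode value below 256 to a byte with the same value (or, since Java bytes are signed, the equivalent negative value).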
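
The two steps described above can be sketched as follows. This is a minimal illustration, not the answer's original listing; the class name `Latin1RoundTrip` and the method name `decode` are hypothetical, and the input is assumed to hold one char per UTF-8 byte (each char in the range 0-255).

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    // Assumption: every char of byteChars carries one UTF-8 byte value (0-255).
    static String decode(String byteChars) {
        // ISO-8859-1 maps chars 0-255 to bytes with the same value, so the bytes come out unchanged.
        byte[] bytes = byteChars.getBytes(StandardCharsets.ISO_8859_1);
        // Reinterpret those bytes as UTF-8.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String phpUnicode = "\u00EF\u00BC\u00A1";
        String javaString = decode(phpUnicode);
        // The three bytes EF BC A1 decode to the single character U+FF21.
        System.out.format("%04x%n", (int) javaString.charAt(0)); // prints ff21
    }
}
```

Using `StandardCharsets` instead of charset names avoids the checked `UnsupportedEncodingException`.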

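Edit
This approach converts the UTF-8 bytes into Unicode characters. If you want to convert the result into a reasonable ASCII equivalent, there is no standard way to do that.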
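
One option that may be close enough for fullwidth characters specifically is Unicode compatibility normalization (NFKC) via `java.text.Normalizer`. This is a suggestion added here, not something the answer proposes, and it is not a general "to ASCII" conversion; the class name `FullwidthToAscii` and the method name `fold` are hypothetical.

```java
import java.text.Normalizer;

public class FullwidthToAscii {
    // NFKC replaces compatibility characters (such as fullwidth forms) with their
    // plain equivalents. Characters without such a mapping are left as they are.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FF21, U+FF22, U+FF23 are the fullwidth forms of A, B, C.
        System.out.println(fold("\uFF21\uFF22\uFF23")); // prints ABC
    }
}
```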

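Edit
I was assuming you had a string containing characters with the same ordinal values as the UTF-8 sequence, but you indicate that your string actually contains the escape sequences, as in: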

String phpUnicode = "\\xEF\\xBC\\xA1";
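The JDK has no built-in method for converting strings like this, so you will need your own regex. Since we ultimately want to turn a UTF-8 byte sequence into a string, we need to build up a byte array, perhaps using: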

Pattern oneChar = Pattern.compile("\\\\x([0-9A-F]{2})|(.)", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher matcher = oneChar.matcher(phpUnicode);
ByteArrayOutputStream bytes = new ByteArrayOutputStream();

while (matcher.find()) {
    int ch;
    if (matcher.group(1) == null) {
        ch = matcher.group(2).charAt(0);
    }
    else {
        ch = Integer.parseInt(matcher.group(1), 16);
    }
    bytes.write((int) ch);
}
String javaString = new String(bytes.toByteArray(), "UTF-8");
System.out.println(javaString);
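This produces a UTF-8 stream by converting the \xAB sequences, and then converts that UTF-8 stream into a Java string. Note that any character that is not part of an escape sequence is converted to a byte equal to the low-order 8 bits of the Unicode character. That works fine for ASCII, but it can cause transcoding problems for non-ASCII characters.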
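
If the input may also contain literal non-ASCII characters, one way around that limitation is to write the literal text as its own UTF-8 bytes and only treat the escapes as raw bytes. The following is a sketch under that assumption, not the answer's code; `EscapeDecoder` and `decode` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDecoder {
    // Matches a backslash, an 'x', and exactly two hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    static String decode(String input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Matcher m = ESCAPE.matcher(input);
        int last = 0;
        while (m.find()) {
            // Literal text before the escape is stored as its real UTF-8 encoding.
            byte[] literal = input.substring(last, m.start()).getBytes(StandardCharsets.UTF_8);
            bytes.write(literal, 0, literal.length);
            // The escape itself contributes one raw byte.
            bytes.write(Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        byte[] tail = input.substring(last).getBytes(StandardCharsets.UTF_8);
        bytes.write(tail, 0, tail.length);
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The literal e-acute survives, and the escape becomes U+FF21.
        String out = decode("caf\u00E9 \\xEF\\xBC\\xA1");
        System.out.println(out.equals("caf\u00E9 \uFF21")); // prints true
    }
}
```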

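@McDowell:
The sequence:

String phpUnicode = "\u00EF\u00BC\u00A1";
byte[] bytes = phpUnicode.getBytes("ISO-8859-1");
creates a byte array with as many bytes as the original string has characters, and for each character with a Unicode value below 256, the same numeric value is stored in the byte array.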

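The character FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) is not present in the original string, so the fact that it does not exist in ISO-8859-1 is irrelevant.

I know that transcoding errors can occur when converting characters to bytes, which is why I said that ISO-8859-1 will only "map every character with a unicode value less than 256 to a byte with the same value".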

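The character in question is U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A). The PHP form (\xEF\xBC\xA1) is a UTF-8 encoded octet sequence.

To decode this sequence into a Java String (which is always UTF-16), you would use the following code: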

// \xEF\xBC\xA1
byte[] utf8 = { (byte) 0xEF, (byte) 0xBC, (byte) 0xA1 };
String utf16 = new String(utf8, Charset.forName("UTF-8"));

// print the char as hex   
for(char ch : utf16.toCharArray()) {
    System.out.format("%02x%n", (int) ch);
}
If you want to decode the data from a string literal, you could use code of this form:

public static void main(String[] args) {
  String utf16 = transformString("This is \\xEF\\xBC\\xA1 string");
  for (char ch : utf16.toCharArray()) {
    System.out.format("%s %02x%n", ch, (int) ch);
  }
}

// Match runs of \xNN escapes; XDigit restricts NN to hex digits so parseInt cannot fail
private static final Pattern SEQ 
                           = Pattern.compile("(\\\\x\\p{XDigit}\\p{XDigit})+");

private static String transformString(String encoded) {
  StringBuilder decoded = new StringBuilder();
  Matcher matcher = SEQ.matcher(encoded);
  int last = 0;
  while (matcher.find()) {
    decoded.append(encoded.substring(last, matcher.start()));
    byte[] utf8 = toByteArray(encoded.substring(matcher.start(), matcher.end()));
    decoded.append(new String(utf8, Charset.forName("UTF-8")));
    last = matcher.end();
  }
  return decoded.append(encoded.substring(last, encoded.length())).toString();
}

private static byte[] toByteArray(String hexSequence) {
  byte[] utf8 = new byte[hexSequence.length() / 4];
  for (int i = 0; i < utf8.length; i++) {
    int offset = i * 4;
    String hex = hexSequence.substring(offset + 2, offset + 4);
    utf8[i] = (byte) Integer.parseInt(hex, 16);
  }
  return utf8;
}
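Is your input in string format (\xNN) or in binary format?

Nice, but I need to convert a \xNN\xNN string into a Unicode string. I have written a regexp that captures the NN characters, but how do I create a Unicode string from NN? E.g. I have NN and I need "\u0NN" (string concatenation does not work here).

Java strings are UTF-16; trying to represent UTF-8 in them ("\u00EF\u00BC\u00A1") will only lead to transcoding errors. In any case, FULLWIDTH LATIN CAPITAL LETTER A does not exist in ISO-8859-1.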