How do I hex-decode a PostgreSQL bytea column into int16/uint16 in R?

r postgresql

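I have some image data stored as bytea in a column of a PostgreSQL database table. I also have metadata for interpreting the data; the relevant parts are the image dimensions and class. The classes include int16 and uint16. I cannot find any information on how to interpret signed/unsigned integers correctly in R.

I am using RPostgreSQL to pull the data into R, and I want to view the image in R.

MWE:

This solution has two problems:

It does not work with signed types, so negative integer values are interpreted as unsigned values (for example, "ffff" is -1 as int16 but 65535 as uint16, and strtoi() will always return 65535).

It is currently coded only for int16 and needs some extra code to handle other types (e.g. int32, int64).

Does anyone have a solution that works with signed types?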
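The signedness problem described above can be illustrated with a small base R sketch. It assumes each value is already available as a 4-digit hex word in most-significant-digit-first order (byte order is ignored here), and `to_int16` is a hypothetical helper name, not part of any package. It parses with strtoi() and then reinterprets the result as two's complement:

```r
# Parse 4-digit hex words; strtoi() always yields the unsigned value.
hex_words <- c("ffff", "0000", "0001", "8000", "7fff")
unsigned <- strtoi(hex_words, base = 16L)

# Hypothetical helper: reinterpret an unsigned 16-bit value as signed
# (two's complement): values of 2^15 and above wrap around to negatives.
to_int16 <- function(u) ifelse(u >= 2^15, u - 2^16, u)

signed <- to_int16(unsigned)
unsigned  # 65535 0 1 32768 32767
signed    # -1 0 1 -32768 32767
```

This only patches up the strtoi() approach for 16-bit values; it does not address other widths or byte order.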

You can start from the hex-to-raw conversion, swap in a faster replacement for strsplit, and use readBin on the result:

byteArray <- "\\xffff00000100020003000400050006000700080009000a00"

## Split a long string into a vector of character pairs
Rcpp::cppFunction( code = '
CharacterVector strsplit2(const std::string& hex) {
  unsigned int length = hex.length()/2;
  CharacterVector res(length);
  for (unsigned int i = 0; i < length; ++i) {
    res(i) = hex.substr(2*i, 2);
  }
  return res;
}')

## A function to convert one string to an array of raw
f <- function(x)  {
  ## Split a long string into a vector of character pairs
  x <- strsplit2(x)
  ## Remove the first element, "\\x"
  x <- x[-1]
  ## Complete the conversion
  as.raw(as.hexmode(x))
}

raw <- f(byteArray)
# int16
readBin(con = raw,
        what = "integer",
        n = length(raw) / 2,
        size = 2,
        signed = TRUE,
        endian = "little")
# -1  0  1  2  3  4  5  6  7  8  9 10

# uint16
readBin(con = raw,
        what = "integer",
        n = length(raw) / 2,
        size = 2,
        signed = FALSE,
        endian = "little")
# 65535     0     1     2     3     4     5     6     7     8     9    10

# int32
readBin(con = raw,
        what = "integer",
        n = length(raw) / 4,
        size = 4,
        signed = TRUE,
        endian = "little")
# 65535 131073 262147 393221 524295 655369
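If compiling with Rcpp is not an option, the same hex-to-raw step can be sketched in base R. This is only an illustrative alternative, not the method used above: `hex_to_raw` is a hypothetical name, the input is assumed to be a single string that starts with the `\x` prefix and has an even number of hex digits, and no claim is made about how its speed compares.

```r
# Hypothetical base R helper: convert one "\x..." hex string to a raw vector.
hex_to_raw <- function(x) {
  stopifnot(length(x) == 1L, startsWith(x, "\\x"))
  hex <- substring(x, 3L)                      # drop the leading "\x"
  if (nchar(hex) == 0L) return(raw(0))
  starts <- seq(1L, nchar(hex), by = 2L)       # start of each two-digit pair
  pairs <- substring(hex, starts, starts + 1L) # vectorised split into pairs
  as.raw(strtoi(pairs, base = 16L))
}

bytes <- hex_to_raw("\\xffff00000100")
ints <- readBin(bytes, what = "integer", n = length(bytes) / 2,
                size = 2, signed = TRUE, endian = "little")
ints  # -1 0 1
```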

For gzip-compressed data:

# gzip
byteArray <- "\\x1f8b080000000000000005c1870100200800209a56faffbd41d30dd3b285e37a52f9d033018818000000"
con <- gzcon(rawConnection(f(byteArray)))
readBin(con = con,
        what = "integer",
        n = length(raw) / 2,  # maximum number of values to read; reuses the length of `raw` from the uncompressed example
        size = 2,
        signed = TRUE,
        endian = "little")
close(con = con)
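When the gzip step is run many times (for example, once per row), each gzcon connection has to be closed, because R only allows a limited number of open connections. One way to make that hard to forget is to wrap the step in a function that closes the connection on exit. The sketch below assumes int16 little-endian content; `read_gz_int16` is a hypothetical helper name, and the small payload is built locally as a stand-in for a real bytea value.

```r
# Hypothetical helper: decompress gzip bytes and read up to n int16 values.
read_gz_int16 <- function(bytes, n) {
  con <- gzcon(rawConnection(bytes))
  # Closing the gzcon wrapper also closes the underlying rawConnection,
  # and on.exit() runs even if readBin() fails.
  on.exit(close(con), add = TRUE)
  readBin(con, what = "integer", n = n, size = 2,
          signed = TRUE, endian = "little")
}

# Build a small gzip payload to try it on (stand-in for a bytea value).
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, open = "wb")
writeBin(c(-1L, 0L, 1L, 10L), gz, size = 2, endian = "little")
close(gz)
payload <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

open_before <- nrow(showConnections())
values <- read_gz_int16(payload, n = 4)
open_after <- nrow(showConnections())  # no connection should be left open
values
```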

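@Brian, an interesting question. You can use readBin(con = gzcon(rawConnection(f(byteArray))), ...).

The strsplit() in f() breaks on very large vectors (e.g. 512x512). It pegs the CPU and never returns, and I have to kill the process or restart R. However, if I use the version of strsplit() from my question, it works very well.

I microbenchmarked the compressed versus uncompressed byte arrays and found that the connections were still around... eventually I hit R's limit of 128 connections, which caused an error. So I split it into c...

@bri, thanks, I have adopted your suggestion. Apparently there is a faster version of the string split; see @GSee's answer for benchmarks.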

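# uint32: R has no unsigned 32-bit integer type, so read as signed int32
# and add 2^32 to the negative values (the result is a double vector)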
byteArray <- "\\xffffffff0100020003000400050006000700080009000a00"
int32 <- readBin(con = f(byteArray),
                 what = "integer",
                 n = length(raw) / 4,
                 size = 4,
                 signed = TRUE,
                 endian = "little")

ifelse(int32 < 0, int32 + 2^32, int32)
# 4294967295     131073     262147     393221     524295     655369
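Since the question stores the pixel class as metadata, the readBin calls above can be gathered into one function that picks the size and signedness from a class name. This is a minimal sketch under stated assumptions: `decode_pixels` and the class labels are hypothetical names chosen for illustration, and the data are assumed to be little-endian unless told otherwise.

```r
# Hypothetical helper: decode a raw vector according to a pixel class label.
decode_pixels <- function(bytes,
                          class = c("int16", "uint16", "int32", "uint32"),
                          endian = "little") {
  class <- match.arg(class)
  size <- if (class %in% c("int16", "uint16")) 2L else 4L
  # readBin() supports signed = FALSE only for sizes 1 and 2,
  # so uint32 is read as signed and corrected afterwards.
  signed <- class != "uint16"
  values <- readBin(bytes, what = "integer", n = length(bytes) %/% size,
                    size = size, signed = signed, endian = endian)
  if (class == "uint32") {
    # Shift negatives into the unsigned range; the result is a double vector.
    values <- ifelse(values < 0, values + 2^32, values)
  }
  values
}

bytes <- as.raw(c(0xff, 0xff, 0xff, 0xff, 0x01, 0x00, 0x02, 0x00))
decode_pixels(bytes, "int16")   # -1 -1 1 2
decode_pixels(bytes, "uint16")  # 65535 65535 1 2
decode_pixels(bytes, "int32")   # -1 131073
decode_pixels(bytes, "uint32")  # 4294967295 131073
```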
