How can I hex-decode a PostgreSQL bytea column into int16/uint16 in R?

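I have some image data stored as bytea in a column of a PostgreSQL database table. I also have metadata for interpreting the data; the relevant parts are the image dimensions and the class. The classes include int16 and uint16. I cannot find any information on correctly interpreting signed/unsigned integers in R.

I am using RPostgreSQL to pull the data into R, and I want to view the images in R.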

MWE:

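This solution has two problems:

  • It does not work for signed types, so negative integer values are interpreted as unsigned values (for example, "ffff" is -1 as int16 but 65535 as uint16, and strtoi() always returns 65535).
  • It is currently written only for int16 and needs extra code to handle other types (e.g. int32, int64).

Does anyone have a solution that works with signed types?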
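The signedness problem in the first bullet can be reproduced in a few lines. This is only a minimal sketch, assuming each 16-bit value arrives as a four-character hex string with the most significant digit first. The vector `hex_words` and the manual two's-complement step are illustrative and not part of the original MWE.

```r
# Hypothetical sample values, not taken from the original data
hex_words <- c("ffff", "0000", "7fff", "8000")

# strtoi() has no notion of a 16-bit sign bit, so "ffff" becomes 65535
unsigned <- strtoi(hex_words, base = 16L)

# Reinterpret as int16: values with the top bit set wrap around by 2^16
signed <- ifelse(unsigned >= 2^15, unsigned - 2^16, unsigned)

print(unsigned)
print(signed)
```

The manual correction works for a fixed width, but it has to be rewritten for every integer size, which is the second problem listed above.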

As a starting point, you can replace strsplit with a faster version and use readBin on the result:

    byteArray <- "\\xffff00000100020003000400050006000700080009000a00"
    
    ## Split a long string into a vector of character pairs
    Rcpp::cppFunction( code = '
    CharacterVector strsplit2(const std::string& hex) {
      unsigned int length = hex.length()/2;
      CharacterVector res(length);
      for (unsigned int i = 0; i < length; ++i) {
        res(i) = hex.substr(2*i, 2);
      }
      return res;
    }')
    
    ## A function to convert one string to an array of raw
    f <- function(x)  {
      ## Split a long string into a vector of character pairs
      x <- strsplit2(x)
      ## Remove the first element, "\\x"
      x <- x[-1]
      ## Complete the conversion
      as.raw(as.hexmode(x))
    }
    
    raw <- f(byteArray)
    # int16
    readBin(con = raw,
            what = "integer",
            n = length(raw) / 2,
            size = 2,
            signed = TRUE,
            endian = "little")
    # -1  0  1  2  3  4  5  6  7  8  9 10
    
    # uint16
    readBin(con = raw,
            what = "integer",
            n = length(raw) / 2,
            size = 2,
            signed = FALSE,
            endian = "little")
    # 65535     0     1     2     3     4     5     6     7     8     9    10
    
    # int32
    readBin(con = raw,
            what = "integer",
            n = length(raw) / 4,
            size = 4,
            signed = TRUE,
            endian = "little")
    # 65535 131073 262147 393221 524295 655369
    
For gzip-compressed data:

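    # gzip: decompress on the fly by wrapping the raw vector in a connection
    byteArray <- "\\x1f8b080000000000000005c1870100200800209a56faffbd41d30dd3b285e37a52f9d033018818000000"
    con <- gzcon(rawConnection(f(byteArray)))
    # n is an upper bound on the number of values to read; this reuses the
    # length of the uncompressed example above
    readBin(con = con,
            what = "integer",
            n = length(raw) / 2,
            size = 2,
            signed = TRUE,
            endian = "little")
    # Close the connection when done; R limits the number of open connections
    close(con = con)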
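
The second problem raised in the question (extra code for each type) could be handled by a small wrapper around readBin. The following is only a sketch: `decode_bytea` and its list of type names are hypothetical, the hex-to-raw step uses base R `substring()` and `strtoi()` instead of the Rcpp helper above, and little-endian data is assumed as in the examples.

```r
# Hypothetical helper: decode a hex-encoded bytea string ("\\x...") into integers
decode_bytea <- function(hex, type = c("int16", "uint16", "int32", "uint32")) {
  type <- match.arg(type)
  # Drop the leading "\x" marker of PostgreSQL's hex output format
  hex <- sub("^\\\\x", "", hex)
  # Cut the string into two-character pairs and convert each pair to one byte
  starts <- seq(1L, nchar(hex), by = 2L)
  bytes <- as.raw(strtoi(substring(hex, starts, starts + 1L), base = 16L))

  size <- if (type %in% c("int16", "uint16")) 2L else 4L
  # readBin() honours signed = FALSE only for sizes 1 and 2
  values <- readBin(bytes, what = "integer", n = length(bytes) %/% size,
                    size = size, signed = (type != "uint16"), endian = "little")
  # R has no unsigned 32-bit type: shift negatives up by 2^32 (returns doubles)
  if (type == "uint32") values <- ifelse(values < 0, values + 2^32, values)
  values
}

byteArray <- "\\xffff00000100020003000400050006000700080009000a00"
decode_bytea(byteArray, "int16")
decode_bytea(byteArray, "uint16")
```

The sketch does not cover int64, which base R cannot represent as an integer type.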
    
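@Brian, an interesting question. You can use readBin(con=gzcon(rawConnection(f(byteArray)),…).

The strsplit() in f() breaks on very large vectors (e.g. 512x512). It pins the CPU and never returns, and I have to kill the process or restart R. However, if I use the version of strsplit() from my question, it works very well.

I microbenchmarked compressed against uncompressed byte arrays and found that the connections stay open... I eventually hit R's limit of 128 connections, which caused an error. So I split it into …

@bri, thanks, I have adopted your suggestion. Apparently there is a faster version of string splitting; see @GSee's answer for benchmarks.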
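
The connection limit mentioned in the comments can be hit when gzcon connections are opened repeatedly and never closed. Here is a minimal sketch of one way to avoid that, assuming the decompressed payload is little-endian int16. `read_gz_int16` is a hypothetical helper, and the sample bytes are generated locally with `gzfile()` purely for illustration.

```r
# Hypothetical helper: read int16 values from a gzip-compressed raw vector
read_gz_int16 <- function(gz_raw, n) {
  con <- gzcon(rawConnection(gz_raw))
  # Make sure the connection is released even if readBin() fails
  on.exit(close(con), add = TRUE)
  readBin(con, what = "integer", n = n, size = 2,
          signed = TRUE, endian = "little")
}

# Build illustrative gzip data locally (not taken from the original question)
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, open = "wb")
writeBin(c(-1L, 0:10), gz, size = 2, endian = "little")
close(gz)
gz_raw <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

# Repeated calls should not accumulate open connections
for (i in 1:200) values <- read_gz_int16(gz_raw, n = 12)
values
```

Putting the open and close in one function keeps each connection's lifetime short, which matters when decoding many rows in a loop or inside a benchmark.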
    
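    # uint32: R has no unsigned 32-bit integer type, so read the values as
    # int32 and add 2^32 to the negative ones (the result is a double vector)
    byteArray <- "\\xffffffff0100020003000400050006000700080009000a00"
    bytes <- f(byteArray)
    int32 <- readBin(con = bytes,
                     what = "integer",
                     n = length(bytes) / 4,
                     size = 4,
                     signed = TRUE,
                     endian = "little")
    
    ifelse(int32 < 0, int32 + 2^32, int32)
    # 4294967295     131073     262147     393221     524295     655369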
    