How to hex-decode a PostgreSQL bytea column into int16/uint16 in R?
I have some image data stored as bytea in a PostgreSQL database table column. I also have metadata for interpreting the data; the relevant parts are the image dimensions and class. Classes include int16 and uint16. I can't find any information on correctly interpreting signed/unsigned integers in R.

I'm using RPostgreSQL to pull the data into R, and I want to view the image in R.

MWE:

There are two problems with this solution:

It doesn't work for signed types, so negative integer values are interpreted as unsigned values (e.g. "ffff" is -1 as int16 but 65535 as uint16, and strtoi() will always return 65535).
It currently only handles int1…
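The sign problem described above comes from two's complement. strtoi() parses "ffff" as the unsigned value 65535. To get the int16 reading, subtract 2^16 from any value at or above 2^15. The sketch below shows this. It assumes the hex string has already been split into 4-character groups and that the data is stored little-endian, which is why the two bytes are swapped. The helper name hex_to_int16 is hypothetical.

```r
## Hypothetical helper: convert 4-character hex groups (little-endian byte
## order assumed) to int16 or uint16 using strtoi() plus two's complement.
hex_to_int16 <- function(groups, signed = TRUE) {
  ## Swap the two bytes of each group: "0100" (little-endian) -> "0001"
  swapped <- paste0(substr(groups, 3, 4), substr(groups, 1, 2))
  ## strtoi() returns the unsigned value in 0..65535
  u <- strtoi(swapped, base = 16L)
  if (!signed) return(u)
  ## Two's complement: values >= 2^15 encode negative numbers
  ifelse(u >= 32768L, u - 65536L, u)
}

hex_to_int16(c("ffff", "0000", "0100", "0a00"))
# -1 0 1 10
hex_to_int16(c("ffff", "0000", "0100", "0a00"), signed = FALSE)
# 65535 0 1 10
```

This only illustrates the arithmetic. For whole images, readBin() does the same conversion more efficiently once the bytes are in a raw vector.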
Replace strsplit (here with the compiled helper strsplit2) and use readBin on the result:
byteArray <- "\\xffff00000100020003000400050006000700080009000a00"
## Split a long string into a vector of character pairs (compiled with Rcpp; needs the Rcpp package and a C++ compiler)
Rcpp::cppFunction( code = '
CharacterVector strsplit2(const std::string& hex) {
unsigned int length = hex.length()/2;
CharacterVector res(length);
for (unsigned int i = 0; i < length; ++i) {
res(i) = hex.substr(2*i, 2);
}
return res;
}')
## A function to convert one string to an array of raw
f <- function(x) {
## Split the long string into a vector of character pairs
x <- strsplit2(x)
## Remove the first element, "\\x"
x <- x[-1]
## Complete the conversion
as.raw(as.hexmode(x))
}
raw <- f(byteArray)
# int16
readBin(con = raw,
what = "integer",
n = length(raw) / 2,
size = 2,
signed = TRUE,
endian = "little")
# -1 0 1 2 3 4 5 6 7 8 9 10
# uint16
readBin(con = raw,
what = "integer",
n = length(raw) / 2,
size = 2,
signed = FALSE,
endian = "little")
# 65535 0 1 2 3 4 5 6 7 8 9 10
# int32
readBin(con = raw,
what = "integer",
n = length(raw) / 4,
size = 4,
signed = TRUE,
endian = "little")
# 65535 131073 262147 393221 524295 655369
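Compiling C++ with Rcpp is not always possible. In that case, the hex-pair split can be done in base R with the vectorised substring(). This is a sketch of that alternative, not the method from the answer above, and its speed relative to strsplit2 has not been measured here. The function name hex_to_raw is hypothetical.

```r
## Hypothetical base-R replacement for f(): no Rcpp needed.
hex_to_raw <- function(x) {
  ## Drop the leading "\x" that PostgreSQL puts in front of hex-format bytea
  x <- sub("^\\\\x", "", x)
  n <- nchar(x)
  if (n == 0) return(raw(0))
  ## Vectorised split into two-character pairs: "ffff00" -> "ff" "ff" "00"
  pairs <- substring(x, seq(1, n - 1, by = 2), seq(2, n, by = 2))
  as.raw(strtoi(pairs, base = 16L))
}

raw_bytes <- hex_to_raw("\\xffff00000100020003000400050006000700080009000a00")
readBin(raw_bytes, what = "integer", n = length(raw_bytes) / 2,
        size = 2, signed = TRUE, endian = "little")
# -1 0 1 2 3 4 5 6 7 8 9 10
```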
For gzip-compressed data:
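# gzip
byteArray <- "\\x1f8b080000000000000005c1870100200800209a56faffbd41d30dd3b285e37a52f9d033018818000000"
con <- gzcon(rawConnection(f(byteArray)))
readBin(con = con,
what = "integer",
n = 12, # number of values to read; in practice width * height from the image metadata
size = 2,
signed = TRUE,
endian = "little")
## Close the connection when done; open connections count toward R's connection limit
close(con = con)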
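To view the decoded pixels as an image, reshape the vector using the dimensions from the metadata. The sketch below assumes the pixels are stored row by row (row-major), as in most image formats. If the data turns out to be column-major, drop byrow = TRUE. The function name pixels_to_matrix and its width/height arguments are hypothetical stand-ins for the question's metadata.

```r
## Hypothetical helper: turn decoded pixel values into a matrix for display.
## Assumes row-major storage: the first `width` values are the top image row.
pixels_to_matrix <- function(values, width, height) {
  stopifnot(length(values) == width * height)
  matrix(values, nrow = height, ncol = width, byrow = TRUE)
}

img <- pixels_to_matrix(c(-1L, 0:10), width = 4, height = 3)
## image() maps rows of its argument to the x axis and columns to the y axis
## (y increasing upward), so reverse the row order and transpose to show
## image row 1 at the top of the plot
image(t(img[nrow(img):1, ]), col = gray.colors(256), axes = FALSE)
```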
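@Brian, an interesting question. You can use readBin(con = gzcon(rawConnection(f(byteArray))), ...).

The strsplit() in f() breaks on very large vectors (e.g. 512x512): it pins the CPU and never returns, and I have to kill the process or restart R. However, the version of strsplit() in my question works very well. I microbenchmarked compressed versus uncompressed byte arrays and found that the connections were still open... they eventually hit R's limit of 128 connections, which caused an error. So I split it into c…

@bri, thanks, I've adopted your suggestion. Apparently there is a faster version of the string splitting; see @GSee's answer for benchmarking.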
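For uint32: readBin() only honours signed = FALSE for sizes 1 and 2, and R integers are signed 32-bit, so read the values as int32 and add 2^32 to the negative ones:
# uint32
byteArray <- "\\xffffffff0100020003000400050006000700080009000a00"
raw32 <- f(byteArray)
int32 <- readBin(con = raw32,
what = "integer",
n = length(raw32) / 4,
size = 4,
signed = TRUE,
endian = "little")
## The result is double, since 4294967295 does not fit in an R integer
ifelse(int32 < 0, int32 + 2^32, int32)
# 4294967295 131073 262147 393221 524295 655369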