What is the best way to compress JSON in Python for storage in a memory-based store such as Redis or memcache?


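Requirement: Python objects with 2-3 levels of nesting, containing basic data types such as integers, strings, lists, and dicts (no dates etc.), need to be stored as JSON in Redis against a key. What are the best methods for compressing the JSON as a string to keep the memory footprint low? The target objects are not very large: about 1000 small elements on average, or about 15000 characters when converted to JSON.

1/ Are there any other, better ways to compress JSON to save memory in Redis (while also keeping decoding lightweight afterwards)?

2/ How good a candidate would msgpack [http://msgpack.org/] be?

3/ Should I consider options like pickle as well?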

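Another possibility is to use MongoDB's storage format, BSON.

You can find two Python implementations on the implementations page of that site.

Edit: why not just save the dictionary and convert it to JSON on retrieval?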

We just use gzip as the compressor:

import gzip
import cStringIO  # Python 2 only; on Python 3 use io.BytesIO instead

def decompressStringToFile(value, outputFile):
  """
  decompress the given string value (which must be valid compressed gzip
  data) and write the result in the given open file.
  """
  stream = cStringIO.StringIO(value)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      outputFile.close()
      return 
    outputFile.write(chunk)

def compressFileToString(inputFile):
  """
  read the given open file, compress the data and return it as string.
  """
  stream = cStringIO.StringIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = inputFile.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

In our use case we store the result as files, as you can imagine. To work with in-memory strings only, you can use a cStringIO.StringIO() object as a replacement for the file.

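An easy "post-processing" approach is to build a "short key name" map, run the generated JSON through it before storage, and run it through again (in reverse) before deserializing to an object. For example: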

Before: {"details":{"1":{"age":13,"name":"dhruv"},"2":{"age":15,"name":"Matt"}},"members":["1","2"]}
Map: details:d, age:a, name:n, members:m
Result: {"d":{"1":{"a":13,"n":"dhruv"},"2":{"a":15,"n":"Matt"}},"m":["1","2"]}

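Just walk the JSON and replace key->value on the way to the database, and value->key on the way back to the application.

You can also gzip the result for added benefit (though it will no longer be a string after that).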
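The key-renaming pass described above can be sketched in a few lines. This is only an illustration: `KEY_MAP` is the mapping from the example, and the helper names `rename_keys`, `shorten`, and `restore` are made up here, not taken from any library.

```python
import json

# Mapping from the example above; the helper names below are hypothetical.
KEY_MAP = {"details": "d", "age": "a", "name": "n", "members": "m"}
REVERSE_MAP = {short: long for long, short in KEY_MAP.items()}


def rename_keys(obj, mapping):
    """Recursively rename dict keys found in `mapping`; leave everything else as-is."""
    if isinstance(obj, dict):
        return {mapping.get(k, k): rename_keys(v, mapping) for k, v in obj.items()}
    if isinstance(obj, list):
        return [rename_keys(item, mapping) for item in obj]
    return obj


def shorten(obj):
    """Apply the short key names and dump compact JSON (no spaces after separators)."""
    return json.dumps(rename_keys(obj, KEY_MAP), separators=(",", ":"))


def restore(text):
    """Parse the stored JSON and map the short key names back to the long ones."""
    return rename_keys(json.loads(text), REVERSE_MAP)
```

The round trip is only lossless if no real key in the data collides with one of the short names, so the map has to be chosen with the data in mind.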

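If you want it to be fast, try a speed-oriented compressor. If you want it to compress better, go for one that favors compression ratio.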

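Are there any other, better ways to compress JSON to save memory in Redis (while also keeping decoding lightweight afterwards)?

How good a candidate would msgpack be?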

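Msgpack is relatively fast and has a small memory footprint, though another serializer has generally been faster for me. You should compare the candidates on your own data, measuring compression and decompression rates as well as the compression ratio.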
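A comparison of that kind could look roughly like the sketch below. It is limited to the standard-library compressors (zlib, bz2, lzma) so that it runs without extra packages; msgpack or any other third-party codec could be added to the `codecs` table in the same way. The sample data from `make_sample` is made up and should be replaced with real payloads.

```python
import bz2
import json
import lzma
import time
import zlib


def make_sample(n=1000):
    """Build a made-up object with many small elements (replace with real data)."""
    return {"members": [{"id": i, "name": "user%d" % i} for i in range(n)]}


def benchmark(payload, rounds=20):
    """Return {name: (compressed_size, compress_seconds, decompress_seconds)}."""
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        for _ in range(rounds):
            blob = compress(payload)
        middle = time.perf_counter()
        for _ in range(rounds):
            restored = decompress(blob)
        end = time.perf_counter()
        assert restored == payload  # sanity check: the round trip is lossless
        results[name] = (len(blob), (middle - start) / rounds, (end - middle) / rounds)
    return results


if __name__ == "__main__":
    payload = json.dumps(make_sample(), separators=(",", ":")).encode("utf-8")
    print("raw json:", len(payload), "bytes")
    for name, (size, c_time, d_time) in benchmark(payload).items():
        print("%s: %d bytes, compress %.0f us, decompress %.0f us"
              % (name, size, c_time * 1e6, d_time * 1e6))
```

Timings from a loop like this are rough; for anything close, repeat the runs and use data that matches production in size and shape.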

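Should I consider options like pickle as well?

Consider pickle (cPickle in particular) and marshal. Both are fast. But remember that they are not secure (unpickling untrusted data can execute arbitrary code) and not scalable, so you pay for the speed with added responsibility.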

Building on @Alfe's answer above, here is a version that keeps the contents in memory (for network I/O tasks). I also made a few changes to support Python 3.

import gzip
from io import BytesIO

def decompressBytesToString(inputBytes):
  """
  decompress the given byte array (which must be valid 
  compressed gzip data) and return the decoded text (utf-8).
  """
  bio = BytesIO()
  stream = BytesIO(inputBytes)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:  # EOF reached
      decompressor.close()
      return bio.getvalue().decode("utf-8")
    bio.write(chunk)

def compressStringToBytes(inputString):
  """
  read the given string, encode it in utf-8,
  compress the data and return it as a byte array.
  """
  bio = BytesIO()
  bio.write(inputString.encode("utf-8"))
  bio.seek(0)
  stream = BytesIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = bio.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

To test the compression, try:

inputString="asdf" * 1000
len(inputString)
len(compressStringToBytes(inputString))
decompressBytesToString(compressStringToBytes(inputString))
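For payloads of the size in the question (around 15000 characters), reading in 8192-byte chunks is not strictly needed. On Python 3.2 and later, `gzip.compress` and `gzip.decompress` do the whole job in one call. The sketch below pairs them with compact JSON; the names `pack` and `unpack` are made up for this example.

```python
import gzip
import json


def pack(obj):
    """Serialize to compact JSON and gzip it; returns bytes."""
    text = json.dumps(obj, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


def unpack(blob):
    """Reverse of pack(): gunzip, decode UTF-8, and parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

Because Redis strings are binary-safe, the bytes returned by `pack` can be stored directly. With the redis-py client that would be something like `r.set("some-key", pack(obj))` and `unpack(r.get("some-key"))`, assuming the client was not created with `decode_responses=True`; the key name here is a placeholder.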

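I did an extensive comparison of different binary formats (MessagePack, BSON, Ion, Smile, CBOR) and compression algorithms (Brotli, Gzip, XZ, Zstandard, bzip2).

For the JSON data I used for testing, keeping the data as JSON and compressing it with Brotli was the best solution. Brotli has different compression levels, so if you are going to keep the data for a long time, a high compression level can be worth it. If you are not keeping the data for long, a lower compression level or Zstandard may be the most effective.

Gzip is easy, but there are almost certainly alternatives that are faster, compress better, or both.

You can read the full details of our investigation here: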

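I don't think BSON can be added as the value of a key in Redis.

@DhruvPathak Sure it can, why not? Redis has no opinion on what you store in a key.

@Jonathanhedborg Thanks for the correction. I had not paid attention to the fact that Redis strings are binary-safe. However, BSON is no more compact than JSON (as stated on their site), so it is not really an option.

What are the requirements of your application? Do you need performance? Reliability, consistency, and so on? Would you consider alternatives to Redis?

With gzip, wouldn't it be better to use `with gzip.GzipFile(fileobj=stream, mode='w') as compressor:`? As with the usual `open` Python function, it allows the file to be closed properly when the loop stops.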
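The context-manager form suggested in the last comment might look like the sketch below. It is a reworking of the chunked compressor, not the answer author's code, and `compress_bytes` is a hypothetical name. `GzipFile` is closed when the `with` block exits, which flushes the gzip trailer; the underlying `BytesIO` stays open, so its contents can still be read afterwards.

```python
import gzip
from io import BytesIO


def compress_bytes(data, chunk_size=8192):
    """Gzip `data` (bytes) in chunks and return the compressed bytes."""
    source = BytesIO(data)
    sink = BytesIO()
    # Leaving the with-block closes the GzipFile (but not `sink`),
    # even if an exception is raised inside the loop.
    with gzip.GzipFile(fileobj=sink, mode="wb") as compressor:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # EOF
                break
            compressor.write(chunk)
    return sink.getvalue()
```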
