Pickle vs. shelve for storing large dictionaries in Python


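If I store a large dictionary as a pickle file, does loading it via cPickle mean that it will all be read into memory at once?

If so, is there a cross-platform way to get something like pickle, but access each entry one key at a time (i.e. avoid loading the whole dictionary into memory and only load each entry by name)? I know shelve is supposed to do this: is it as portable as pickle, though?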

Yes. shelve is part of the Python standard library and is written in Python.

Edit: if you have a large dictionary:

bigd = {'a': 1, 'b':2, # . . .
}

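and you want to save it without having to read the whole thing back in later, then don't save it as a pickle. It is better to save it as a shelf, a sort of on-disk dictionary.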

import shelve

myShelve = shelve.open('my.shelve')
myShelve.update(bigd)
myShelve.close()

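Then, later, you can:

import shelve

myShelve = shelve.open('my.shelve')
value = myShelve['a']      # only this entry is read from disk
value += 1
myShelve['a'] = value      # reassign to store the new value
myShelve.close()           # close so the change is flushed to disk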

Basically, you treat the shelve object like a dict, but the items are stored on disk (as individual pickles) and read in as needed.
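As a rough illustration of the point above, here is a minimal Python 3 sketch of the same round trip. The data, the temporary directory, and the file name are made up for the example, and the `with` form assumes Python 3.4 or later. It also shows one easy-to-miss detail: mutating a stored object in place is not saved unless the shelf was opened with `writeback=True`, so the sketch stores the modified value back explicitly.

```python
import os
import shelve
import tempfile

# Hypothetical data and file location, for illustration only.
bigd = {'a': 1, 'b': 2, 'c': [1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), 'my.shelve')

# Write each item to disk; leaving the with-block closes (and flushes) the shelf.
with shelve.open(path) as shelf:
    shelf.update(bigd)

# Reopen and touch individual entries; only the requested values are unpickled.
with shelve.open(path) as shelf:
    value = shelf['a']
    shelf['a'] = value + 1      # reassign to persist the change

    # In-place mutation of a fetched object is not written back by default,
    # so fetch, modify, and store it again.
    items = shelf['c']
    items.append(4)
    shelf['c'] = items

# Read the results back from a fresh handle.
with shelve.open(path) as shelf:
    result_a = shelf['a']
    result_c = shelf['c']
    keys = sorted(shelf.keys())
```

Opening with `writeback=True` would make in-place mutation work, at the cost of caching every accessed entry in memory until the shelf is synced or closed.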

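If your objects can be stored as a list of properties, then sqlite may be a good alternative. Shelves and pickles are convenient, but they can only be accessed from Python, whereas a sqlite database can be read from most languages.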
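One possible shape for the sqlite approach is a simple key/value table, sketched below with the standard-library `sqlite3` module. The table name `kv`, the helper `get`, and the choice of JSON for the values are assumptions made for this example, not something the answer prescribes. JSON is used here because it keeps the stored values readable from other languages.

```python
import json
import sqlite3

# Hypothetical data, for illustration only.
bigd = {'a': 1, 'b': 2, 'c': {'nested': [1, 2, 3]}}

# ':memory:' keeps the sketch self-contained; use a file path to persist.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)')

# One row per dictionary entry; the with-block commits the transaction.
with conn:
    conn.executemany(
        'INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)',
        ((k, json.dumps(v)) for k, v in bigd.items()),
    )


def get(conn, key):
    """Fetch and decode a single entry without loading the rest."""
    row = conn.execute('SELECT value FROM kv WHERE key = ?', (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    return json.loads(row[0])


value_a = get(conn, 'a')
value_c = get(conn, 'c')
conn.close()
```

Values that JSON cannot represent would need a different encoding, which may give up the cross-language benefit.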

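If you want a module that is more robust than shelve, you might look at klepto. klepto is built to provide a dictionary interface to platform-agnostic storage on disk or in a database, and it is built to work with large data.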

Here, we first create some pickled objects stored on disk. They use the dir_archive, which stores one object per file.

>>> d = dict(zip('abcde',range(5)))
>>> d['f'] = max
>>> d['g'] = lambda x:x**2
>>> 
>>> import klepto
>>> help(klepto.archives.dir_archive)

>>> print(klepto.archives.dir_archive.__new__.__doc__)
initialize a dictionary with a file-folder archive backend

    Inputs:
        name: name of the root archive directory [default: memo]
        dict: initial dictionary to seed the archive
        cached: if True, use an in-memory cache interface to the archive
        serialized: if True, pickle file contents; otherwise save python objects
        compression: compression level (0 to 9) [default: 0 (no compression)]
        memmode: access mode for files, one of {None, 'r+', 'r', 'w+', 'c'}
        memsize: approximate size (in MB) of cache for in-memory compression

>>> a = klepto.archives.dir_archive(dict=d)
>>> a
dir_archive('memo', {'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': <function <lambda> at 0x102f562a8>, 'f': <built-in function max>}, cached=True)
>>> a.dump()
>>> del a

klepto also provides the same interface to a sql archive.

>>> print(klepto.archives.sql_archive.__new__.__doc__)
initialize a dictionary with a sql database archive backend

    Connect to an existing database, or initialize a new database, at the
    selected database url. For example, to use a sqlite database 'foo.db'
    in the current directory, database='sqlite:///foo.db'. To use a mysql
    database 'foo' on localhost, database='mysql://user:pass@localhost/foo'.
    For postgresql, use database='postgresql://user:pass@localhost/foo'. 
    When connecting to sqlite, the default database is ':memory:'; otherwise,
    the default database is 'defaultdb'. If sqlalchemy is not installed,
    storable values are limited to strings, integers, floats, and other
    basic objects. If sqlalchemy is installed, additional keyword options
    can provide database configuration, such as connection pooling.
    To use a mysql or postgresql database, sqlalchemy must be installed.

    Inputs:
        name: url for the sql database [default: (see note above)]
        dict: initial dictionary to seed the archive
        cached: if True, use an in-memory cache interface to the archive
        serialized: if True, pickle table contents; otherwise cast as strings

>>> c = klepto.archives.sql_archive('database')
>>> c.update(b)
>>> c
sql_archive('sqlite:///database', {'a': 0, 'b': 1, 'g': <function <lambda> at 0x10446b1b8>, 'f': <built-in function max>}, cached=True)
>>> c.dump()

Get klepto here:

pickle is a security vulnerability: the contents end up being executed (or evaluated) when they are unpickled.
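To make that concern concrete, here is a deliberately harmless sketch. The class name `Demo` is made up for the example. Its `__reduce__` method tells pickle which callable to invoke at load time; here it is the built-in `len`, but a hostile pickle could just as easily name something destructive.

```python
import pickle


class Demo:
    """Hypothetical class whose pickle names a callable to run on load."""

    def __reduce__(self):
        # pickle.loads will call len('any callable could go here')
        # instead of rebuilding a Demo instance.
        return (len, ('any callable could go here',))


payload = pickle.dumps(Demo())

# Unpickling runs the callable recorded in the payload and returns its result.
result = pickle.loads(payload)
```

After loading, `result` is whatever the recorded callable returned rather than a `Demo` object, which is why pickles from untrusted sources should not be loaded.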

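shelve uses some flavor of DBM database to store the pickled objects. It should be at least as portable as pickle.

While @JoranBeasley is technically correct, pickle is very useful and safe when you write the pickles yourself. Do not accept pickles from untrusted sources, but do use it to serialize your own data.

@JoranBeasley shelve is also vulnerable to the same security exploits as pickle, since it is backed by pickle.

So does pickle always load all of the pickled objects into memory?

Shelve isn't cross-platform?

Don't you need to close the shelf to flush the changes to disk?

@AlwaysLearning, the Python 2.x docs specifically say: "Like file objects, shelve objects should be closed explicitly to ensure that the persistent data is flushed to disk."

No. For example, on my machine shelve chooses to be backed by dbhash. If I create a DB file and move it to another machine where dbhash is unavailable, shelve.open(…) will fail on that file. Currently the only portable pure-Python option is dumbdbm. To make a portable shelf, you need to set it up to use dumbdbm explicitly (which is not the default).

Note that I am the klepto author.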
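The last comment's suggestion can be sketched as follows. The comment refers to the Python 2 module `dumbdbm`; in Python 3 the same pure-Python backend lives at `dbm.dumb`, which is what this sketch assumes. The file location is made up for the example. Instead of letting `shelve.open` pick a backend, the database is opened explicitly and wrapped in `shelve.Shelf`.

```python
import dbm.dumb
import os
import shelve
import tempfile

# Hypothetical location, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'portable_shelf')

# Open the pure-Python backend explicitly and wrap it in a Shelf.
db = dbm.dumb.open(path, 'c')
shelf = shelve.Shelf(db)
try:
    shelf['a'] = 1
    shelf['b'] = {'x': [1, 2, 3]}
finally:
    shelf.close()   # syncs, then closes the underlying database

# Reopen the same files with the same explicit backend.
db = dbm.dumb.open(path, 'r')
shelf = shelve.Shelf(db)
try:
    value_b = shelf['b']
    keys = sorted(shelf.keys())
finally:
    shelf.close()
```

The trade-off is speed: `dbm.dumb` is a simple implementation and is generally slower than the platform-specific backends, so this is mainly worth doing when the files have to move between machines.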


>>> b['x'] = 69
>>> c['y'] = 96
>>> b.dump('x')
>>> c.dump('y')