Python: Using NumPy arrays with SQLite

python arrays sqlite numpy


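The most common SQLite interface I've seen in Python is sqlite3, but is there anything that works well with NumPy arrays or recarrays? By that I mean something that recognizes data types, doesn't require inserting row by row, and extracts into a NumPy (rec)array...? Kind of like R's SQL functions in the RDB or sqldf libraries, if anyone is familiar with those (they import/export/append whole tables or subsets of tables to or from R data tables).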

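Why not give redis a try?

Drivers for your two platforms of interest are available: Python (redis, via the package index) and R (rredis).

The genius of redis is not that it will magically recognize NumPy data types and let you insert and extract multi-dimensional NumPy arrays as if they were native redis data types. Rather, its genius is how easily you can build such an interface yourself with just a few lines of code.

There are (at least) several tutorials on redis in Python; one of them is particularly good.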

import numpy as NP

# create some data
A = NP.random.randint(0, 10, 40).reshape(8, 5)

# a couple of utility functions to (i) convert a row of single-digit integers into a
# compact string before insertion into the redis db &
# (ii) restore the integers upon retrieval from the redis db
# (redis returns bytes under Python 3, hence the decode)
fnx2 = lambda v : [int(ch) for ch in (v.decode() if isinstance(v, bytes) else v)]
fnx = lambda v : ''.join(map(str, v))

# start the redis server first (e.g. from a bash prompt, not from Python):
#   $ cd /usr/local/bin      # default install directory for 'nix
#   $ redis-server           # starts the redis server

# start the redis client:
from redis import Redis
r0 = Redis(db=0, port=6379, host='localhost')       # same as: r0 = Redis()

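# to insert items using redis 'string' datatype, call 'set' on the database, r0, and
# just pass in a key, and the item to insert (redis-py accepts bytes, str, int or
# float values, so the row is converted to a string first)
r0.set('k1', fnx(A[0,:]))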

# row-wise insertion of the 2D array into redis: iterate over the array
for c in range(A.shape[0]):
    r0.set( "k{0}".format(c), fnx(A[c,:]) )

# or to insert all rows at once
# use 'mset' ('multi set') and pass in a key-value mapping:
x = {"k{0}".format(i): fnx(row) for i, row in enumerate(A)}
r0.mset(x)

# to retrieve a row, pass its key to 'get'
>>> r0.get('k0')
  '63295'

# retrieve the entire array from redis:
kx = r0.keys('*')           # returns all keys in redis database, r0

for key in kx :
    r0.get(key)

# to retrieve it in original form (note: keys() returns keys in arbitrary order,
# so sort kx first if the row order matters):
A = []
for key in kx:
    A.append(fnx2(r0.get(key)))

>>> A = NP.array(A)
>>> A
  array([[ 6.,  2.,  3.,  3.,  9.],
         [ 4.,  9.,  6.,  2.,  3.],
         [ 3.,  7.,  9.,  5.,  0.],
         [ 5.,  2.,  6.,  3.,  4.],
         [ 7.,  1.,  5.,  0.,  2.],
         [ 8.,  6.,  1.,  5.,  8.],
         [ 1.,  7.,  6.,  4.,  9.],
         [ 6.,  4.,  1.,  3.,  6.]])

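This looks a bit old, but is there any reason you can't just do a fetchall() instead of iterating, and then initialize the NumPy array directly from the result?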
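As a rough illustration of the fetchall() idea with plain sqlite3, here is a minimal sketch. The table name samples, its columns, and the example data are hypothetical, and an in-memory database is used only to keep the example self-contained.

```python
import sqlite3
import numpy as np

# Hypothetical example data: a small structured array with mixed column types.
data = np.array(
    [(1, 2.5, "a"), (2, 3.5, "b"), (3, 4.5, "c")],
    dtype=[("id", "i8"), ("value", "f8"), ("label", "U8")],
)

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE samples (id INTEGER, value REAL, label TEXT)")

# tolist() converts NumPy scalars to plain Python types that sqlite3 understands,
# and executemany() inserts all rows in a single call.
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", data.tolist())
conn.commit()

# fetchall() returns a list of tuples; NumPy can build a structured array from it
# as long as the dtype is given explicitly.
rows = conn.execute("SELECT id, value, label FROM samples ORDER BY id").fetchall()
restored = np.array(rows, dtype=data.dtype)
conn.close()
```

The dtype still has to be supplied by hand here, since sqlite3 does not report NumPy types on its own.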

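I found at least three Python packages for this, one of which is my own.

Each of these packages has to deal with the problem that SQLite (by default) only understands standard Python types and not NumPy data types such as numpy.int64.

RecSQL 0.7.8+ works for me (most of the time), but I consider it a pretty bad hack; glancing over the code, one of the other packages appears to be more mature.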
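One way to work around the numpy.int64 problem with the standard sqlite3 module alone is to register an adapter. This is only a sketch of that idea, not how any of the packages above do it; the table name counts is made up for the example.

```python
import sqlite3
import numpy as np

# Tell sqlite3 how to convert NumPy integer scalars to plain Python ints.
sqlite3.register_adapter(np.int64, int)
sqlite3.register_adapter(np.int32, int)

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE counts (n INTEGER)")

# A NumPy scalar can now be bound as a query parameter directly.
value = np.int64(42)
conn.execute("INSERT INTO counts VALUES (?)", (value,))

stored = conn.execute("SELECT n FROM counts").fetchone()[0]
conn.close()
```

The adapter is registered globally for the sqlite3 module, so it applies to every connection opened afterwards in the same process.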

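Doug's suggestion of redis is quite good, but I think his code is a bit complicated and, as a result, rather slow. For my purposes, I had to serialize+write and then grab+deserialize a square matrix of about a million floats in less than a tenth of a second, so I did this:

For writing: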

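import numpy as np
from redis import Redis
rs = Redis()   # assumes a redis server running locally on the default port
snapshot = np.random.randn(1024,1024)
serialized = snapshot.tobytes()
rs.set('snapshot_key', serialized)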

Then for reading:

s = rs.get('snapshot_key')
deserialized = np.frombuffer(s).astype(np.float32)
rank = int(np.sqrt(deserialized.size))
snap = deserialized.reshape(rank, rank)

You can do some basic performance testing in IPython with %time, but neither tobytes nor frombuffer should take more than a few milliseconds.
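Outside IPython, a similar rough measurement can be made with time.perf_counter. The sketch below times only the serialization steps and leaves out the redis round trip, so it needs no running server; actual timings will depend on the machine.

```python
import time
import numpy as np

snapshot = np.random.randn(1024, 1024)   # float64, roughly a million values

t0 = time.perf_counter()
serialized = snapshot.tobytes()           # copy the array contents into a bytes object
t1 = time.perf_counter()
# Pass the dtype explicitly and restore the original shape.
restored = np.frombuffer(serialized, dtype=snapshot.dtype).reshape(snapshot.shape)
t2 = time.perf_counter()

print("tobytes:    {:.3f} ms".format((t1 - t0) * 1e3))
print("frombuffer: {:.3f} ms".format((t2 - t1) * 1e3))
```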

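I'd suggest taking a look at an HDF5-based library. It uses HDF5 as its backend rather than SQLite, but also supports powerful queries.

Thanks, but I'd like it to be in SQLite so that R can get at it (R works much better with SQL tables than with HDF5 files).

This is generally a good idea for exchanging data/information between programs.

Thanks for the example too. Unfortunately, redis is an in-memory solution? And the data I'd like to exchange is quite large, so being able to use a file on disk is desirable...

Thanks! This helped me a lot!

Nice suggestion, but note that np.frombuffer(s).astype(np.float32) will parse the buffer as if it had dtype np.float64 and then cast it to np.float32. If the original matrix had dtype np.float32, this returns a matrix with half as many elements. It's better to use the dtype argument of np.frombuffer (i.e. np.frombuffer(s, dtype=np.float32)).

Good catch, I was sloppy here, especially with mixing in the Python int.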
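The dtype pitfall described in that comment is easy to reproduce without redis. This is a small sketch with a made-up 4x4 float32 array standing in for the real matrix:

```python
import numpy as np

original = np.arange(16, dtype=np.float32).reshape(4, 4)
payload = original.tobytes()    # 16 values * 4 bytes each = 64 bytes

# Without a dtype argument, frombuffer interprets the bytes as float64,
# so every pair of float32 values is read as one (meaningless) float64.
wrong = np.frombuffer(payload)

# With the matching dtype, all 16 values come back intact.
right = np.frombuffer(payload, dtype=np.float32).reshape(4, 4)

print(wrong.size, right.size)
```

Since the raw bytes carry no type or shape information, both have to be stored or agreed on separately.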