Python: inserting a numpy array into an sqlite3 database

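I'm trying to store a numpy array of about 1000 floats in an sqlite3 database, but I keep getting the error "InterfaceError: Error binding parameter 1 - probably unsupported type".

I was under the impression that a BLOB data type could hold anything, but it definitely doesn't work with a numpy array. Here's what I tried: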

    import sqlite3 as sql
    import numpy as np
    con = sql.connect('test.bd',isolation_level=None)
    cur = con.cursor()
    cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
    cur.execute("INSERT INTO foobar VALUES (?,?)", (None,np.arange(0,500,0.5)))
    con.commit()

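Is there another module I can use to get the numpy array into the table? Or can I convert the numpy array into another form in Python (like a list, or a string I can split) that sqlite will accept? Performance isn't a priority. I just want it to work!

Thanks!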

You could register a new array data type with sqlite3:

    import sqlite3
    import numpy as np
    import io

    def adapt_array(arr):
        """
        http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
        """
        out = io.BytesIO()
        np.save(out, arr)
        out.seek(0)
        return sqlite3.Binary(out.read())

    def convert_array(text):
        out = io.BytesIO(text)
        out.seek(0)
        return np.load(out)

    # Converts np.array to TEXT when inserting
    sqlite3.register_adapter(np.ndarray, adapt_array)

    # Converts TEXT to np.array when selecting
    sqlite3.register_converter("array", convert_array)

    x = np.arange(12).reshape(2,6)

    con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
    cur = con.cursor()
    cur.execute("create table test (arr array)")

With this setup, you can simply insert the NumPy array with no change in syntax:

    cur.execute("insert into test (arr) values (?)", (x, ))

And retrieve the array directly from sqlite as a NumPy array:

    cur.execute("select arr from test")
    data = cur.fetchone()[0]

    print(data)
    # [[ 0  1  2  3  4  5]
    #  [ 6  7  8  9 10 11]]
    print(type(data))
    # <type 'numpy.ndarray'>

This works for me:

    import sqlite3 as sql
    import numpy as np
    import json

    con = sql.connect('test.db',isolation_level=None)
    cur = con.cursor()
    # IF EXISTS avoids an error on the first run, when the table is not there yet
    cur.execute("DROP TABLE IF EXISTS foobar")
    cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array BLOB)")
    # Store the array as a JSON-encoded list
    cur.execute("INSERT INTO foobar VALUES (?,?)", (None, json.dumps(np.arange(0,500,0.5).tolist())))
    con.commit()

    cur.execute("SELECT * FROM foobar")
    data = cur.fetchall()
    print(data)
    # Decode the JSON text of the first row back into a Python list
    my_list = json.loads(data[0][1])
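Note that `json.loads` hands back a plain Python list, not an array. A minimal sketch of the full round trip, assuming an in-memory database and a TEXT column (both are choices made here for illustration, not part of the answer above):

```python
import json
import sqlite3

import numpy as np

original = np.arange(0, 500, 0.5)

con = sqlite3.connect(":memory:")  # in-memory database, assumed for illustration
cur = con.cursor()
cur.execute("CREATE TABLE foobar (id INTEGER PRIMARY KEY, array TEXT)")

# ndarray -> list -> JSON string
cur.execute("INSERT INTO foobar VALUES (?, ?)",
            (None, json.dumps(original.tolist())))
con.commit()

# JSON string -> list -> ndarray
cur.execute("SELECT array FROM foobar")
restored = np.array(json.loads(cur.fetchone()[0]))
con.close()

print(restored.shape, restored.dtype)
print(np.array_equal(original, restored))
```

Nested lists keep the shape of a multi-dimensional array, but the dtype is not stored; `np.array` infers it again from the values, so pass `dtype=` explicitly if it matters.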

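Happy Leap Second's answer is close, but I kept getting an automatic cast to string. Also, if you look at this other post, an interesting debate on using buffer or Binary to push non-text data into sqlite, you'll see that the documented approach is to avoid buffer altogether and use this chunk of code.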

    def adapt_array(arr):
        out = io.BytesIO()
        np.save(out, arr)
        out.seek(0)
        return sqlite3.Binary(out.read())

I haven't heavily tested this in Python 3, but it seems to work in Python 2.7.
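For anyone who wants to check the round trip on Python 3, here is a small self-contained sketch. The in-memory database and the test array are assumptions for illustration; the adapter and converter follow the `np.save` / `np.load` approach described in this thread.

```python
import io
import sqlite3

import numpy as np


def adapt_array(arr):
    """Serialize an ndarray to bytes in the .npy format."""
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())


def convert_array(blob):
    """Rebuild the ndarray from the stored .npy bytes."""
    return np.load(io.BytesIO(blob))


sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

# PARSE_DECLTYPES makes sqlite3 look at the declared column type ("array")
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test (arr array)")

x = np.arange(12, dtype=np.float64).reshape(2, 6)
cur.execute("insert into test (arr) values (?)", (x,))
cur.execute("select arr from test")
y = cur.fetchone()[0]
con.close()

print(type(y), y.dtype, y.shape)
print(np.array_equal(x, y))
```

Because the `.npy` format carries dtype and shape in its header, both survive the trip through the database.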

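I think the Matlab format is a really convenient way to store and retrieve numpy arrays. It is really fast, and the disk and memory footprint are about the same.

[Figure: load / save / disk comparison (image from mverleg's benchmarks)]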

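But if for any reason you need to store the numpy arrays in SQLite, I suggest adding some compression capabilities.

The extra lines on top of unutbu's code are pretty simple: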

    import codecs

    compressor = 'zlib'  # zlib, bz2

    def adapt_array(arr):
        """
        http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
        """
        # zlib uses a disk size similar to Matlab v5 .mat files
        # bz2 compresses about 4 times better than zlib, but storing is about 20 times slower.
        out = io.BytesIO()
        np.save(out, arr)
        out.seek(0)
        # codecs.encode accepts byte strings on both Python 2 and Python 3
        return sqlite3.Binary(codecs.encode(out.read(), compressor))  # zlib, bz2

    def convert_array(text):
        out = io.BytesIO(text)
        out.seek(0)
        out = io.BytesIO(codecs.decode(out.read(), compressor))
        return np.load(out)
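The same idea can also be written with the standard `zlib` module directly, which makes the compression step explicit. A minimal sketch: the function names `adapt_array_zlib` / `convert_array_zlib`, the in-memory database, and the dummy image are all assumptions for illustration.

```python
import io
import sqlite3
import zlib

import numpy as np


def adapt_array_zlib(arr):
    """Serialize to .npy bytes, then compress with zlib."""
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(zlib.compress(out.getvalue()))


def convert_array_zlib(blob):
    """Decompress, then load the .npy bytes back into an ndarray."""
    return np.load(io.BytesIO(zlib.decompress(blob)))


sqlite3.register_adapter(np.ndarray, adapt_array_zlib)
sqlite3.register_converter("array", convert_array_zlib)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test (idx integer primary key, X array)")

x = np.zeros((28, 28), dtype=np.uint8)  # MNIST-sized dummy image
cur.execute("insert into test (idx, X) values (?, ?)", (0, x))
cur.execute("select X from test")
restored = cur.fetchone()[0]
con.close()

# Compare the serialized size before and after compression
raw = io.BytesIO()
np.save(raw, x)
raw_size = len(raw.getvalue())
compressed_size = len(zlib.compress(raw.getvalue()))
print(raw_size, compressed_size)
print(np.array_equal(x, restored))
```

`bz2.compress` and `bz2.decompress` from the standard library have the same bytes-in, bytes-out shape, so they should slot into the two functions the same way.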

Test results with the MNIST database, using zlib:

    $ ./test_MNIST.py
    [69900]: 99% remain: 0 secs
    Storing 70000 images in 379.9 secs
    Retrieve 6990 images in 9.5 secs
    $ ls -lh example.db
    -rw-r--r-- 1 agp agp 69M sep 22 07:27 example.db
    $ ls -lh mnist-original.mat
    -rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat

and using bz2:

    $ ./test_MNIST.py
    [69900]: 99% remain: 12 secs
    Storing 70000 images in 8536.2 secs
    Retrieve 6990 images in 37.4 secs
    $ ls -lh example.db
    -rw-r--r-- 1 agp agp 19M sep 22 03:33 example.db
    $ ls -lh mnist-original.mat
    -rw-r--r-- 1 agp agp 53M sep 20 17:59 mnist-original.mat

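Comparing the Matlab V5 format with bz2 on SQLite, the bz2 compression ratio is around 2.8, but the access time is quite long compared with the Matlab format (almost instantaneous versus more than 30 secs). Maybe it is only worthwhile for really huge databases, where the learning process takes much longer than the access time, or where the database footprint needs to be as small as possible.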

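Finally, note that the zlib database is around 3.7 times the size of the bz2 one, and that zlib requires about 30% more space than the Matlab file.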
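As a quick check of those ratios, here is the arithmetic on the numbers from the two test runs above. The sizes are the rounded MB values printed by `ls -lh`, so the results only approximate the figures quoted in the text.

```python
# File sizes in MB, as printed by `ls -lh` in the two test runs
zlib_db = 69
bz2_db = 19
matlab_file = 53

# Timings in seconds from the same runs
zlib_store, bz2_store = 379.9, 8536.2
zlib_load, bz2_load = 9.5, 37.4

zlib_over_bz2 = zlib_db / bz2_db          # how much smaller bz2 is than zlib
matlab_over_bz2 = matlab_file / bz2_db    # bz2 compression relative to .mat
zlib_over_matlab = zlib_db / matlab_file  # extra space zlib needs over .mat
store_slowdown = bz2_store / zlib_store   # bz2 storing cost relative to zlib
load_slowdown = bz2_load / zlib_load      # bz2 retrieval cost relative to zlib

print("zlib / bz2 size     : %.2f" % zlib_over_bz2)
print("matlab / bz2 size   : %.2f" % matlab_over_bz2)
print("zlib / matlab size  : %.2f" % zlib_over_matlab)
print("bz2 / zlib store time: %.1f" % store_slowdown)
print("bz2 / zlib load time : %.1f" % load_slowdown)
```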

The full code, if you want to play with it yourself, is:

    import codecs
    import io
    import os
    import sqlite3
    import sys
    import time

    import numpy as np
    from sklearn.datasets import fetch_openml

    compressor = 'zlib'  # zlib, bz2

    def adapt_array(arr):
        """
        http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
        """
        # zlib uses a disk size similar to Matlab v5 .mat files
        # bz2 compresses about 4 times better than zlib, but storing is about 20 times slower.
        out = io.BytesIO()
        np.save(out, arr)
        out.seek(0)
        # codecs.encode accepts byte strings on both Python 2 and Python 3
        return sqlite3.Binary(codecs.encode(out.read(), compressor))  # zlib, bz2

    def convert_array(text):
        out = io.BytesIO(text)
        out.seek(0)
        out = io.BytesIO(codecs.decode(out.read(), compressor))
        return np.load(out)

    sqlite3.register_adapter(np.ndarray, adapt_array)
    sqlite3.register_converter("array", convert_array)

    dbname = 'example.db'

    def test_save_sqlite_arrays():
        "Load MNIST database (70000 samples) and store in a compressed SQLite db"
        if os.path.exists(dbname):
            os.unlink(dbname)
        con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
        cur = con.cursor()
        cur.execute("create table test (idx integer primary key, X array, y integer );")

        # The original call was fetch_mldata('MNIST original'); that function is
        # no longer in scikit-learn, so fetch_openml is used here instead.
        mnist = fetch_openml('mnist_784', version=1, as_frame=False)
        # Pixel values are 0-255, so uint8 keeps the stored arrays small
        X, y = mnist.data.astype(np.uint8), mnist.target
        m = X.shape[0]
        t0 = time.time()
        for i, x in enumerate(X):
            # x is one image, y[i] its label
            cur.execute("insert into test (idx, X, y) values (?,?,?)",
                        (i, x, int(y[i])))
            if not i % 100 and i > 0:
                elapsed = time.time() - t0
                remain = float(m - i) / i * elapsed
                print("\r[%5d]: %3d%% remain: %d secs" % (i, 100 * i // m, remain), end="")
                sys.stdout.flush()

        con.commit()
        con.close()
        elapsed = time.time() - t0
        print()
        print("Storing %d images in %0.1f secs" % (m, elapsed))

    def test_load_sqlite_arrays():
        "Query MNIST SQLite database and load some samples"
        con = sqlite3.connect(dbname, detect_types=sqlite3.PARSE_DECLTYPES)
        cur = con.cursor()

        # select all images labeled as '2'
        t0 = time.time()
        cur.execute('select idx, X, y from test where y = 2')
        data = cur.fetchall()
        elapsed = time.time() - t0
        print("Retrieve %d images in %0.1f secs" % (len(data), elapsed))

    if __name__ == '__main__':
        test_save_sqlite_arrays()
        test_load_sqlite_arrays()