How to handle duplicated column values in SQLite? By using compression?

python sql sqlite compression

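I know from an earlier answer that SQLite does not support compression by default. Can it be enabled, or does this require another tool? Here is the situation: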

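I need to add millions of rows to an SQLite database. The table contains a description column of about 500 characters on average, and each description is shared by about 40 rows on average, like this: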

id    name    othercolumn    description 
1     azefds  ...            This description will be the same for probably 40 rows
2     tsdyug  ...            This description will be the same for probably 40 rows
...
40    wxcqds  ...            This description will be the same for probably 40 rows
41    azeyui  ...            This one is unique
42    uiuotr  ...            This one will be shared by 60 rows
43    poipud  ...            This one will be shared by 60 rows
...
101   iuotyp  ...            This one will be shared by 60 rows
102   blaxwx  ...            Same description for the next 10 rows
103   sdhfjk  ...            Same description for the next 10 rows
...

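Question:

Would you insert the rows just like this and enable a compression algorithm in the DB? Pro: there is no need to deal with two tables, so querying is easier. Or

Would you use two tables: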

id    name    othercolumn    descriptionid
1     azefds  ...            1
2     tsdyug  ...            1    
...
40    wxcqds  ...            1
41    azeyui  ...            2
...

id    description
1     This description will be the same for probably 40 rows
2     This one is unique

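Con: instead of the simple "select id, name, description from mytable" of solution 1, we have to retrieve the data in a more complex way that involves 2 tables and possibly multiple queries. Or perhaps no complicated queries are needed, and a clever query using union, merge, or something similar would do?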

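Using multiple tables not only prevents inconsistencies and takes less space, it can also be faster, even when multiple or more complex queries are involved, because less data has to be moved around. Which option you should use depends on which of these characteristics matters most to you.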

With two tables, the query that retrieves the results looks something like this; it is really just a join between the two tables:

select table1.id, table1.name, table1.othercolumn, table2.description
from table1, table2
where table1.descriptionid=table2.id
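
The space argument above can be checked with a rough experiment. The following is only a sketch under assumed data: 250 made-up descriptions of 500 characters, each shared by exactly 40 rows. The helper db_size and the functions one_table and two_tables are hypothetical names introduced for this illustration; the table names follow the question.

```python
import os
import sqlite3
import tempfile

def db_size(path, setup):
    """Create a database file at path, fill it via setup(conn), return its size in bytes."""
    conn = sqlite3.connect(path)
    setup(conn)
    conn.commit()
    conn.close()
    return os.path.getsize(path)

# Assumed sample data: 250 distinct 500-character descriptions, 40 rows each.
descriptions = [("description %d " % i).ljust(500, "x") for i in range(250)]
rows = [(i, "name%d" % i, i // 40) for i in range(250 * 40)]

def one_table(conn):
    # Solution 1: the full description text is repeated in every row.
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, description TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                     ((i, name, descriptions[d]) for i, name, d in rows))

def two_tables(conn):
    # Solution 2: each description is stored once and referenced by id.
    conn.execute("CREATE TABLE descriptiontable (id INTEGER PRIMARY KEY, description TEXT)")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, descriptionid INTEGER)")
    conn.executemany("INSERT INTO descriptiontable VALUES (?, ?)", enumerate(descriptions))
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

with tempfile.TemporaryDirectory() as tmp:
    size_one = db_size(os.path.join(tmp, "one.db"), one_table)
    size_two = db_size(os.path.join(tmp, "two.db"), two_tables)

print("one table: ", size_one, "bytes")
print("two tables:", size_two, "bytes")
```

The exact sizes depend on the SQLite version and page size, so no figures are quoted here. With descriptions this long and shared this often, the two-table file should come out much smaller.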

Here is some Python sample code for ScottHunter's answer:

import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute("CREATE TABLE mytable (id integer, name text, descriptionid integer)")
c.execute("CREATE TABLE descriptiontable (id integer, description text)")

# SQL string literals take single quotes; double quotes denote identifiers.
c.execute("INSERT INTO mytable VALUES(1, 'abcdef', 1)")
c.execute("INSERT INTO mytable VALUES(2, 'ghijkl', 1)")
c.execute("INSERT INTO mytable VALUES(3, 'iovxcd', 2)")
c.execute("INSERT INTO mytable VALUES(4, 'zuirur', 1)")
c.execute("INSERT INTO descriptiontable VALUES(1, 'Description1')")
c.execute("INSERT INTO descriptiontable VALUES(2, 'Description2')")

# Join the two tables so that each row comes back with its description text.
c.execute("SELECT mytable.id, mytable.name, descriptiontable.description "
          "FROM mytable, descriptiontable "
          "WHERE mytable.descriptionid=descriptiontable.id "
          "ORDER BY mytable.id")

print(c.fetchall())

# [(1, 'abcdef', 'Description1'),
#  (2, 'ghijkl', 'Description1'),
#  (3, 'iovxcd', 'Description2'),
#  (4, 'zuirur', 'Description1')]
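
When loading real data, the description ids are usually not known in advance, so each description has to be mapped to an id while inserting. One possible way to do this, sketched below, is a UNIQUE constraint on the description column combined with INSERT OR IGNORE. The helper insert_row and the sample source_rows are assumptions made for illustration; they are not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE makes SQLite reject a second copy of the same description text.
conn.execute("CREATE TABLE descriptiontable ("
             "id INTEGER PRIMARY KEY, description TEXT UNIQUE)")
conn.execute("CREATE TABLE mytable ("
             "id INTEGER PRIMARY KEY, name TEXT, "
             "descriptionid INTEGER REFERENCES descriptiontable(id))")

def insert_row(conn, name, description):
    """Hypothetical helper: store the description once, then reference it by id."""
    # Does nothing if this description is already stored.
    conn.execute("INSERT OR IGNORE INTO descriptiontable (description) VALUES (?)",
                 (description,))
    (description_id,) = conn.execute(
        "SELECT id FROM descriptiontable WHERE description = ?",
        (description,)).fetchone()
    conn.execute("INSERT INTO mytable (name, descriptionid) VALUES (?, ?)",
                 (name, description_id))

# Assumed input: (name, description) pairs as they would arrive from the source data.
source_rows = [("abcdef", "Description1"), ("ghijkl", "Description1"),
               ("iovxcd", "Description2"), ("zuirur", "Description1")]

# A single transaction around the whole batch keeps bulk loading fast.
with conn:
    for name, description in source_rows:
        insert_row(conn, name, description)

description_count = conn.execute("SELECT COUNT(*) FROM descriptiontable").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
print(description_count, "descriptions for", row_count, "rows")
```

For millions of rows, caching the description-to-id mapping in a Python dict would avoid the extra SELECT per row, at the cost of holding the distinct descriptions in memory.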

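Thanks @ScottHunter. How would you get the equivalent of "select id, name, description from mytable" with solution 2? Is there some kind of union or merge that avoids many queries? I edited the end of the question to show this.

@Basj How about creating a view?

@DanMašek Could you post an answer using a view? I am not familiar with it.

@Basj Here is a modified version of your script that uses a view. It is basically a virtual table that hides the two-table representation and lets us use a simpler query.

Could you post it as an answer, @DanMašek? That would help for future reference.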
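
The script linked in the comments is not preserved here, so the following is only a sketch of what a view-based version might look like, reusing the tables from the sample code above. The view name myview is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE mytable (id INTEGER, name TEXT, descriptionid INTEGER)")
c.execute("CREATE TABLE descriptiontable (id INTEGER, description TEXT)")

c.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
              [(1, "abcdef", 1), (2, "ghijkl", 1), (3, "iovxcd", 2), (4, "zuirur", 1)])
c.executemany("INSERT INTO descriptiontable VALUES (?, ?)",
              [(1, "Description1"), (2, "Description2")])

# The view stores only the query, not the data; the join runs whenever
# the view is selected from.
c.execute("""
    CREATE VIEW myview AS
    SELECT mytable.id AS id,
           mytable.name AS name,
           descriptiontable.description AS description
    FROM mytable
    JOIN descriptiontable ON mytable.descriptionid = descriptiontable.id
""")

# Querying the view is as simple as querying the single table of solution 1.
view_rows = c.execute("SELECT id, name, description FROM myview ORDER BY id").fetchall()
print(view_rows)
```

A view defined like this is read-only in SQLite, so inserts still go to the two underlying tables.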