Python: populating a shelve object/dictionary with multiple keys
I have a list of 4-grams that I want to populate a dictionary object/shelve object with:
['I','go','to','work']
['I','go','there','often']
['it','is','nice','being']
['I','live','in','NY']
['I','go','to','work']
so that we end up with something like:
four_grams['I']['go']['to']['work']=1
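Any newly encountered 4-gram should be inserted with its four keys and a value of 1, and if it is encountered again, its value should be incremented.
You can simply create a helper method that inserts the elements into a nested dictionary one at a time, checking each time whether the required sub-dictionary already exists: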
dict = {}  # note: this name shadows the built-in dict

def insert(fourgram):
    d = dict  # reference to the top-level dict
    for el in fourgram[0:-1]:  # elements 1-3 if fourgram has 4 elements
        if el not in d:
            d[el] = {}  # create new, empty dict
        d = d[el]  # move into next level dict
    if fourgram[-1] in d:
        d[fourgram[-1]] += 1  # increment existing, or...
    else:
        d[fourgram[-1]] = 1  # ...create as 1 first time
You can then populate it with your data set like this:
insert(['I','go','to','work'])
insert(['I','go','there','often'])
insert(['it','is','nice','being'])
insert(['I','live','in','NY'])
insert(['I','go','to','work'])
Afterwards, you can index into dict as needed:
print(dict['I']['go']['to']['work'])      # prints 2
print(dict['I']['go']['there']['often'])  # prints 1
print(dict['it']['is']['nice']['being'])  # prints 1
print(dict['I']['live']['in']['NY'])      # prints 1
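For comparison, here is a minimal sketch of the same nested-dict approach written with dict.setdefault and dict.get, plus a lookup that returns 0 for an unseen 4-gram instead of raising KeyError. The names insert_ngram, get_count and four_grams are hypothetical (they also avoid shadowing the built-in dict), and nothing in it is specific to length 4.

```python
from collections.abc import Mapping


def insert_ngram(counts, ngram):
    """Increment counts[w1][w2]...[wn], creating intermediate dicts as needed."""
    node = counts
    for word in ngram[:-1]:
        # setdefault returns the existing sub-dict, or inserts and returns a new one.
        node = node.setdefault(word, {})
    last = ngram[-1]
    node[last] = node.get(last, 0) + 1  # 1 the first time, +1 afterwards


def get_count(counts, ngram):
    """Return the stored count for ngram, or 0 if any level of the path is missing."""
    node = counts
    for word in ngram:
        if not isinstance(node, Mapping) or word not in node:
            return 0
        node = node[word]
    # A full path ends at an int; a shorter prefix would end at a sub-dict.
    return node if isinstance(node, int) else 0


four_grams = {}
for gram in [['I', 'go', 'to', 'work'],
             ['I', 'go', 'there', 'often'],
             ['it', 'is', 'nice', 'being'],
             ['I', 'live', 'in', 'NY'],
             ['I', 'go', 'to', 'work']]:
    insert_ngram(four_grams, gram)

print(get_count(four_grams, ['I', 'go', 'to', 'work']))    # prints 2
print(get_count(four_grams, ['I', 'live', 'in', 'NY']))    # prints 1
print(get_count(four_grams, ['I', 'go', 'to', 'school']))  # prints 0
```

Keep in mind that if counts is a shelf opened without writeback, setdefault on the top level works, but changes to the nested dicts below it are made on temporary copies and are not saved.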
You could do it like this:
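import shelve
from collections import defaultdict
from functools import reduce  # reduce is no longer a built-in in Python 3

db = shelve.open('/tmp/db')

grams = [
    ['I','go','to','work'],
    ['I','go','there','often'],
    ['it','is','nice','being'],
    ['I','live','in','NY'],
    ['I','go','to','work'],
]

def f(path, word):
    # Descend one level, creating the sub-dict if it does not exist yet.
    if word not in path:
        path[word] = defaultdict(int)
    return path[word]

for gram in grams:
    # Load the subtree stored under the first word, or start a new one.
    path = db.get(gram[0], defaultdict(int))
    # Walk the middle words, then increment the count of the last word.
    reduce(f, gram[1:-1], path)[gram[-1]] += 1
    # Assign the subtree back so the shelf (opened without writeback) saves it.
    db[gram[0]] = path

print(dict(db))
db.close()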
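The write-back step db[gram[0]] = path is what makes this work. According to the Python shelve documentation, with the default writeback=False only assignments to the shelf are stored; mutating a nested object fetched from it only changes a temporary unpickled copy. The following minimal sketch shows the difference (the temporary file location is an arbitrary choice for the demo):

```python
import os
import shelve
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo')

with shelve.open(path) as db:  # writeback defaults to False
    db['I'] = {'go': 1}

    # Each db['I'] unpickles a fresh copy, so this increments a temporary object.
    db['I']['go'] += 1
    after_in_place = db['I']['go']
    print(after_in_place)  # prints 1: the in-place change was lost

    # Read, modify, and assign back: the assignment is what gets persisted.
    entry = db['I']
    entry['go'] += 1
    db['I'] = entry
    after_assign = db['I']['go']
    print(after_assign)  # prints 2
```

Opening with writeback=True would make the in-place version work, but the shelf then caches every accessed entry in memory until sync() or close(), which is the memory cost discussed in the comments.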
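Is it OK to reuse this with the shelve object? Also, it does not work for multiple levels, only two...
That's certainly useful, but quite different. Note that if the duplicate markers are dropped, this can easily be done by subclassing the Shelve object and overriding its __getitem__ to add a defaultdict object on KeyError.
You could also use 4-long tuples.
That sounds interesting, but how would I do that?
That also seems like a good solution, but how do I use it to update a shelve object?
Does it need to be shelve? Could you pickle the dictionary or dump it to JSON and save it to a file yourself?
Yes, because I can't write and read a pickle file every time I run this code (I use it on very large data sets), shelve (without writeback) is a very good solution. The point is how to make it work with updating multiple keys (I think it's possible with some temporary variables, but I still can't figure out exactly how).
OK, I've updated my answer. I hope this is enough to get you started.
Yes, you can initialize dict as dict = shelve.open('file', writeback=True) and it will work.
Yes, the problem with writeback=True is that if the data set is large (which is the case here), we will run into memory problems, so I'd like to avoid it.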