Python: populating a shelve object/dictionary with multiple keys


python dictionary


I have a list of 4-grams that I want to use to populate a dictionary/shelve object:

['I','go','to','work']
['I','go','there','often']
['it','is','nice','being']
['I','live','in','NY']
['I','go','to','work']

so that we end up with something like:

four_grams['I']['go']['to']['work']=1

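Any newly encountered 4-gram should be inserted under its four keys with a value of 1. If it is encountered again, its value is incremented.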

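You just need a helper function that inserts the elements into a nested dictionary one at a time, checking at each step whether the required sub-dictionary already exists: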

dict = {}  # top-level store; note that this name shadows the built-in dict
def insert(fourgram):
    d = dict                            # reference to the top-level store
    for el in fourgram[:-1]:            # elements 1-3 if fourgram has 4 elements
        d = d.setdefault(el, {})        # create an empty sub-dict if missing, then move into it

    last = fourgram[-1]
    d[last] = d.get(last, 0) + 1        # increment the existing count, or start at 1
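A variant of the helper above, not part of the original answer: this sketch replaces the explicit "create if missing" step with a self-nesting collections.defaultdict, so intermediate levels appear automatically. The names tree and insert_tree are hypothetical. The factory is a module-level function rather than a lambda because pickle, which shelve uses to store values, cannot serialize lambdas.

```python
from collections import defaultdict

def tree():
    # every missing key creates another nested tree (a named function,
    # not a lambda, so the structure stays picklable)
    return defaultdict(tree)

def insert_tree(root, gram):
    node = root
    for word in gram[:-1]:
        node = node[word]                # missing levels are created automatically
    last = gram[-1]
    node[last] = node.get(last, 0) + 1   # .get() does not trigger the default factory

counts = tree()
for gram in (['I', 'go', 'to', 'work'], ['I', 'go', 'to', 'work'], ['I', 'live', 'in', 'NY']):
    insert_tree(counts, gram)

print(counts['I']['go']['to']['work'])  # prints 2
print(counts['I']['live']['in']['NY'])  # prints 1
```

One trade-off: reading a path that was never inserted, such as counts['you']['go'], silently creates empty levels instead of raising KeyError, so lookups of unseen n-grams grow the structure.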

You can then populate it with your data set like this:

insert(['I','go','to','work'])
insert(['I','go','there','often'])
insert(['it','is','nice','being'])
insert(['I','live','in','NY'])
insert(['I','go','to','work'])

Afterwards, you can index into the dict variable as needed:

print(dict['I']['go']['to']['work'])      # prints 2
print(dict['I']['go']['there']['often'])  # prints 1
print(dict['it']['is']['nice']['being'])  # prints 1
print(dict['I']['live']['in']['NY'])      # prints 1
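Indexing a path that was never inserted raises KeyError. If unseen n-grams should simply count as 0, a small lookup function can walk the nested dicts for you. The sketch below is an assumption, not part of the answer: lookup_count is a hypothetical helper, and the sample store is written out by hand to match what the five insert() calls above produce.

```python
def lookup_count(store, gram):
    """Return how often gram was inserted into the nested dict store, or 0."""
    node = store
    try:
        for word in gram:
            node = node[word]
    except (KeyError, TypeError):
        # KeyError: some level is missing; TypeError: gram is longer than
        # the stored path, so we tried to index into an int count
        return 0
    # a gram shorter than the stored paths ends on a sub-dict, not a count
    return node if isinstance(node, int) else 0

store = {
    'I': {'go': {'to': {'work': 2}, 'there': {'often': 1}},
          'live': {'in': {'NY': 1}}},
    'it': {'is': {'nice': {'being': 1}}},
}
print(lookup_count(store, ['I', 'go', 'to', 'work']))    # prints 2
print(lookup_count(store, ['I', 'go', 'to', 'school']))  # prints 0
```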

You could do it like this:

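import shelve
from collections import defaultdict
from functools import reduce  # needed on Python 3, where reduce is no longer a built-in

db = shelve.open('/tmp/db')

grams = [
    ['I','go','to','work'],
    ['I','go','there','often'],
    ['it','is','nice','being'],
    ['I','live','in','NY'],
    ['I','go','to','work'],
]

def f(path, word):
    # descend one level, creating the next nested dict if it is missing
    if word not in path:
        path[word] = defaultdict(int)
    return path[word]

for gram in grams:
    # only the first word is a shelf key; the remaining words form a nested
    # dict stored as that key's value, which is loaded and written back explicitly
    path = db.get(gram[0], defaultdict(int))
    reduce(f, gram[1:-1], path)[gram[-1]] += 1  # walk words 2..n-1, then count word n
    db[gram[0]] = path  # store the updated sub-tree (no writeback=True needed)

print(dict(db))  # a Shelf prints as an object, so copy it into a dict to see its contents

db.close()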
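Since the point of using shelve is that counts survive between runs, here is a sketch that wraps the loop above in a hypothetical add_grams function, calls it twice as two separate "runs", and reads a count back with a hypothetical get_count. It assumes all n-grams have the same length and uses a throwaway file in a temporary directory instead of /tmp/db.

```python
import os
import shelve
import tempfile
from collections import defaultdict
from functools import reduce

def descend(path, word):
    # same idea as f() above: create the next level if it is missing
    if word not in path:
        path[word] = defaultdict(int)
    return path[word]

def add_grams(filename, grams):
    # one "run": open the shelf, update the sub-tree of each first word, close
    with shelve.open(filename) as db:
        for gram in grams:
            path = db.get(gram[0], defaultdict(int))
            reduce(descend, gram[1:-1], path)[gram[-1]] += 1
            db[gram[0]] = path  # explicit write-back instead of writeback=True

def get_count(filename, gram):
    # read-only lookup; returns 0 for n-grams that were never stored
    with shelve.open(filename, flag='r') as db:
        if gram[0] not in db:
            return 0
        node = db[gram[0]]  # an unpickled copy of the nested sub-tree
    for word in gram[1:]:
        if word not in node:
            return 0
        node = node[word]
    return node

filename = os.path.join(tempfile.mkdtemp(), 'ngrams')
add_grams(filename, [['I', 'go', 'to', 'work'], ['I', 'live', 'in', 'NY']])  # first run
add_grams(filename, [['I', 'go', 'to', 'work']])                             # second run
print(get_count(filename, ['I', 'go', 'to', 'work']))      # prints 2
print(get_count(filename, ['I', 'go', 'there', 'often']))  # prints 0
```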

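Will this work repeatedly with the shelve object? Also, it doesn't work for multiple levels, only two...

This is certainly useful, but completely different.

Note that, if the duplicate flag is removed, this could easily be achieved by subclassing the Shelve object's __getitem__ to add a defaultdict object on KeyError.

You could also use 4-long tuples.

That sounds interesting, but how would I do that?

That also seems like a good solution, but how do I use it to update a shelve object?

Does it need to be shelve? Could you pickle the dictionary (or dump it to JSON) and save it to a file yourself?

Yes, because I can't write and read a pickle file every time I run this code (I'm using it on a very large dataset), so shelve (without writeback) is a really good solution. The point is how to make it work when updating multiple keys. I think it's possible with some temporary variables, but I still can't figure out exactly how to do it.

OK, I've updated my answer. I hope this is enough to get you started.

Yes, you can initialize dict as dict = shelve.open('file', writeback=True) and it will work.

Yes, the problem with writeback=True is that with a large dataset (which is the case here) we will run into memory problems, so I'd like to avoid it.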
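One way to act on the 4-tuple suggestion while avoiding writeback=True, sketched under assumptions: shelve keys must be strings, so each n-gram is joined into a single string key (here with a tab, assuming no token contains one), and the stored value is a plain int. SEP, add_flat and flat_count are hypothetical names. Because every update reads and writes only one small record, no nested sub-tree has to be unpickled and written back.

```python
import os
import shelve
import tempfile

SEP = '\t'  # assumption: no token ever contains a tab character

def add_flat(filename, grams):
    # shelve keys must be strings, so the n-gram tuple is joined into one key
    with shelve.open(filename) as db:
        for gram in grams:
            key = SEP.join(gram)
            db[key] = db.get(key, 0) + 1  # plain int value: no writeback needed

def flat_count(filename, gram):
    # returns 0 for n-grams that were never stored
    with shelve.open(filename, flag='r') as db:
        return db.get(SEP.join(gram), 0)

filename = os.path.join(tempfile.mkdtemp(), 'ngrams_flat')
add_flat(filename, [
    ['I', 'go', 'to', 'work'],
    ['I', 'go', 'there', 'often'],
    ['it', 'is', 'nice', 'being'],
    ['I', 'live', 'in', 'NY'],
    ['I', 'go', 'to', 'work'],
])
print(flat_count(filename, ['I', 'go', 'to', 'work']))  # prints 2
print(flat_count(filename, ['I', 'live', 'in', 'NY']))  # prints 1
```

The trade-off is that you can no longer cheaply list all continuations of a prefix such as "I go", which the nested layout gives you directly. If you do stay with writeback=True, calling db.sync() every so often writes the cached entries back to disk and empties the cache, which keeps its memory use bounded.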