Summing the inner dictionaries of a Dict{Tuple,Dict{String,Int64}} concurrently in a single loop

Tags: dictionary, nested, tuples, julia

Given a countmap object that counts the words in a text:

using StatsBase  # provides countmap
vocab_counter = countmap(split("the lazy fox jumps over the brown dog"))
[out]:

Dict{SubString{String},Int64} with 7 entries:
  "brown" => 1
  "lazy"  => 1
  "jumps" => 1
  "the"   => 2
  "fox"   => 1
  "over"  => 1
  "dog"   => 1
To get a character bigram counter for each word:

ngram_word_counter = Dict{Tuple,Dict}()
for (word, count) in vocab_counter
    for ng in ngrams(word, n) # bigrams.
        if ! haskey(ngram_word_counter, ng) || ! haskey(ngram_word_counter[ng], word)
            ngram_word_counter[ng] = Dict{String,Int64}()
            ngram_word_counter[ng][word] = 0
        end
        ngram_word_counter[ng][word] += count
    end
end
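The loop above calls an `ngrams` helper that is not shown in the question. As a minimal sketch, assuming it returns each run of `n` consecutive characters as a tuple (which matches the `('t','h')` keys in the output), it could look like the following. The body here is a hypothetical stand-in, not the asker's code.

```julia
# Hypothetical stand-in for the `ngrams` helper, which the question does not show.
# Returns every run of `n` consecutive characters in `word` as a tuple.
function ngrams(word::AbstractString, n::Integer)
    chars = collect(word)  # Vector{Char}; avoids byte-indexing problems with non-ASCII text
    return [Tuple(chars[i:i+n-1]) for i in 1:length(chars)-n+1]
end

ngrams("the", 2)  # two bigrams: ('t','h') and ('h','e')
```

A word shorter than `n` yields an empty collection, so the loop body is simply skipped for it.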
For the Dict{Tuple,Dict{String,Int64}} object, I need to loop over ngram_word_counter again to obtain ngram_counter, which drops the words, i.e. a Dict{Tuple,Int64}:

[ngram_counter]:

Dict{Tuple,Int64} with 20 entries:
  ('b','r') => 1
  ('t','h') => 2
  ('o','w') => 1
  ('z','y') => 1
  ('o','g') => 1
  ('u','m') => 1
  ('o','x') => 1
  ('e','r') => 1
  ('a','z') => 1
  ('p','s') => 1
  ('h','e') => 2
  ('d','o') => 1
  ('w','n') => 1
  ('m','p') => 1
  ('l','a') => 1
  ('o','v') => 1
  ('v','e') => 1
  ('r','o') => 1
  ('f','o') => 1
  ('j','u') => 1
Currently, to get both objects, I can do an ad hoc second counting pass like this:

function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict}()
    for (word, count) in vocab_counter
        for ng in ngrams(word, n) # bigrams.
            if ! haskey(ngram_word_counter, ng) || ! haskey(ngram_word_counter[ng], word)
                ngram_word_counter[ng] = Dict{String,Int64}()
                ngram_word_counter[ng][word] = 0
            end
            ngram_word_counter[ng][word] += count
        end
    end
    ngram_counter = Dict{Tuple,Int64}()
    for ng in keys(ngram_word_counter)
        ngram_counter[ng] = sum(values(ngram_word_counter[ng]))
    end
    return ngram_word_counter, ngram_counter
end
Or update ngram_word_counter and ngram_counter together in the first loop:

function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict}()
    ngram_counter = Dict{Tuple,Int64}()
    for (word, count) in vocab_counter
        for ng in ngrams(word, n) # bigrams.
            if ! haskey(ngram_word_counter, ng) || ! haskey(ngram_word_counter[ng], word)
                ngram_word_counter[ng] = Dict{String,Int64}()
                ngram_word_counter[ng][word] = 0
            end
            ngram_word_counter[ng][word] += count
            ngram_counter[ng] += 1
        end
    end
    return ngram_word_counter, ngram_counter
end

ngram_word_counter, ngram_counter
But I get a KeyError when updating ngram_counter:
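The original error output is not preserved on this page. As a minimal illustration (not the asker's traceback), the sketch below shows why `+=` on a missing key fails: the update reads the old value before writing the new one. The variable names are reused from the question; the single key is chosen only for the example.

```julia
ngram_counter = Dict{Tuple,Int64}()
ng = ('t', 'h')

# `ngram_counter[ng] += 1` means `ngram_counter[ng] = ngram_counter[ng] + 1`,
# so the read on the right-hand side fails while the key is still absent.
err = try
    ngram_counter[ng] += 1
    nothing
catch e
    e
end
println(err isa KeyError)

# Supplying a default for the missing key avoids the error.
ngram_counter[ng] = get(ngram_counter, ng, 0) + 1
println(ngram_counter[ng])
```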

I added an extra check and it worked:

function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict}()
    ngram_counter = Dict{Tuple,Int64}()
    for (word, count) in vocab_counter
        for ng in ngrams(word, n) # bigrams.
            if ! haskey(ngram_word_counter, ng) || ! haskey(ngram_word_counter[ng], word)
                ngram_word_counter[ng] = Dict{String,Int64}()
                ngram_word_counter[ng][word] = 0
            end
            if !haskey(ngram_counter, ng)
                ngram_counter[ng] = 0
            end
            ngram_word_counter[ng][word] += count
            ngram_counter[ng] += 1
        end
    end
    return ngram_word_counter, ngram_counter
end

ngram_word_counter, ngram_counter
[out]:

(Dict{Tuple,Dict}(Pair{Tuple,Dict}(('b','r'),Dict("brown"=>1)),Pair{Tuple,Dict}(('t','h'),Dict("the"=>2)),Pair{Tuple,Dict}(('o','w'),Dict("brown"=>1)),Pair{Tuple,Dict}(('z','y'),Dict("lazy"=>1)),Pair{Tuple,Dict}(('o','g'),Dict("dog"=>1)),Pair{Tuple,Dict}(('u','m'),Dict("jumps"=>1)),Pair{Tuple,Dict}(('o','x'),Dict("fox"=>1)),Pair{Tuple,Dict}(('e','r'),Dict("over"=>1)),Pair{Tuple,Dict}(('a','z'),Dict("lazy"=>1)),Pair{Tuple,Dict}(('p','s'),Dict("jumps"=>1))…),Dict{Tuple,Int64}(Pair{Tuple,Int64}(('b','r'),1),Pair{Tuple,Int64}(('t','h'),1),Pair{Tuple,Int64}(('o','w'),1),Pair{Tuple,Int64}(('z','y'),1),Pair{Tuple,Int64}(('o','g'),1),Pair{Tuple,Int64}(('u','m'),1),Pair{Tuple,Int64}(('o','x'),1),Pair{Tuple,Int64}(('e','r'),1),Pair{Tuple,Int64}(('a','z'),1),Pair{Tuple,Int64}(('p','s'),1)…))

Is there a way to sum the inner dictionaries of a Dict{Tuple,Dict{String,Int64}} concurrently in a single loop?

Not sure this answers the question, but you can make compute_statistics cleaner as follows:

function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict{String,Int}}()
    ngram_counter = Dict{Tuple,Int}()
    for (word, count) in vocab_counter, ng in ngrams(word,n)
        ngram_word_counter[ng] = get(ngram_word_counter,ng,Dict{String,Int}())
        ngram_word_counter[ng][word] = get(ngram_word_counter[ng],word,0)+count
        ngram_counter[ng] = get(ngram_counter,ng,0)+count
    end
    return ngram_word_counter, ngram_counter
end
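For comparison, here is a hedged variant of the same function that uses `get!` to fetch or create the inner dictionary in one step, followed by a small run where two words share a bigram. The one-line `ngrams` is a hypothetical stand-in for the helper the thread does not show, and the two-word vocabulary is made up for the example.

```julia
# Hypothetical stand-in for the question's `ngrams` helper (not shown in the thread).
ngrams(word, n) = [Tuple(collect(word)[i:i+n-1]) for i in 1:length(word)-n+1]

function compute_statistics(vocab_counter, n)
    ngram_word_counter = Dict{Tuple,Dict{String,Int}}()
    ngram_counter = Dict{Tuple,Int}()
    for (word, count) in vocab_counter, ng in ngrams(word, n)
        # get! returns the existing inner dictionary, or stores and returns a new one.
        inner = get!(ngram_word_counter, ng, Dict{String,Int}())
        inner[word] = get(inner, word, 0) + count
        ngram_counter[ng] = get(ngram_counter, ng, 0) + count
    end
    return ngram_word_counter, ngram_counter
end

# "the" and "then" share the bigrams ('t','h') and ('h','e').
ngram_word_counter, ngram_counter = compute_statistics(Dict("the" => 2, "then" => 1), 2)
```

Two details matter here. Adding `count` rather than 1 keeps `ngram_counter` equal to what the two-pass `sum(values(...))` version produces. And because the inner dictionary is only created when the bigram is new, a second word sharing a bigram is added to it; the combined `haskey(...) || haskey(...)` condition in the question would instead replace the inner dictionary in that case.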
This uses get to avoid haskey, together with the shorter for syntax.

Oops. I had confused getkey with get, but it is fixed now.

Another way to compute ngram_counter from ngram_word_counter is:

ngram_counter = Dict(k=>sum(values(d)) for (k,d) in ngram_word_counter)
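To see the comprehension on its own, here is a small self-contained sketch. The two-entry `ngram_word_counter` is made-up sample data, not output from the question.

```julia
# Made-up sample data: two bigrams, one of them shared by two words.
ngram_word_counter = Dict{Tuple,Dict{String,Int}}(
    ('t', 'h') => Dict("the" => 2, "then" => 1),
    ('e', 'n') => Dict("then" => 1),
)

# One entry per bigram, holding the sum of its per-word counts.
ngram_counter = Dict(k => sum(values(d)) for (k, d) in ngram_word_counter)

println(ngram_counter[('t', 'h')])  # 3
println(ngram_counter[('e', 'n')])  # 1
```

This still walks ngram_word_counter a second time, so it is a more compact form of the two-pass approach rather than a single-loop solution.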