Ruby: grouping an array of strings by their first common letters


Is there a way to group an array of strings by their first common letters?

For example:

 array = [ 'hello', 'hello you', 'people', 'finally', 'finland' ]
So that when I do

array.group_by{ |string| some_logic_with_string }
the result should be:

{ 
   'hello' => ['hello', 'hello you'],
   'people' => ['people'],
   'fin' => ['finally', 'finland']
}
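One possible reading of the expected output: bucket the strings by their first letter, then use the longest prefix shared by every string in a bucket as that bucket's key. The sketch below assumes that interpretation. common_prefix and group_by_first_letter_prefix are hypothetical helper names, and this is not the only way to read the question (for example, it would put 'aba', 'abb', 'aaa' and 'aab' under the single key 'a').

```ruby
# Hypothetical helper: longest prefix shared by every string in `strings`.
def common_prefix(strings)
  first, *rest = strings
  len = first.length
  rest.each do |s|
    # Shrink the candidate length until s starts with the same `len` characters.
    len -= 1 while len > 0 && s[0, len] != first[0, len]
  end
  first[0, len]
end

# Assumed interpretation: group by first character, then re-key each
# group by the common prefix of its members.
def group_by_first_letter_prefix(array)
  array.group_by { |s| s[0] }
       .map { |_first_char, group| [common_prefix(group), group] }
       .to_h
end

array = ['hello', 'hello you', 'people', 'finally', 'finland']
p group_by_first_letter_prefix(array)
# returns {"hello" => ["hello", "hello you"], "people" => ["people"], "fin" => ["finally", "finland"]}
```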

Not sure whether you can group by all common letters. But if you only want to group by the first letter, here it is:

array = [ 'hello', 'hello you', 'people', 'finally', 'finland' ]    
result = {}
array.each { |st| result[st[0]] = result.fetch(st[0], []) + [st] }

pp result
#=> {"h"=>["hello", "hello you"], "p"=>["people"], "f"=>["finally", "finland"]}

Now result contains the hash you want.
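For comparison, Ruby's built-in Enumerable#group_by builds the same first-letter hash in a single call. A minimal equivalent of the loop above:

```ruby
array = ['hello', 'hello you', 'people', 'finally', 'finland']

# group_by uses the block's return value (the first character) as the key
# and collects the strings for each key in their original order.
result = array.group_by { |st| st[0] }
pp result
```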

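Well, you are trying to do something very custom. I can think of two classic approaches that could meet your needs: 1) stemming and 2) Levenshtein distance.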

Using stemming, you can find the root of a longer word. Here is an answer on that.


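Levenshtein distance is a well-known algorithm for computing the difference between two strings. There is an implementation of it that runs very fast thanks to a native C extension.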
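A minimal pure-Ruby sketch of the Levenshtein distance itself, using the standard dynamic-programming recurrence and keeping only two rows of the table. This is not the C-extension implementation mentioned above and will be much slower on large inputs; levenshtein is a hypothetical method name.

```ruby
# Edit distance where insertions, deletions and substitutions each cost 1.
# prev holds the distances from the previous prefix of `a` to every prefix of `b`.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # distance from a[0, i] to the empty string
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,        # deletion
               curr[j - 1] + 1,    # insertion
               prev[j - 1] + cost  # substitution (or match)
              ].min
    end
    prev = curr
  end
  prev.last
end

p levenshtein('kitten', 'sitting') # => 3
```

To use it for grouping you would still have to pick a threshold for how close two strings must be, which is a separate design decision.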

Note: some test cases are ambiguous and their expected values conflict with other tests; you need to fix them.


I guess a plain group_by may not work on its own; it needs further processing.
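To see why a plain group_by falls short: it needs a key that can be computed from each string on its own, such as a fixed-length prefix. With a 3-character prefix (an arbitrary length chosen only for illustration), the buckets happen to match the expected ones, but the keys can never become 'hello' or 'people':

```ruby
array = ['hello', 'hello you', 'people', 'finally', 'finland']

# Every key is cut to exactly 3 characters, regardless of how much
# the strings in a bucket actually have in common.
by_three = array.group_by { |s| s[0, 3] }
p by_three
```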

I came up with the following code, which seems to work consistently for all the given test cases.

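I have left comments in the code to explain the logic. The only way to fully understand it is to inspect the value of h and follow the flow for a simple test case.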

def group_by_common_chars(array)
    # We will iteratively group as many times as there are characters
    # in the largest possible key, which is the max length of all strings
    max_len = array.max_by {|i| i.size}.size

    # First group by first character.
    h = array.group_by{|i| i[0]}

    # Now iterate remaining (max_len - 1) times
    (1...max_len).each do |c|
        # Group each existing bucket again, this time by its first (c + 1) characters.
        t = h.map do |_k, v|
            v.group_by { |i| i[0..c] }
        end.reduce(&:merge)

        # We need to merge the previously generated hash
        # with the hash generated in this iteration.  Here things get tricky.
        # If previously, we had 
        #    {"a" => ["a"], "ab" => ["ab", "abc"]},
        # and now, we have 
        #    {"a"=>["a"], "ab"=>["ab"], "abc"=>["abc"]},
        # We need to merge the two hashes such that we have
        #    {"a"=>["a"], "ab"=>["ab", "abc"], "abc"=>["abc"]}.
        # Note that `Hash#merge` calls its block only for keys present in both
        # hashes, so "abc" is simply added as-is; we can't do much about it now.
        # We will clean it up after the loop.
        h = h.merge(t) do |k, o, n| 
            if (o.size != n.size)
                diff = [o,n].max - [o,n].min
                if diff.size == 1 && t.value?(diff)
                    [o,n].max
                else
                    [o,n].min
                end
            else
                o
            end
        end
    end

    # Sort by key length, smallest in the beginning.
    h = h.sort {|i,j| i.first.size <=> j.first.size }.to_h

    # Get rid of those key-value pairs whose value is a single-element array
    # whose element is itself a key and also appears in some other value array
    # (i.e. it occurs more than once across all values). This step removes
    # pairs like "abc"=>["abc"] from the example discussed above.

    h = h.tap do |h|
        keys = h.keys
        keys.each do |k|
            v = h[k]    
            if (v.size == 1 && 
                h.key?(v.first) && 
                h.values.flatten.count(v.first) > 1) then
                h.delete(k)
            end
        end
    end

    # Get rid of those keys whose value array consists only of elements that
    # are already part of some other key's value. Since the hash is ordered by
    # key length, this removes shorter keys whose elements all appear elsewhere
    # under longer keys. For example, it lets us get rid of
    # "a"=>["aba", "abb", "aaa", "aab"] from a hash like
    # {"a"=>["aba", "abb", "aaa", "aab"], "ab"=>["aba", "abb"], "aa"=>["aaa", "aab"]}
    h.tap do |h|
        keys = h.keys
        keys.each do |k|
            values = h[k]
            other_values = h.values_at(*(h.keys-[k])).flatten
            already_present = values.all? do |v|
                other_values.include?(v)
            end
            h.delete(k) if already_present
        end
    end
end
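The merge step in the loop relies on Hash#merge calling its block only for keys that exist in both hashes, as the code comments note. A small standalone check of that behavior, using the hashes from the first iteration on ['a', 'ab', 'abc']. The keep-the-smaller-group rule in the block is a simplification for illustration, not the answer's exact merge rule.

```ruby
old_h = { 'a' => ['a', 'ab', 'abc'] }          # after grouping by first character
new_h = { 'a' => ['a'], 'ab' => ['ab', 'abc'] } # after grouping by 2-character prefix

calls = []
merged = old_h.merge(new_h) do |key, old_val, new_val|
  calls << key # record which keys actually reach the block
  old_val.size <= new_val.size ? old_val : new_val
end

p calls # => ["a"]  ("ab" exists only in new_h, so it is copied without the block)
p merged
```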

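Your logic is unclear. What is the expected result for array = ["a", "ab", "abc"]? What about ["aba", "abb", "aaa", "aab"]?
@Drenmi That is clearly not the case. Look at the keys in the OP's expected hash. Do they all have the same length? And when the array is ["Why", "haven't", "you", "answered", "the", "above", "questions?", "Please", "do", "so."]?
For array = ['a', 'ab', 'abc'], why isn't it {'a'=>['a', 'ab', 'abc']} or {'a'=>['a', 'ab'], 'abc'=>['abc']}, etc.?
That is not what the OP wants.
Yes, I know. I wrote that in the first line.
@Wand-Maker That's really awesome. This is exactly what I wanted. Thanks a lot.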
p group_by_common_chars ['hello', 'hello you', 'people', 'finally', 'finland']
#=> {"fin"=>["finally", "finland"], "hello"=>["hello", "hello you"], "people"=>["people"]}

p group_by_common_chars ['a', 'ab', 'abc']
#=> {"a"=>["a"], "ab"=>["ab", "abc"]}

p group_by_common_chars  ['aba', 'abb', 'aaa', 'aab']
#=> {"ab"=>["aba", "abb"], "aa"=>["aaa", "aab"]}

p group_by_common_chars ["Why", "haven't", "you", "answered", "the", "above", "questions?", "Please", "do", "so."]
#=> {"a"=>["answered", "above"], "do"=>["do"], "Why"=>["Why"], "you"=>["you"], "so."=>["so."], "the"=>["the"], "Please"=>["Please"], "haven't"=>["haven't"], "questions?"=>["questions?"]}