Ruby: group an array of strings by their first common letters
Is there a way to group an array of strings by their first common letters? For example:
array = [ 'hello', 'hello you', 'people', 'finally', 'finland' ]
So when I do
array.group_by{ |string| some_logic_with_string }
the result should be:
{
'hello' => ['hello', 'hello you'],
'people' => ['people'],
'fin' => ['finally', 'finland']
}
I am not sure whether you can group by all common letters. But if you only want to group by the first letter, it looks like this:
array = [ 'hello', 'hello you', 'people', 'finally', 'finland' ]
result = {}
array.each { |st| result[st[0]] = result.fetch(st[0], []) + [st] }
pp result
{"h"=>["hello", "hello you"], "p"=>["people"], "f"=>["finally", "finland"]}
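The same first-letter grouping can also be written with Enumerable#group_by, which builds the hash in one pass. This is only a minimal sketch of the first-letter case; it does not find longer common prefixes such as 'fin'.

```ruby
array = ['hello', 'hello you', 'people', 'finally', 'finland']

# group_by uses the block's return value (here the first character) as the key
# and collects every string that produced that key into an array.
result = array.group_by { |st| st[0] }

p result
```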
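Now, result contains the hash you want.
Well, you are trying to do something quite custom. I can think of two classic approaches that could fit your needs: 1) stemming and 2) Levenshtein distance.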
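With stemming, you can find the root of longer words; there is an existing answer covering it.
Levenshtein is a well-known algorithm for computing the difference between two strings. There is a library for it that runs very fast thanks to a native C extension.
Note: some of the test cases are ambiguous, and their expected values conflict with other tests; you will need to fix them.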
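To make the Levenshtein idea mentioned above concrete, here is a minimal pure-Ruby sketch of the classic dynamic-programming edit distance. The method name levenshtein is an assumption made for illustration; it is not the native-extension library the answer refers to, and it will be slower than one.

```ruby
# Hypothetical helper: number of single-character insertions, deletions,
# or substitutions needed to turn string a into string b.
def levenshtein(a, b)
  # prev[j] is the distance between the already processed prefix of a and b[0, j].
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # distance from a[0, i] to the empty string
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,          # deletion
               curr[j - 1] + 1,      # insertion
               prev[j - 1] + cost].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

p levenshtein('hello', 'hello you') # prints 4
p levenshtein('kitten', 'sitting')  # prints 3
```

A small distance only says two strings are similar overall; turning that into prefix-based groups would still need a threshold and some grouping logic on top.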
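I guess a plain group_by may not work, and further processing is needed.
I came up with the following code, which seems to handle all the given test cases in a consistent way.
I have left comments in the code to explain the logic. The only way to fully understand it is to inspect the value of h and follow the flow for a simple test case.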
def group_by_common_chars(array)
  # We will iteratively group by as many times as there are characters
  # in the largest possible key, which is the max length of all strings.
  max_len = array.max_by {|i| i.size}.size
  # First group by first character.
  h = array.group_by{|i| i[0]}
  # Now iterate the remaining (max_len - 1) times.
  (1...max_len).each do |c|
    # Let's perform a group by on the next set of starting characters.
    t = h.map do |k,v|
      h1 = v.group_by {|i| i[0..c]}
    end.reduce(&:merge)
    # We need to merge the previously generated hash
    # with the hash generated in this iteration. Here things get tricky.
    # If previously, we had
    # {"a" => ["a"], "ab" => ["ab", "abc"]},
    # and now, we have
    # {"a"=>["a"], "ab"=>["ab"], "abc"=>["abc"]},
    # we need to merge the two hashes such that we have
    # {"a"=>["a"], "ab"=>["ab", "abc"], "abc"=>["abc"]}.
    # Note that the `Hash#merge` block is called only for common keys, so "abc"
    # will get merged; we can't do much about it now. We will process
    # it later in the loop.
    h = h.merge(t) do |k, o, n|
      if (o.size != n.size)
        diff = [o,n].max - [o,n].min
        if diff.size == 1 && t.value?(diff)
          [o,n].max
        else
          [o,n].min
        end
      else
        o
      end
    end
  end
  # Sort by key length, smallest first.
  h = h.sort {|i,j| i.first.size <=> j.first.size }.to_h
  # Get rid of those key-value pairs where the value is a single-element array
  # and that single element is already part of another key-value pair whose
  # value array has more than one element. This step allows us
  # to get rid of pairs like "abc"=>["abc"] in the example discussed
  # above.
  h = h.tap do |h|
    keys = h.keys
    keys.each do |k|
      v = h[k]
      if (v.size == 1 &&
          h.key?(v.first) &&
          h.values.flatten.count(v.first) > 1) then
        h.delete(k)
      end
    end
  end
  # Get rid of those keys whose value array consists only of elements that
  # are already part of some other key. Since the hash is ordered by key
  # length, this process lets us get rid of those keys which are shorter
  # but consist only of elements that are present somewhere else
  # under a longer key. For example, it lets us get rid of
  # "a"=>["aba", "abb", "aaa", "aab"] from a hash like
  # {"a"=>["aba", "abb", "aaa", "aab"], "ab"=>["aba", "abb"], "aa"=>["aaa", "aab"]}
  h.tap do |h|
    keys = h.keys
    keys.each do |k|
      values = h[k]
      other_values = h.values_at(*(h.keys-[k])).flatten
      already_present = values.all? do |v|
        other_values.include?(v)
      end
      h.delete(k) if already_present
    end
  end
end
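To follow the suggestion of inspecting h on a simple test case, the first two grouping passes can be reproduced in isolation. This is only a trace of the opening steps for ['a', 'ab', 'abc'], reusing the names h and t from the method; it leaves out the merge and cleanup stages.

```ruby
array = ['a', 'ab', 'abc']

# Pass 0: group by the first character only.
h = array.group_by { |s| s[0] }
# h == {"a" => ["a", "ab", "abc"]}

# Pass 1: regroup every existing bucket by its first two characters.
# Note that "a"[0..1] is still "a", so the short string keeps its own key.
t = h.map { |_key, values| values.group_by { |s| s[0..1] } }.reduce(&:merge)
# t == {"a" => ["a"], "ab" => ["ab", "abc"]}

p h
p t
```

Comparing h and t for the shared key "a" shows why the merge block has to decide between the old, larger bucket and the new, narrower one.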
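Your logic is unclear. What is the expected result for array = ["a", "ab", "abc"]? What about ["aba", "abb", "aaa", "aab"]?
@Drenmi That is clearly not the case. Look at the keys in the OP's expected hash. Do they all have the same length?
And what about when the array is ["Why", "haven't", "you", "answered", "the", "above", "questions?", "Please", "do", "so."]?
For array = ['a', 'ab', 'abc'], why is it not {'a' => ['a', 'ab', 'abc']} or {'a' => ['a', 'ab'], 'abc' => ['abc']}, and so on?
That is not what the OP wants.
Yes, I know. I wrote that in the first line.
@Wand-Maker That's awesome. This is exactly what I wanted. Thank you very much.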
p group_by_common_chars ['hello', 'hello you', 'people', 'finally', 'finland']
#=> {"fin"=>["finally", "finland"], "hello"=>["hello", "hello you"], "people"=>["people"]}
p group_by_common_chars ['a', 'ab', 'abc']
#=> {"a"=>["a"], "ab"=>["ab", "abc"]}
p group_by_common_chars ['aba', 'abb', 'aaa', 'aab']
#=> {"ab"=>["aba", "abb"], "aa"=>["aaa", "aab"]}
p group_by_common_chars ["Why", "haven't", "you", "answered", "the", "above", "questions?", "Please", "do", "so."]
#=> {"a"=>["answered", "above"], "do"=>["do"], "Why"=>["Why"], "you"=>["you"], "so."=>["so."], "the"=>["the"], "Please"=>["Please"], "haven't"=>["haven't"], "questions?"=>["questions?"]}