Most common words in a Ruby string

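I'm new to Ruby and am trying to write a method that returns an array of the most common word(s) in a string. If one word has the highest count, that word should be returned. If two words tie for the highest count, both should be returned in the array.

The problem is that when I pass in the second string, the code counts "words" only twice instead of three times. When the third string is passed in, it returns "it" with a count of 2, which makes no sense, because "it" should have a count of 1.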

def most_common(string)
  counts = {}
  words = string.downcase.tr(",.?!",'').split(' ')

  words.uniq.each do |word|
    counts[word] = 0
  end

  words.each do |word|
    counts[word] = string.scan(word).count
  end

  max_quantity = counts.values.max
  max_words = counts.select { |k, v| v == max_quantity }.keys
  puts max_words
end

most_common('a short list of words with some words') #['words']
most_common('Words in a short, short words, lists of words!') #['words']
most_common('a short list of words with some short words in it') #['words', 'short']

The way you count instances of each word is your problem. "it" is contained in "with", so it is counted twice:

[1] pry(main)> 'with some words in it'.scan('it')
=> ["it", "it"]
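To make both symptoms from the question concrete, here is a minimal sketch. The variable names are only illustrative and are not part of the original code. It compares substring matching with whole-word matching, and scanning the original string with scanning a downcased copy.

```ruby
# Substring matching: "it" is also found inside "with".
substring_hits = 'with some words in it'.scan('it').count

# Whole-word matching with word boundaries finds only the standalone "it".
whole_word_hits = 'with some words in it'.scan(/\bit\b/).count

# Case matters too: scanning the original (not downcased) string misses "Words".
original = 'Words in a short, short words, lists of words!'
case_sensitive_hits = original.scan('words').count
case_insensitive_hits = original.downcase.scan('words').count

puts substring_hits        # 2
puts whole_word_hits       # 1
puts case_sensitive_hits   # 2
puts case_insensitive_hits # 3
```

Counting the elements of the already split array avoids both pitfalls, because each element is compared as a whole word.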
There is an easier way to do this, though: you can call each_with_object to tally how many times each value occurs in the array, like so:

counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
This iterates over every entry in the array and adds 1 to the value stored for that word in the hash.
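The counting line relies on Hash.new(0) supplying a default of 0 for keys that have not been seen yet. The following sketch (variable names are illustrative) shows that behaviour, and also shows that Enumerable#tally, available in Ruby 2.7 and later, appears to produce the same counts for this input.

```ruby
words = %w[a short list of words with some words]

# Hash.new(0) returns 0 for missing keys, so += 1 works the first time a word is seen.
counts = words.each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }

# Looking up an unseen word returns the default without storing the key.
unseen = counts["missing"]
stored = counts.key?("missing")

# Enumerable#tally (Ruby 2.7 and later) builds the same word => count pairs.
tallied = words.tally

p counts
p unseen
p stored
p counts == tallied
```

With a plain {} instead of Hash.new(0), the first += 1 for each word would raise NoMethodError, because the missing value would be nil.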

So the following should work for you:

def most_common(string)
  words = string.downcase.tr(",.?!",'').split(' ')
  counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  max_quantity = counts.values.max
  counts.select { |k, v| v == max_quantity }.keys
end

p most_common('a short list of words with some words') #['words']
p most_common('Words in a short, short words, lists of words!') #['words']
p most_common('a short list of words with some short words in it') #['short', 'words']

The same thing can also be done this way:

def most_common(string)
  counts = Hash.new 0
  string.downcase.tr(",.?!",'').split(' ').each{|word| counts[word] += 1}
  # For "Words in a short, short words, lists of words!"
  # counts ---> {"words"=>3, "in"=>1, "a"=>1, "short"=>2, "lists"=>1, "of"=>1} 
  max_value = counts.values.max
  #max_value ---> 3
  return counts.select{|key , value| value == max_value}
  #returns --->  {"words"=>3}
end

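This is just a shorter solution that you might want to use. Hope it helps :)

This is the kind of question programmers love, isn't it :) How about a functional approach?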

# returns array of words after removing certain English punctuations
def english_words(str)
  str.downcase.delete(',.?!').split
end

# returns hash mapping element to count
def element_counts(ary)
  ary.group_by { |e| e }.inject({}) { |a, e| a.merge(e[0] => e[1].size) }
end

def most_common(ary)
  ary.empty? ? nil : 
    element_counts(ary)
      .group_by { |k, v| v }
      .sort
      .last[1]
      .map(&:first)
end

most_common(english_words('a short list of words with some short words in it'))
#=> ["short", "words"]
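As a variation on the functional approach above, the counting step can probably be shortened with Hash#transform_values (Ruby 2.4 and later). This is only a sketch: count_elements and top_elements are hypothetical names chosen to avoid clashing with the methods above, and unlike the original this version returns an empty array rather than nil for empty input.

```ruby
# Hypothetical helper: maps each element to the number of times it occurs.
def count_elements(ary)
  ary.group_by(&:itself).transform_values(&:size)
end

# Hypothetical helper: returns every element that shares the highest count.
# For an empty array the maximum is nil and the result is [].
def top_elements(ary)
  counts = count_elements(ary)
  top = counts.values.max
  counts.select { |_, count| count == top }.keys
end

words = 'a short list of words with some short words in it'.split
p top_elements(words)
p top_elements([])
```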

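Nick has already answered your question; I just want to suggest another approach. Since "high count" is vague, I suggest returning a hash of the downcased words and their respective counts. Since Ruby 1.9, hashes preserve the insertion order of their key-value pairs, so we may want to take advantage of that and return the hash with its key-value pairs ordered by decreasing value.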

Code

def words_by_count(str)
  str.gsub(/./) do |c|
    case c
    when /\w/ then c.downcase
    when /\s/ then c
    else ''
    end
  end.split
     .group_by {|w| w}
     .map {|k,v| [k,v.size]}
     .sort_by(&:last)
     .reverse
     .to_h
end
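The to_h method used here (Array#to_h, called on the array returned by reverse) was introduced in Ruby 2.1. For earlier Ruby versions, you must use: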

Hash[str.gsub(/./)... .reverse]
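To illustrate the two conversions just mentioned, the sketch below builds a hash from an array of [key, value] pairs both ways. The pairs array is made-up sample data, not output from the method above.

```ruby
# Sample [word, count] pairs, already in the desired order.
pairs = [["words", 3], ["short", 2], ["lists", 1]]

# Ruby 2.1 and later.
from_to_h = pairs.to_h

# Works on older versions as well.
from_brackets = Hash[pairs]

p from_to_h == from_brackets
p from_to_h.keys
```

Both forms keep the order of the pairs, which is what lets the resulting hash stay sorted by count.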
Examples

words_by_count('a short list of words with some words')
  #=> {"words"=>2, "of"=>1, "some"=>1, "with"=>1,
  #    "list"=>1, "short"=>1, "a"=>1}
words_by_count('Words in a short, short words, lists of words!')
  #=> {"words"=>3, "short"=>2, "lists"=>1, "a"=>1, "in"=>1, "of"=>1}
words_by_count('a short list of words with some short words in it')
  #=> {"words"=>2, "short"=>2, "it"=>1, "with"=>1,
  #    "some"=>1, "of"=>1, "list"=>1, "in"=>1, "a"=>1}
Explanation

Here is what happens in the second example, where:

str = 'Words in a short, short words, lists of words!'
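str.gsub(/./) do |c| ... end
matches every character in the string and passes it to the block, which decides what to do with it. As you can see, word characters are downcased, whitespace is left as is, and everything else is removed (replaced with an empty string).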

s = str.gsub(/./) do |c|
      case c
      when /\w/ then c.downcase
      when /\s/ then c
      else ''
      end
    end
  #=> "words in a short short words lists of words"
This is followed by:

a = s.split
 #=> ["words", "in", "a", "short", "short", "words", "lists", "of", "words"]
h = a.group_by {|w| w}
 #=> {"words"=>["words", "words", "words"], "in"=>["in"], "a"=>["a"],
 #    "short"=>["short", "short"], "lists"=>["lists"], "of"=>["of"]}
b = h.map {|k,v| [k,v.size]}
 #=> [["words", 3], ["in", 1], ["a", 1], ["short", 2], ["lists", 1], ["of", 1]]
c = b.sort_by(&:last)
 #=> [["of", 1], ["in", 1], ["a", 1], ["lists", 1], ["short", 2], ["words", 3]]
d = c.reverse
 #=> [["words", 3], ["short", 2], ["lists", 1], ["a", 1], ["in", 1], ["of", 1]]
d.to_h # or Hash[d]
 #=> {"words"=>3, "short"=>2, "lists"=>1, "a"=>1, "in"=>1, "of"=>1}
Note that
c = b.sort_by(&:last)
d = c.reverse
can be replaced with:

d = b.sort_by { |_,k| -k }
 #=> [["words", 3], ["short", 2], ["a", 1], ["in", 1], ["lists", 1], ["of", 1]]
However, sorting followed by reverse is generally faster.
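One detail worth noting about the sorting steps above: sort_by is not guaranteed to be stable, so the relative order of words with equal counts may differ between runs or Ruby implementations. If a reproducible order matters, a compound sort key is one option. The sketch below assumes ties should be broken alphabetically, which is a choice made here for illustration and not something the answer specifies.

```ruby
pairs = [["words", 3], ["in", 1], ["a", 1], ["short", 2], ["lists", 1], ["of", 1]]

# Sort by descending count first, then alphabetically by word to break ties.
ordered = pairs.sort_by { |word, count| [-count, word] }

p ordered
p ordered.to_h.keys
```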

Assuming string is a string containing multiple words:

# The "+" keeps consecutive delimiters such as ", " from producing empty strings.
words = string.downcase.split(/[.!?,\s]+/)
words.sort_by { |x| words.count(x) }

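Here we split the string into words and collect them in an array. We then sort the array by how many times each word occurs. The most common words will appear at the end.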
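The sorted array still contains duplicates, and calling words.count for every element rescans the whole array each time. A possible refinement, sketched here with illustrative variable names, counts each word once and then ranks only the distinct words.

```ruby
string = 'a short list of words with some short words in it'
words = string.downcase.split(/[.!?,\s]+/)

# Count each word in a single pass instead of calling words.count per element.
counts = words.each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }

# Distinct words ordered from least to most frequent.
ranked = counts.keys.sort_by { |word| counts[word] }

# Every word that shares the highest count, not just the last one.
top_count = counts[ranked.last]
top_words = ranked.select { |word| counts[word] == top_count }

p top_words.sort
```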

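Good answer. May I suggest a slight improvement to the first two lines of the most_common method:
words = string.scan(/\w+/); counts = words.each_with_object(Hash.new(0)) { |word, counts| counts[word.downcase] += 1 }

Sure! I was just trying to keep some of the original. There is always room for improvement.

Related question:

Thanks for all the help, everyone. On closer inspection I found that in words.each I was looking at "string" without the downcase, and fixing that seems to have solved both of my problems.

@NickVeys gave a good answer (which earned my +1) and was the only one who addressed your question, so it's understandable that you gave it the green checkmark. However, I suggest that in future you hold off for a while (perhaps an hour or more) before selecting an answer, because relatively quick selections tend to discourage other, possibly better, answers and also preempt readers who are still preparing theirs.

I'm still very new to all this and didn't know how it should be done.

Worth a read.
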
def firstRepeatedWord(string)
  h_data = Hash.new(0)
  string.split(" ").each{|x| h_data[x] +=1}
  h_data.key(h_data.values.max)
end
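Note that Hash#key returns only the first key that maps to the given value, so the method above reports a single word even when several words tie for the highest count. The sketch below uses made-up counts to show the difference from selecting all tied keys.

```ruby
# Sample counts in which two words tie for the maximum.
counts = { "short" => 2, "words" => 2, "it" => 1 }
max = counts.values.max

# Hash#key stops at the first matching key.
first_only = counts.key(max)

# Selecting on the maximum keeps every tied word.
all_tied = counts.select { |_, count| count == max }.keys

p first_only  # "short"
p all_tied    # ["short", "words"]
```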