Ruby: how do I find the item in an array that occurs most often?

    [1, 1, 1, 2, 3].mode
    => 1

    ['cat', 'dog', 'snake', 'dog'].mode
    => dog

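First build a hash that maps each value in the array to its frequency…

    arr = [1, 1, 1, 2, 3]

    freq = arr.inject(Hash.new(0)) { |h,v| h[v] += 1; h }
    #=> {1=>3, 2=>1, 3=>1}

…then use the frequency table to find the element with the highest frequency: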

 arr.max_by { |v| freq[v] } #=> 1
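On Ruby 2.7 and later, the two steps above can be folded together with `Enumerable#tally`, which builds the same frequency hash. This is only a sketch: the helper name `most_frequent` is made up for illustration, and it assumes you are happy to get `nil` back for an empty array.

```ruby
# Hypothetical helper (not part of the original answer).
# Returns the most frequent element, or nil for an empty array.
# Requires Ruby 2.7+ for Enumerable#tally.
def most_frequent(arr)
  freq = arr.tally                                # e.g. {1=>3, 2=>1, 3=>1}
  pair = freq.max_by { |_value, count| count }    # [value, count] or nil
  pair && pair.first
end

p most_frequent([1, 1, 1, 2, 3])                  # => 1
p most_frequent(['cat', 'dog', 'snake', 'dog'])   # => "dog"
```

Iterating over the frequency hash rather than the original array means each distinct value is looked at once.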

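While I adore the elegance of the grep solution, and being reminded of (or taught about) an Enumerable method I had forgotten (or overlooked completely), it is slow, slow, slow. I agree 100% that creating an Array#mode method is a good idea. However, this is Ruby: we don't need a library of functions that act on arrays, we can create a mixin that adds the necessary methods to the Array class itself.

But the inject(Hash) alternative uses a sort, which we don't really need either: we just want the value with the highest count.

Neither solution addresses the possibility that more than one value may be the mode. Maybe that isn't an issue in the problem as stated (I can't tell). I think I'd want to know if there was a tie, though, and in any case I think we can improve a little on the performance.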

    require 'benchmark'

    class Array
      def mode1
        sort_by { |i| grep(i).length }.last
      end

      def mode2
        freq = inject(Hash.new(0)) { |h,v| h[v] += 1; h }
        sort_by { |v| freq[v] }.last
      end

      def mode3
        freq = inject(Hash.new(0)) { |h,v| h[v] += 1; h }
        max = freq.values.max             # we're only interested in the key(s) with the highest frequency
        freq.select { |k, f| f == max }   # extract the keys that have the max frequency
      end
    end

    arr = Array.new(1_000) { |i| rand(100) }   # something to test with

    Benchmark.bm(30) do |r|
      res = {}
      (1..3).each do |i|
        m = "mode#{i}"
        r.report(m) do
          100.times do
            res[m] = arr.send(m).inspect
          end
        end
      end
      res.each { |k, v| puts "%10s = %s" % [k, v] }
    end

Here is the output from a sample run.

                                        user     system      total        real
    mode1                          34.375000   0.000000  34.375000 ( 34.393000)
    mode2                           0.359000   0.000000   0.359000 (  0.359000)
    mode3                           0.219000   0.000000   0.219000 (  0.219000)
         mode1 = 41
         mode2 = 41
         mode3 = [[41, 17], [80, 17], [72, 17]]

The "optimised" mode3 took about 60% of the time of the previous record-holder. Note also the multiple highest-frequency entries.
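The mode3 result above carries the counts along with the values. If only the tied values themselves are wanted, something like the following may do. It is a sketch, not the answer's own code: the name `modes` is hypothetical, and it assumes Ruby 1.9 or later, where `Hash#select` returns a Hash.

```ruby
# Hypothetical helper: every value that shares the highest frequency.
def modes(arr)
  freq = arr.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }
  max  = freq.values.max                          # nil for an empty array
  freq.select { |_value, count| count == max }.keys
end

p modes([1, 1, 2, 2, 3])   # => [1, 2]
p modes([3, 3, 4, 5])      # => [3]
p modes([])                # => []
```

Returning an array in every case, even when there is a single winner, keeps the caller from having to check the result's type.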

EDIT

A few months down the line, I noticed Nilesh's answer, which offered this:

    def mode4
      group_by{|i| i}.max{|x,y| x[1].length <=> y[1].length}[0]
    end

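It doesn't work with 1.8.6 out of the box, because that version doesn't have Array#group_by. ActiveSupport has it, for the Rails developers, although it seems about 2-3% slower than mode3 above. Using the (excellent) backports gem, though, produces a 10-12% gain, as well as delivering a whole pile of 1.8.7 and 1.9 extras.

The above applies to 1.8.6 only, and mainly only if it is installed on Windows. Since I have it installed, here is what you get from IronRuby 1.0 (on .NET 4.0):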

    ========================== IronRuby =====================================
    (iterations bumped to **1000**)
                                        user     system      total        real
    mode1 (I didn't bother :-))
    mode2                           4.265625   0.046875   4.312500 (  4.203151)
    mode3                           0.828125   0.000000   0.828125 (  0.781255)
    mode4                           1.203125   0.000000   1.203125 (  1.062507)

So if performance is super-critical, benchmark the options on your own Ruby version and OS. YMMV.
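For anyone who wants to repeat the comparison without patching Array, here is one way it might be set up with the standard `Benchmark` module. Treat it as a sketch: the `CANDIDATES` constant and its labels are made up for illustration, and the three lambdas are adapted from answers in this thread rather than copied from the benchmark above.

```ruby
require 'benchmark'

# Candidate implementations adapted from the answers in this thread.
# CANDIDATES and its labels are illustrative names, not from the original.
CANDIDATES = {
  'inject + max_by' => lambda { |a|
    freq = a.inject(Hash.new(0)) { |h, v| h[v] += 1; h }
    a.max_by { |v| freq[v] }
  },
  'group_by'        => lambda { |a| a.group_by { |i| i }.values.max_by(&:size).first },
  'uniq + count'    => lambda { |a| a.uniq.max_by { |i| a.count(i) } }
}

data = Array.new(1_000) { rand(100) }   # same shape of test data as above

Benchmark.bm(20) do |r|
  CANDIDATES.each do |label, fn|
    r.report(label) { 100.times { fn.call(data) } }
  end
end
```

The numbers will differ from the ones quoted here, which is rather the point of running it yourself.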

Mike: I found a faster method. Try this:

    class Array
      def mode4
        group_by{|i| i}.max{|x,y| x[1].length <=> y[1].length}[0]
      end
    end

Benchmark output:

                                        user     system      total        real
    mode1                          24.340000   0.070000  24.410000 ( 24.526991)
    mode2                           0.200000   0.000000   0.200000 (  0.195348)
    mode3                           0.120000   0.000000   0.120000 (  0.118200)
    mode4                           0.050000   0.010000   0.060000 (  0.056315)
         mode1 = 76
         mode2 = 76
         mode3 = [[76, 18]]
         mode4 = 76

 array.max_by { |i| array.count(i) }

    arr = [ 1, 3, 44, 3 ]
    most_frequent_item = arr.uniq.max_by{ |i| arr.count( i ) }
    puts most_frequent_item
    #=> 3

No need to even think about a frequency map.

This is a duplicate of this question: Ruby - Unique elements in Array

Here is the solution from that question:

 group_by { |n| n }.values.max_by(&:size).first

This version appears to be even faster than Nilesh C's answer. Here is the code I used to test it (OS X 10.6, Core 2 2.4GHz MB).

Kudos to Mike Woodhouse for the (original) benchmarking code:

    require 'benchmark'

    class Array
      def mode1
        group_by { |n| n }.values.max_by(&:size).first
      end

      def mode2
        freq = inject(Hash.new(0)) { |h,v| h[v] += 1; h }
        max = freq.values.max             # we're only interested in the key(s) with the highest frequency
        freq.select { |k, f| f == max }   # extract the keys that have the max frequency
      end
    end

    arr = Array.new(10_000) { |i| rand(100_000) }   # something to test with

    Benchmark.bm(30) do |r|
      (1..2).each do |i|
        r.report("mode#{i}") { 100.times do arr.send("mode#{i}").inspect; end }
      end
    end

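Here are the results of the benchmark:

                                        user     system      total        real
    mode1                           1.830000   0.010000   1.840000 (  1.876642)
    mode2                           2.280000   0.010000   2.290000 (  2.382117)
         mode1 = 70099
         mode2 = [[70099, 3], [70102, 3], [51694, 3], [49685, 3], [38410, 3], [90815, 3], [30551, 3], [34720, 3], [58373, 3]]

As you can see, this version is about 20% faster, with the caveat that it ignores ties. I also like how succinct it is; personally I use it as-is, without monkey patching all over the place. 🙂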

If you're trying to avoid learning #inject (which you shouldn't do...)

    words = ['cat', 'dog', 'snake', 'dog']
    count = Hash.new(0)

    words.each { |word| count[word] += 1 }
    count.sort_by { |k,v| v }.last

But if I had read this answer before now, I would know nothing about #inject, and man, you need to know about #inject.

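    idx = {}
    [2,2,1,3,1].each { |i| idx.include?(i) ? idx[i] += 1 : idx[i] = 1 }

This is just a simple indexer. You could replace the [2,2,1..] array with any symbol- or string-based identifiers. It wouldn't work with objects; for that you'd need to introduce a bit more complexity, but this is simple enough.

Rereading your question, this solution is a bit over-engineered, since it returns an index of all the occurrences rather than just the item with the most.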
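Getting from that index to an answer takes one more step: pick out the highest count. A minimal sketch follows; it swaps the `include?` check for `Hash.new(0)` (my substitution, the resulting hash is the same), and the names `highest` and `winners` are illustrative.

```ruby
idx = Hash.new(0)                           # missing keys start at 0
[2, 2, 1, 3, 1].each { |i| idx[i] += 1 }    # idx is now {2=>2, 1=>2, 3=>1}

highest = idx.values.max                    # 2
winners = idx.select { |_item, n| n == highest }.keys

p winners   # => [2, 1]
```

Since 2 and 1 both occur twice in this input, both come back, in the order they were first seen.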

Here is another version that gives you the ties, as a mode should:

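    def mode
      group_by {|x| x}.group_by {|k,v| v.size}.sort.last.last.map(&:first)
    end

In other words: group the values, then group those k-v pairs by the number of values in each, then sort the resulting pairs, take the last (largest-size) group, and unwind its values. I love group_by.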
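To make the chain easier to follow, here it is unrolled one step at a time on a small input. The intermediate variable names are mine, added only for illustration; the calls are the same ones used in the method above.

```ruby
arr = [0, 0, 3, 3, 1]

by_value = arr.group_by { |x| x }
# {0=>[0, 0], 3=>[3, 3], 1=>[1]}

by_size = by_value.group_by { |_k, v| v.size }
# {2=>[[0, [0, 0]], [3, [3, 3]]], 1=>[[1, [1]]]}

largest = by_size.sort.last
# [2, [[0, [0, 0]], [3, [3, 3]]]]   (the pair with the biggest size key)

result = largest.last.map(&:first)
p result   # => [0, 3]
```

Sorting the hash turns it into an array of `[size, pairs]` entries ordered by size, which is why `.last` picks the most frequent group.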

    def mode(array)
      values = array.compact                          # ignore nils without mutating the caller's array
      unique = values.uniq
      count  = unique.map { |i| values.count(i) }     # number of times each unique element occurs
      max    = count.max
      # keep every element whose count ties for the highest; return an Array, not a String
      unique.zip(count).select { |_, c| c == max }.map(&:first)
    end

    p mode([3,3,4,5])                       #=> [3]
    p mode([1,2,3])                         #=> [1, 2, 3]
    p mode([0,0,0,0,0,1,2,3,3,3,3,3])       #=> [0, 3]
    p mode([-1,-1,nil,nil,nil,0])           #=> [-1]
    p mode([-2,-2,3,4,5,6,7,8,9,10,1000])   #=> [-2]
