How to count duplicate elements in a Ruby array

I have a sorted array:

[ 'FATAL <error title="Request timed out.">', 'FATAL <error title="Request timed out.">', 'FATAL <error title="There is insufficient system memory to run this query.">' ]

I would like to get something like this, though it does not have to be a hash:

 [ {:error => 'FATAL <error title="Request timed out.">', :count => 2}, {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1} ]

The following code prints what you asked for. I will let you decide how to use it to build the hash you are looking for:

 # sample array
 a = ["aa", "bb", "cc", "bb", "bb", "cc"]

 # make the hash default to 0 so that += will work correctly
 b = Hash.new(0)

 # iterate over the array, counting duplicate entries
 a.each do |v|
   b[v] += 1
 end

 b.each do |k, v|
   puts "#{k} appears #{v} times"
 end
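To get from the counting hash to the array-of-hashes shape shown in the question, one option is to map over the pairs. This is only a sketch; the variable names `counts` and `result` are mine, not part of the answer above.

```ruby
a = ["aa", "bb", "cc", "bb", "bb", "cc"]

# Count occurrences, defaulting missing keys to 0.
counts = Hash.new(0)
a.each { |v| counts[v] += 1 }

# Turn each key/value pair into a small hash shaped like the one in the question.
result = counts.map { |error, count| { :error => error, :count => count } }

p result
```

On Ruby 1.9 and later the entries come out in first-seen order, because hashes iterate in insertion order.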

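Note: I just noticed you said the array is already sorted. The code above does not require sorting. Taking advantage of that property might produce faster code.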
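As a rough sketch of what using the sorted property could look like: in a sorted array, equal elements sit next to each other, so it is enough to split the array into runs of equal neighbours and measure each run. This assumes Ruby 2.3 or later for `Enumerable#chunk_while`; whether it is actually faster than the hash approach is not measured here.

```ruby
sorted = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# chunk_while keeps consecutive elements together while the block is true,
# so each run contains one distinct value from the sorted array.
result = sorted.chunk_while { |x, y| x == y }.map do |run|
  { :error => run.first, :count => run.size }
end

p result
```

Note that this only gives correct totals when equal elements are adjacent; on an unsorted array the same value could appear in several runs.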

You can do this very concisely (in one line) with inject:

 a = ['FATAL <error title="Request timed out.">',
      'FATAL <error title="Request timed out.">',
      'FATAL <error title="There is insufficient ...">']
 b = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }
 b.to_a.each { |error, count| puts "#{count}: #{error}" }

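This produces (the order shown is from Ruby 1.8, where hash order is not guaranteed; Ruby 1.9 and later iterate in insertion order, so the "2:" line comes first):

 1: FATAL <error title="There is insufficient ...">
 2: FATAL <error title="Request timed out.">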

If you have an array like this:

 words = ["aa","bb","cc","bb","bb","cc"]

and you need to count the duplicate elements, a one-line solution is:

 result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }
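For comparison, on Ruby 2.7 or later the same counting hash can be obtained with the built-in `Enumerable#tally`. This is an alternative to the `each_with_object` line above, not something the original answer used.

```ruby
words = ["aa", "bb", "cc", "bb", "bb", "cc"]

# tally returns a hash mapping each distinct element to its number of occurrences.
result = words.tally

p result
```

Unlike the `Hash.new(0)` version, the hash returned by `tally` has no default value, so looking up a word that never appeared returns nil rather than 0.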

Personally, I would do it this way:

 # myprogram.rb
 a = ['FATAL <error title="Request timed out.">',
      'FATAL <error title="Request timed out.">',
      'FATAL <error title="There is insufficient system memory to run this query.">']
 puts a

Then run the program and pipe its output to uniq -c:

 ruby myprogram.rb | uniq -c

Output:

   2 FATAL <error title="Request timed out.">
   1 FATAL <error title="There is insufficient system memory to run this query.">

Use Enumerable#group_by to solve this problem.

 [1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k,v| [k, v.count] }.to_h # {1=>1, 2=>2, 3=>3, 4=>1}

Broken down into separate method calls:

 a = [1, 2, 2, 3, 3, 3, 4]
 a = a.group_by(&:itself)          # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
 a = a.map { |k,v| [k, v.count] }  # [[1, 1], [2, 2], [3, 3], [4, 1]]
 a = a.to_h                        # {1=>1, 2=>2, 3=>3, 4=>1}

Enumerable#group_by was added in Ruby 1.8.7.

 a = [1,1,1,2,2,3]
 a.uniq.inject([]) { |r, i| r << { :error => i, :count => a.select { |b| b == i }.size } }
 # => [{:count=>3, :error=>1}, {:count=>2, :error=>2}, {:count=>1, :error=>3}]

How about the following:

 things = [1, 2, 2, 3, 3, 3, 4]
 things.uniq.map{|t| [t,things.count(t)]}.to_h

This feels cleaner and more descriptive of what we are actually trying to do.

I suspect it will also perform better on large collections than the approaches that iterate over every value.

Benchmark results:

 a = (1...1000000).map { rand(100)}

                        user     system      total        real
 inject             7.670000   0.010000   7.680000 (  7.985289)
 array count        0.040000   0.000000   0.040000 (  0.036650)
 each_with_object   0.210000   0.000000   0.210000 (  0.214731)
 group_by           0.220000   0.000000   0.220000 (  0.218581)

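So the uniq/count approach (the "array count" row) is the fastest in this benchmark, where the array holds only 100 distinct values.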
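The benchmark script itself is not shown, so the following is only a guess at how such a comparison might be set up with the standard `benchmark` library. The mapping of each label to a snippet is my assumption, based on the code that appears elsewhere on this page, and the array is smaller than the one above so that the sketch runs quickly.

```ruby
require 'benchmark'

# Smaller than the array in the results above; the size is arbitrary.
a = Array.new(100_000) { rand(100) }

results = {}

Benchmark.bm(17) do |x|
  x.report('inject') do
    results[:inject] = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }
  end
  x.report('array count') do
    results[:array_count] = a.uniq.map { |t| [t, a.count(t)] }.to_h
  end
  x.report('each_with_object') do
    results[:each_with_object] = a.each_with_object(Hash.new(0)) { |i, h| h[i] += 1 }
  end
  x.report('group_by') do
    results[:group_by] = a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h
  end
end

# Sanity check: every approach should produce the same counts.
reference = results[:inject]
puts results.values.all? { |h| h == reference }
```

Timings will differ by machine and Ruby version. The uniq/count approach scans the whole array once per distinct value, so its advantage may shrink or reverse as the number of distinct values grows.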

A simple implementation:

 (errors_hash = {}).default = 0
 array_of_errors.each { |error| errors_hash[error] += 1 }

Here is a sample array:

 a=["aa","bb","cc","bb","bb","cc"]

Select all the unique keys. For each key, accumulate the matching elements into a hash to get something like this: {'bb' => ['bb', 'bb']}

 res = a.uniq.inject({}) { |accu, uni| accu.merge({ uni => a.select { |i| i == uni } }) }
 # => {"aa"=>["aa"], "bb"=>["bb", "bb", "bb"], "cc"=>["cc", "cc"]}

Now you can do things like this:

 res['aa'].size
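If the counts for all keys are needed at once, the grouped hash can be reduced in a single step. A small sketch, assuming Ruby 2.4 or later for `Hash#transform_values`; the name `counts` is mine.

```ruby
a = ["aa", "bb", "cc", "bb", "bb", "cc"]

# Group equal elements under their unique key, as in the answer above.
res = a.uniq.inject({}) { |accu, uni| accu.merge({ uni => a.select { |i| i == uni } }) }

# Replace each group of elements with its size.
counts = res.transform_values(&:size)

p counts
```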
