How to count duplicate elements in a Ruby array
I have a sorted array:
[ 'FATAL <error title="Request timed out.">', 'FATAL <error title="Request timed out.">', 'FATAL <error title="There is insufficient system memory to run this query.">' ]
I would like to get something like this, though it does not have to be a hash:
[ {:error => 'FATAL <error title="Request timed out.">', :count => 2}, {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1} ]
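The following code prints what you asked for. I'll let you decide how to actually use it to generate the hash you are looking for:
# sample array
a = ["aa", "bb", "cc", "bb", "bb", "cc"]

# make the hash default to 0 so that += will work correctly
b = Hash.new(0)

# iterate over the array, counting duplicate entries
a.each do |v|
  b[v] += 1
end

b.each do |k, v|
  puts "#{k} appears #{v} times"
end
Note: I just noticed you said the array is already sorted. The code above does not require sorted input. Taking advantage of that property might yield faster code.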
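Since the array is sorted, equal entries sit next to each other, so one option is to group consecutive runs rather than hash every element. A minimal sketch, assuming Ruby 2.2 or later (for `itself`); the names `errors` and `counts` are illustrative only:

```ruby
# Sorted input: equal entries are adjacent.
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# chunk groups consecutive elements for which the block returns the same
# value and yields [value, run_of_elements] pairs.
counts = errors.chunk(&:itself).map do |error, run|
  { :error => error, :count => run.size }
end

p counts
```

This only gives correct totals when the input really is sorted (or at least grouped); on unsorted input the same value would show up in several runs.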
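You can do this very concisely (one line) using inject:
a = ['FATAL <error title="Request timed out.">', 'FATAL <error title="Request timed out.">', 'FATAL <error title="There is insufficient ...">']
b = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }
b.to_a.each { |error, count| puts "#{count}: #{error}" }
This will produce:
1: FATAL <error title="There is insufficient ...">
2: FATAL <error title="Request timed out.">
(This order comes from Ruby 1.8, where hash ordering was not guaranteed. On Ruby 1.9 and later, hashes keep insertion order, so the "2:" line prints first.)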
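To get from a counts hash to the array-of-hashes shape shown in the question, one option is to map over the key/value pairs. A minimal sketch; `result` is a name chosen for illustration, and the sample strings are the ones from the question:

```ruby
a = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# Count occurrences with a hash whose missing keys default to 0.
b = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }

# Turn each [error, count] pair into the requested hash shape.
result = b.map { |error, count| { :error => error, :count => count } }

p result
```

On Ruby 1.9 and later the entries come out in order of first appearance in `a`.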
If you have an array like this:
words = ["aa","bb","cc","bb","bb","cc"]
and you need to count the duplicate elements, a one-line solution is:
result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }
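On newer Rubies there is also a built-in for this. A minimal sketch, assuming Ruby 2.7 or later, where `Enumerable#tally` is available; `counts` and `as_list` are illustrative names:

```ruby
words = ["aa", "bb", "cc", "bb", "bb", "cc"]

# Enumerable#tally (Ruby 2.7+) returns a hash of element => occurrence count.
counts = words.tally

# Optionally reshape into the array-of-hashes form from the question.
as_list = counts.map { |word, count| { :error => word, :count => count } }

p counts
p as_list
```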
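Personally, I would do it this way:
# myprogram.rb
a = ['FATAL <error title="Request timed out.">', 'FATAL <error title="Request timed out.">', 'FATAL <error title="There is insufficient system memory to run this query.">']
puts a
Then run the program and pipe its output to uniq -c (uniq only merges adjacent identical lines, which works here because the array is sorted):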
ruby myprogram.rb | uniq -c
Output:
2 FATAL <error title="Request timed out.">
1 FATAL <error title="There is insufficient system memory to run this query.">
Use Enumerable#group_by to solve this.
[1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k, v| [k, v.count] }.to_h
# {1=>1, 2=>2, 3=>3, 4=>1}
Broken down into separate method calls:
a = [1, 2, 2, 3, 3, 3, 4]
a = a.group_by(&:itself) # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
a = a.map { |k, v| [k, v.count] } # [[1, 1], [2, 2], [3, 3], [4, 1]]
a = a.to_h # {1=>1, 2=>2, 3=>3, 4=>1}
Enumerable#group_by was added in Ruby 1.8.7.
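The map/to_h step can be shortened on newer Rubies. A minimal sketch, assuming Ruby 2.4 or later, where `Hash#transform_values` exists (and `itself` needs Ruby 2.2 or later); `numbers` and `counts` are illustrative names:

```ruby
numbers = [1, 2, 2, 3, 3, 3, 4]

# group_by builds value => [occurrences]; transform_values replaces each
# list of occurrences with its length.
counts = numbers.group_by(&:itself).transform_values(&:size)

p counts
```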
a = [1,1,1,2,2,3]
a.uniq.inject([]) { |r, i| r << { :error => i, :count => a.select { |b| b == i }.size } }
# => [{:count=>3, :error=>1}, {:count=>2, :error=>2}, {:count=>1, :error=>3}]
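How about the following:
things = [1, 2, 2, 3, 3, 3, 4]
things.uniq.map { |t| [t, things.count(t)] }.to_h
This feels cleaner and more descriptive of what we are actually trying to do.
I suspect it will also perform better on large collections than the approaches that iterate over each value.
Benchmark performance test: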
a = (1...1000000).map { rand(100) }

                       user     system      total        real
inject             7.670000   0.010000   7.680000 (  7.985289)
array count        0.040000   0.000000   0.040000 (  0.036650)
each_with_object   0.210000   0.000000   0.210000 (  0.214731)
group_by           0.220000   0.000000   0.220000 (  0.218581)
So it is quite a bit faster.
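The post does not show the benchmark script, so the following is only a sketch of how such a comparison might be set up with the standard `benchmark` library. The code under each label is an assumption matched by name to the snippets in this thread (in particular, which `inject` variant was measured is not stated), and the array is smaller than the one above so it finishes quickly. Timings will differ from the posted ones.

```ruby
require 'benchmark'

# Assumption: 100_000 random values in 0...100, smaller than the post's array.
a = Array.new(100_000) { rand(100) }

results = {}

Benchmark.bm(16) do |x|
  x.report('inject') do
    results[:inject] = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }
  end
  x.report('array count') do
    results[:count] = a.uniq.map { |t| [t, a.count(t)] }.to_h
  end
  x.report('each_with_object') do
    results[:each_with_object] = a.each_with_object(Hash.new(0)) { |i, h| h[i] += 1 }
  end
  x.report('group_by') do
    results[:group_by] = a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h
  end
end

# Sanity check: every approach should produce the same counts.
same = results.values.all? { |h| h == results[:inject] }
puts "all approaches agree: #{same}"
```

Note that the `uniq` plus `count` approach rescans the whole array once per distinct value, so its cost depends on how many distinct values there are (at most 100 here). With many distinct values the ranking may change.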
A simple implementation:
(errors_hash = {}).default = 0
array_of_errors.each { |error| errors_hash[error] += 1 }
Here is a sample array:
a=["aa","bb","cc","bb","bb","cc"]
- Select all the unique keys.
- For each key, accumulate its occurrences into a hash to get something like this:
{'bb' => ['bb', 'bb']}
res = a.uniq.inject({}) { |accu, uni| accu.merge({ uni => a.select { |i| i == uni } }) }
# => {"aa"=>["aa"], "bb"=>["bb", "bb", "bb"], "cc"=>["cc", "cc"]}
Now you can do things like:
res['aa'].size