Java数组，查找重复

我有一个数组，并正在寻找重复。

duplicates = false; for(j = 0; j < zipcodeList.length; j++){ for(k = 0; k < zipcodeList.length; k++){ if (zipcodeList[k] == zipcodeList[j]){ duplicates = true; } } }

但是，这个代码在没有重复时不起作用。为什么？

在鼻子上的答案..

 duplicates=false; for (j=0;j<zipcodeList.length;j++) for (k=j+1;k<zipcodeList.length;k++) if (k!=j && zipcodeList[k] == zipcodeList[j]) duplicates=true;

编辑将.equals()切换回==因为我读了你正在使用int地方，这在最初的问题中是不清楚的。还要设置k=j+1 ，将执行时间减半，但仍然是O（n ² ）。

更快（以极限）的方式

这是一个基于散列的方法。你必须支付自动装箱，但它是O（n）而不是O（n ² ）。一个有进取心的人会去找一个原始的基于int的哈希集合（Apache或Google Collections有这样的东西，methinks）。

 boolean duplicates(final int[] zipcodelist) { Set<Integer> lump = new HashSet<Integer>(); for (int i : zipcodelist) { if (lump.contains(i)) return true; lump.add(i); } return false; }

向HuyLe鞠躬

看到HuyLe的答案是或多或少的O（n）解决方案，我认为这需要一些添加步骤：

 static boolean duplicates(final int[] zipcodelist) { final int MAXZIP = 99999; boolean[] bitmap = new boolean[MAXZIP+1]; java.util.Arrays.fill(bitmap, false); for (int item : zipcodeList) if (!bitmap[item]) bitmap[item] = true; else return true; } return false; }

或者只是为了紧凑

 static boolean duplicates(final int[] zipcodelist) { final int MAXZIP = 99999; boolean[] bitmap = new boolean[MAXZIP+1]; // Java guarantees init to false for (int item : zipcodeList) if (!(bitmap[item] ^= true)) return true; return false; }

有关系吗？

那么，所以我跑了一个基准，这是可以在各地，但这里的代码：

 import java.util.BitSet; class Yuk { static boolean duplicatesZero(final int[] zipcodelist) { boolean duplicates=false; for (int j=0;j<zipcodelist.length;j++) for (int k=j+1;k<zipcodelist.length;k++) if (k!=j && zipcodelist[k] == zipcodelist[j]) duplicates=true; return duplicates; } static boolean duplicatesOne(final int[] zipcodelist) { final int MAXZIP = 99999; boolean[] bitmap = new boolean[MAXZIP + 1]; java.util.Arrays.fill(bitmap, false); for (int item : zipcodelist) { if (!(bitmap[item] ^= true)) return true; } return false; } static boolean duplicatesTwo(final int[] zipcodelist) { final int MAXZIP = 99999; BitSet b = new BitSet(MAXZIP + 1); b.set(0, MAXZIP, false); for (int item : zipcodelist) { if (!b.get(item)) { b.set(item, true); } else return true; } return false; } enum ApproachT { NSQUARED, HASHSET, BITSET}; /** * @param args */ public static void main(String[] args) { ApproachT approach = ApproachT.BITSET; final int REPS = 100; final int MAXZIP = 99999; int[] sizes = new int[] { 10, 1000, 10000, 100000, 1000000 }; long[][] times = new long[sizes.length][REPS]; boolean tossme = false; for (int sizei = 0; sizei < sizes.length; sizei++) { System.err.println("Trial for zipcodelist size= "+sizes[sizei]); for (int rep = 0; rep < REPS; rep++) { int[] zipcodelist = new int[sizes[sizei]]; for (int i = 0; i < zipcodelist.length; i++) { zipcodelist[i] = (int) (Math.random() * (MAXZIP + 1)); } long begin = System.currentTimeMillis(); switch (approach) { case NSQUARED : tossme ^= (duplicatesZero(zipcodelist)); break; case HASHSET : tossme ^= (duplicatesOne(zipcodelist)); break; case BITSET : tossme ^= (duplicatesTwo(zipcodelist)); break; } long end = System.currentTimeMillis(); times[sizei][rep] = end - begin; } long avg = 0; for (int rep = 0; rep < REPS; rep++) { avg += times[sizei][rep]; } System.err.println("Size=" + sizes[sizei] + ", avg time = " + avg / (double)REPS + "ms"); } } }

用NSQUARED：

 Trial for size= 10 Size=10, avg time = 0.0ms Trial for size= 1000 Size=1000, avg time = 0.0ms Trial for size= 10000 Size=10000, avg time = 100.0ms Trial for size= 100000 Size=100000, avg time = 9923.3ms

用HashSet

 Trial for zipcodelist size= 10 Size=10, avg time = 0.16ms Trial for zipcodelist size= 1000 Size=1000, avg time = 0.15ms Trial for zipcodelist size= 10000 Size=10000, avg time = 0.0ms Trial for zipcodelist size= 100000 Size=100000, avg time = 0.16ms Trial for zipcodelist size= 1000000 Size=1000000, avg time = 0.0ms

与BitSet

 Trial for zipcodelist size= 10 Size=10, avg time = 0.0ms Trial for zipcodelist size= 1000 Size=1000, avg time = 0.0ms Trial for zipcodelist size= 10000 Size=10000, avg time = 0.0ms Trial for zipcodelist size= 100000 Size=100000, avg time = 0.0ms Trial for zipcodelist size= 1000000 Size=1000000, avg time = 0.0ms

BITSET赢得！

但是只有头发… .15ms是在currentTimeMillis()的错误之内，我的基准测试中有一些漏洞。请注意，对于超过100000的任何列表，只需返回true因为会有重复。事实上，如果这个列表是随机的，你可以返回真正的WHP列表。什么是道德？在极限中，最有效的实现是：

  return true;

而且你经常不会错的

让我们看看你的算法是如何工作的：

 an array of unique values: [1, 2, 3] check 1 == 1. yes, there is duplicate, assigning duplicate to true. check 1 == 2. no, doing nothing. check 1 == 3. no, doing nothing. check 2 == 1. no, doing nothing. check 2 == 2. yes, there is duplicate, assigning duplicate to true. check 2 == 3. no, doing nothing. check 3 == 1. no, doing nothing. check 3 == 2. no, doing nothing. check 3 == 3. yes, there is duplicate, assigning duplicate to true.

一个更好的算法：

 for (j=0;j<zipcodeList.length;j++) { for (k=j+1;k<zipcodeList.length;k++) { if (zipcodeList[k]==zipcodeList[j]){ // or use .equals() return true; } } } return false;

您可以使用位图来获得更大的阵列性能。

 java.util.Arrays.fill(bitmap, false); for (int item : zipcodeList) if (!bitmap[item]) bitmap[item] = true; else { duplicate = true; break; }

要检查重复项，您需要比较不同的对。

因为你正在比较数组的第一个元素，所以它发现有重复，即使没有。

初始化k = j + 1。你不会比较自己的元素，你也不会重复比较。例如，j = 0，k = 1和k = 0，j = 1比较同一组元素。这将删除k = 0，j = 1的比较。

不要使用==使用.equals 。

试试这个（IIRC，ZipCode需要为此工作实现Comparable 。

 boolean unique; Set<ZipCode> s = new TreeSet<ZipCode>(); for( ZipCode zc : zipcodelist ) unique||=s.add(zc); duplicates = !unique;

如何使用这种方法？

 HashSet<Integer> zipcodeSet = new HashSet<Integer>(Arrays.asList(zipcodeList)); duplicates = zipcodeSet.size()!=zipcodeList.length;

您也可以使用Set，它不允许在Java中使用重复项。

  for (String name : names) { if (set.add(name) == false) { // your duplicate element } }

使用add（）方法并检查返回值。如果add（）返回false，则意味着Set中不允许使用该元素，并且这是重复的。

Java数组，查找重复

在鼻子上的答案..

更快（以极限）的方式

向HuyLe鞠躬

或者只是为了紧凑

有关系吗？

BITSET赢得！

为什么numpy.any在大型数组上如此缓慢？

如何在两个sorting数组的并集中find第k个最小的元素？

我怎样才能在Python中创build一个数组/列表的字典？

PHP的：按数值的长度sorting数组？

帮助从arrays.xml文件获取string数组

在JavaScript中连接N个数组的最有效方法是什么？

使用jQuery来比较两个Javascript对象数组

按照PHP中的子数组值sorting数组

如何获得我想要在Python中的任何大小的空数组？

C＃中的数组如何部分实现IList <T>？