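Where can I find a standard trie-based map implementation in Java?
I have a Java program that stores a lot of mappings from Strings to various objects.
Right now, my options are either to rely on hashing (via HashMap) or on binary search (via TreeMap). I am wondering if there is an efficient and standard trie-based map implementation in a popular, high-quality collections library.
I've written my own in the past, but I'd rather go with something standard, if available.
Quick clarification: while my question is general, in the current project I am dealing with a lot of data that is indexed by fully qualified class names or method signatures. Thus, there are many shared prefixes.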
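For the shared-prefix lookups described in the question, it may help to note that a plain TreeMap can already enumerate all keys with a given prefix through a range view. This is a minimal sketch, not a trie; the class name PrefixScan is hypothetical, and it assumes no key has the character Character.MAX_VALUE immediately after the prefix.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper: prefix lookup on a plain TreeMap (no trie involved).
final class PrefixScan {
    // Live view of all entries whose key starts with `prefix`. Such keys sort
    // between `prefix` (inclusive) and `prefix + Character.MAX_VALUE`
    // (exclusive); assumption: no key has Character.MAX_VALUE right after the prefix.
    static <V> NavigableMap<String, V> withPrefix(NavigableMap<String, V> map, String prefix) {
        return map.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> index = new TreeMap<>();
        index.put("java.util.List", 1);
        index.put("java.util.Map", 2);
        index.put("java.io.File", 3);
        System.out.println(withPrefix(index, "java.util.").keySet());
    }
}
```

Each prefix query costs O(log n) to locate the range plus the size of the result, and the view stays in sync with the underlying map; what a trie adds is that shared prefixes are stored only once.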
You might want to look at the Trie implementation that Limewire is contributing to Google Guava.
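There is no trie data structure in the core Java libraries.
This may be because tries are usually designed to store character strings, while Java data structures are more general, usually holding any Object (defining equality and a hash operation), though they are sometimes limited to Comparable objects (defining an order). There's no common abstraction for "a sequence of symbols", although CharSequence is suitable for character strings, and I suppose you could do something with Iterable for other types of symbols.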
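To illustrate the Iterable suggestion, here is a minimal sketch of a trie set whose keys are sequences of arbitrary symbols. It assumes symbols only need equals/hashCode (they are used as HashMap keys), in line with the general-purpose contract of the Java collections; the class name SymbolTrie and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie set over sequences of arbitrary symbols S.
// Symbols only need equals/hashCode, because each node stores its
// children in a HashMap keyed by symbol.
final class SymbolTrie<S> {
    private static final class Node<T> {
        final Map<T, Node<T>> children = new HashMap<>();
        boolean terminal; // true if some inserted sequence ends at this node
    }

    private final Node<S> root = new Node<>();

    void add(Iterable<? extends S> sequence) {
        Node<S> node = root;
        for (S symbol : sequence) {
            node = node.children.computeIfAbsent(symbol, k -> new Node<>());
        }
        node.terminal = true;
    }

    // True only if this exact sequence was added.
    boolean contains(Iterable<? extends S> sequence) {
        Node<S> node = find(sequence);
        return node != null && node.terminal;
    }

    // True if some added sequence starts with `prefix`.
    boolean hasPrefix(Iterable<? extends S> prefix) {
        return find(prefix) != null;
    }

    private Node<S> find(Iterable<? extends S> sequence) {
        Node<S> node = root;
        for (S symbol : sequence) {
            node = node.children.get(symbol);
            if (node == null) {
                return null;
            }
        }
        return node;
    }
}
```

With S = String, a fully qualified class name can be split on "." so that each package segment becomes one symbol, e.g. List.of("java", "util", "Map").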
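Here's another point to consider: when trying to implement a conventional trie in Java, you are quickly confronted with the fact that Java supports Unicode. To have any sort of space efficiency, you have to restrict the strings in your trie to some subset of symbols, or abandon the conventional approach of storing child nodes in an array indexed by symbol. This might be another reason why tries are not considered general-purpose enough for inclusion in the core library, and something to watch out for if you implement your own or use a third-party library.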
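One way around the Unicode problem, without restricting the alphabet, is to keep each node's children in a small sorted char array and find them by binary search, so memory grows with the children actually present rather than with the 65,536 possible char values. This is a minimal sketch under that assumption; CompactCharTrie is a hypothetical name, and it works on UTF-16 code units (a supplementary character simply occupies two levels).

```java
import java.util.Arrays;

// Hypothetical trie over arbitrary Java chars. Instead of a 65,536-slot
// child array, each node keeps two parallel arrays sorted by character.
final class CompactCharTrie {
    private static final class Node {
        char[] labels = new char[0]; // sorted child characters
        Node[] kids = new Node[0];   // kids[i] is the child reached via labels[i]
        boolean terminal;            // true if a word ends at this node

        Node child(char c) {
            int i = Arrays.binarySearch(labels, c);
            return i >= 0 ? kids[i] : null;
        }

        Node childOrCreate(char c) {
            int i = Arrays.binarySearch(labels, c);
            if (i >= 0) {
                return kids[i];
            }
            int at = -i - 1; // insertion point that keeps labels sorted
            char[] newLabels = new char[labels.length + 1];
            Node[] newKids = new Node[kids.length + 1];
            System.arraycopy(labels, 0, newLabels, 0, at);
            System.arraycopy(kids, 0, newKids, 0, at);
            newLabels[at] = c;
            newKids[at] = new Node();
            System.arraycopy(labels, at, newLabels, at + 1, labels.length - at);
            System.arraycopy(kids, at, newKids, at + 1, kids.length - at);
            labels = newLabels;
            kids = newKids;
            return newKids[at];
        }
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.childOrCreate(word.charAt(i));
        }
        node.terminal = true;
    }

    boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length() && node != null; i++) {
            node = node.child(word.charAt(i));
        }
        return node != null && node.terminal;
    }
}
```

Lookups cost O(log c) per character, where c is the number of children of a node; inserts copy the child arrays, which is cheap when fan-out is small, as it typically is deep inside class names and method signatures.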
Also check out concurrent-trees. They support both radix trees and suffix trees, and are designed for high-concurrency environments.
I wrote and published a simple and fast implementation here.
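Apache Commons Collections v4.0 now supports trie structures.
See the org.apache.commons.collections4.trie package info for more information. In particular, check the PatriciaTrie class:
Implementation of a PATRICIA Trie (Practical Algorithm to Retrieve Information Coded in Alphanumeric).
A PATRICIA Trie is a compressed Trie. Instead of storing all data at the edges of the Trie (and having empty internal nodes), PATRICIA stores data in every node. This allows for very efficient traversal, insert, delete, predecessor, successor, prefix, range, and select(Object) operations. All operations are performed at worst in O(K) time, where K is the number of bits in the largest item in the tree. In practice, operations actually take O(A(K)) time, where A(K) is the average number of bits of all items in the tree.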
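To make the "compressed trie" idea concrete, here is a minimal sketch of a character-level radix tree, in which chains of single-child nodes collapse into one edge labelled with a whole substring. Note that this is a swapped-in illustration: Commons' PatriciaTrie compresses at the bit level, while this sketch compresses runs of characters, and RadixMap is a hypothetical name. It assumes null values are not stored (null means "no key ends here").

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical character-level radix tree (compressed trie) mapping String -> V.
final class RadixMap<V> {
    private static final class Node<T> {
        final Map<Character, Edge<T>> edges = new HashMap<>(); // keyed by first char of the label
        T value; // null means no key ends at this node
    }

    private static final class Edge<T> {
        String label;
        Node<T> target;

        Edge(String label, Node<T> target) {
            this.label = label;
            this.target = target;
        }
    }

    private final Node<V> root = new Node<>();

    void put(String key, V value) {
        Node<V> node = root;
        int i = 0;
        while (i < key.length()) {
            Edge<V> edge = node.edges.get(key.charAt(i));
            if (edge == null) { // nothing shares this char: store the rest of the key on one edge
                Node<V> leaf = new Node<>();
                leaf.value = value;
                node.edges.put(key.charAt(i), new Edge<>(key.substring(i), leaf));
                return;
            }
            int common = commonPrefix(edge.label, key, i);
            if (common < edge.label.length()) { // key diverges mid-edge: split the edge
                Node<V> middle = new Node<>();
                String rest = edge.label.substring(common);
                middle.edges.put(rest.charAt(0), new Edge<>(rest, edge.target));
                edge.label = edge.label.substring(0, common);
                edge.target = middle;
            }
            node = edge.target;
            i += common;
        }
        node.value = value;
    }

    V get(String key) {
        Node<V> node = root;
        int i = 0;
        while (i < key.length()) {
            Edge<V> edge = node.edges.get(key.charAt(i));
            if (edge == null || !key.startsWith(edge.label, i)) {
                return null;
            }
            node = edge.target;
            i += edge.label.length();
        }
        return node.value;
    }

    // Length of the common prefix of `label` and key.substring(from).
    private static int commonPrefix(String label, String key, int from) {
        int n = 0;
        while (n < label.length() && from + n < key.length()
                && label.charAt(n) == key.charAt(from + n)) {
            n++;
        }
        return n;
    }
}
```

After inserting "romane", "romanus", "romulus" and "rom", the root holds a single edge "rom", so the shared prefix is stored once instead of once per key.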
What you want is org.apache.commons.collections.FastTreeMap, I think.
Apache Commons Collections: org.apache.commons.collections4.trie.PatriciaTrie
You can also look at this TopCoder one (registration required...).
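If you need a sorted map, then tries are worthwhile. If you don't, a HashMap is better. A HashMap with String keys can be improved upon compared with the standard Java implementation: Array hash map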
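The "sorted map" argument can be sketched as follows: if each node keeps its children in a TreeMap, a depth-first walk visits keys in the same order as String.compareTo, and all keys under a prefix can be listed without scanning the rest of the map. This is a minimal illustration; SortedTrieMap and keysWithPrefix are hypothetical names, and null values are assumed not to be stored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical String-keyed trie map with sorted children.
final class SortedTrieMap<V> {
    private static final class Node<T> {
        final TreeMap<Character, Node<T>> children = new TreeMap<>();
        T value; // null means no key ends at this node
    }

    private final Node<V> root = new Node<>();

    void put(String key, V value) {
        Node<V> node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.computeIfAbsent(key.charAt(i), c -> new Node<>());
        }
        node.value = value;
    }

    V get(String key) {
        Node<V> node = find(key);
        return node == null ? null : node.value;
    }

    // All keys starting with `prefix`, in String.compareTo order.
    List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        Node<V> node = find(prefix);
        if (node != null) {
            collect(node, new StringBuilder(prefix), out);
        }
        return out;
    }

    private Node<V> find(String key) {
        Node<V> node = root;
        for (int i = 0; i < key.length() && node != null; i++) {
            node = node.children.get(key.charAt(i));
        }
        return node;
    }

    // Pre-order walk: a key sorts before its extensions, and siblings are
    // visited in ascending char order, which matches String.compareTo.
    private static <T> void collect(Node<T> node, StringBuilder path, List<String> out) {
        if (node.value != null) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node<T>> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }
}
```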
If you're not worried about pulling in the Scala library, you can use this space-efficient implementation of a burst trie that I wrote.
Here's a basic HashMap-based implementation of a Trie. Some people might find this useful...
import java.util.HashMap;
import java.util.Map;

// A basic trie built from nested hash maps: each node maps a character to its child node.
// There is no end-of-word marker, so this version can answer prefix queries only.
class Trie {
    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    public void addWord(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            // Create the child for this letter if it does not exist yet, then descend into it.
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
    }

    public boolean containsPrefix(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return true;
    }
}
Here is my implementation; enjoy it via GitHub: MyTrie.java
/* usage:
MyTrie trie = new MyTrie();
trie.insert("abcde");
trie.insert("abc");
trie.insert("sadas");
trie.insert("abc");
trie.insert("wqwqd");
System.out.println(trie.contains("abc"));
System.out.println(trie.contains("abcd"));
System.out.println(trie.contains("abcdefg"));
System.out.println(trie.contains("ab"));
System.out.println(trie.getWordCount("abc"));
System.out.println(trie.getAllDistinctWords());
*/
import java.util.*;

// Array-backed trie for lowercase ASCII words ('a'..'z'). All nodes live in one array and
// refer to their children by index; wordCount records how many times a word was inserted.
public class MyTrie {
    private class Node {
        public int[] next = new int[26];
        public int wordCount;

        public Node() {
            Arrays.fill(next, NULL);
            wordCount = 0;
        }
    }

    private int curr;
    private Node[] nodes;
    private List<String> allDistinctWords;
    public final static int NULL = -1;

    public MyTrie() {
        nodes = new Node[1024];
        nodes[0] = new Node();
        curr = 1;
    }

    private int getIndex(char c) {
        if (c < 'a' || c > 'z') {
            throw new IllegalArgumentException("only 'a'..'z' is supported, got: " + c);
        }
        return c - 'a';
    }

    // Appends a new node, doubling the backing array when it is full, and returns its index.
    private int newNode() {
        if (curr == nodes.length) {
            nodes = Arrays.copyOf(nodes, nodes.length * 2);
        }
        nodes[curr] = new Node();
        return curr++;
    }

    private void depthSearchWord(int x, String currWord) {
        for (int i = 0; i < 26; i++) {
            int p = nodes[x].next[i];
            if (p != NULL) {
                String word = currWord + (char) (i + 'a');
                if (nodes[p].wordCount > 0) {
                    allDistinctWords.add(word);
                }
                depthSearchWord(p, word);
            }
        }
    }

    public List<String> getAllDistinctWords() {
        allDistinctWords = new ArrayList<String>();
        depthSearchWord(0, "");
        return allDistinctWords;
    }

    // Index of the node reached by following str from the root, or NULL if there is no such path.
    private int find(String str) {
        int p = 0;
        for (int i = 0; i < str.length() && p != NULL; i++) {
            p = nodes[p].next[getIndex(str.charAt(i))];
        }
        return p;
    }

    public int getWordCount(String str) {
        int p = find(str);
        return p == NULL ? 0 : nodes[p].wordCount;
    }

    public boolean contains(String str) {
        return getWordCount(str) > 0;
    }

    public void insert(String str) {
        int p = 0;
        for (int i = 0; i < str.length(); i++) {
            int j = getIndex(str.charAt(i));
            if (nodes[p].next[j] == NULL) {
                int child = newNode();
                nodes[p].next[j] = child;
            }
            p = nodes[p].next[j];
        }
        nodes[p].wordCount++;
    }
}
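I have just tried my own concurrent TRIE implementation, but it is not based on characters; it is based on the hash code. Still, we can use it as a map of maps, keyed by the hash code of each CHAR.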
You can test it using the code at https://github.com/skanagavelu/TrieMapPerformanceTest.java https://github.com/skanagavelu/TrieHashMap/blob/master/src/TrieMapValidationTest.java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class TrieMap {
    public static int SIZEOFEDGE = 4;
    public static int OSIZE = 5000;
}

abstract class Node {
    public Node getLink(String key, int hash, int level) {
        throw new UnsupportedOperationException();
    }

    public Node createLink(int hash, int level, String key, String val) {
        throw new UnsupportedOperationException();
    }

    public Node removeLink(String key, int hash, int level) {
        throw new UnsupportedOperationException();
    }
}

// Leaf holding one key/value pair; keys whose hashes collide are chained through `next`.
class Vertex extends Node {
    String key;
    volatile String val;
    volatile Vertex next;

    public Vertex(String key, String val) {
        this.key = key;
        this.val = val;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Vertex)) {
            return false;
        }
        return this.key.equals(((Vertex) obj).key);
    }

    @Override
    public int hashCode() {
        return key.hashCode();
    }

    @Override
    public String toString() {
        return key + "@" + key.hashCode();
    }
}

// Inner node: 8 slots, one per 3-bit group of the hash at this level.
class Edge extends Node {
    final AtomicReferenceArray<Node> array; // AtomicReferenceArray gives the elements volatile semantics

    public Edge(int size) {
        array = new AtomicReferenceArray<Node>(size);
    }

    @Override
    public Node getLink(String key, int hash, int level) {
        int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
        Node returnVal = array.get(index);
        for (;;) {
            if (returnVal == null) {
                return null;
            } else if (returnVal instanceof Vertex) {
                Vertex node = (Vertex) returnVal;
                for (; node != null; node = node.next) {
                    if (node.key.equals(key)) {
                        return node;
                    }
                }
                return null;
            } else { // instanceof Edge
                level = level + 1;
                index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
                Edge e = (Edge) returnVal;
                returnVal = e.array.get(index);
            }
        }
    }

    @Override
    public Node createLink(int hash, int level, String key, String val) {
        for (;;) { // Repeat the work on the current node if some other thread modified it
            int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
            Node nodeAtIndex = array.get(index);
            if (nodeAtIndex == null) {
                Vertex newV = new Vertex(key, val);
                if (array.compareAndSet(index, null, newV)) {
                    return newV;
                }
                // continue; a new node was inserted by another thread, so repeat.
            } else if (nodeAtIndex instanceof Vertex) {
                Vertex vrtexAtIndex = (Vertex) nodeAtIndex;
                int newIndex = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, vrtexAtIndex.hashCode(), level + 1);
                int newIndex1 = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level + 1);
                Edge edge = new Edge(Base10ToBaseX.Base.BASE8.getLevelZeroMask() + 1);
                if (newIndex != newIndex1) {
                    Vertex newV = new Vertex(key, val);
                    edge.array.set(newIndex, vrtexAtIndex);
                    edge.array.set(newIndex1, newV);
                    if (array.compareAndSet(index, vrtexAtIndex, edge)) { // REPLACE vertex with edge
                        return newV;
                    }
                    // continue; vrtexAtIndex may have been removed or changed to an Edge already.
                } else if (vrtexAtIndex.key.hashCode() == hash) { // HERE newIndex == newIndex1 and the hashes are equal
                    synchronized (vrtexAtIndex) {
                        // Double check that this vertex has not been removed.
                        if (array.compareAndSet(index, vrtexAtIndex, vrtexAtIndex)) {
                            Vertex prevV = vrtexAtIndex;
                            for (; vrtexAtIndex != null; vrtexAtIndex = vrtexAtIndex.next) {
                                prevV = vrtexAtIndex; // prevV is used when vrtexAtIndex reaches null
                                if (vrtexAtIndex.key.equals(key)) {
                                    vrtexAtIndex.val = val;
                                    return vrtexAtIndex;
                                }
                            }
                            Vertex newV = new Vertex(key, val);
                            prevV.next = newV; // Within the synchronized block, since another thread may append to prevV.next
                            return newV;
                        }
                        // continue; vrtexAtIndex got changed
                    }
                } else { // HERE newIndex == newIndex1 BUT the hashes differ
                    edge.array.set(newIndex, vrtexAtIndex);
                    if (array.compareAndSet(index, vrtexAtIndex, edge)) { // REPLACE vertex with edge
                        return edge.createLink(hash, level + 1, key, val);
                    }
                }
            } else { // instanceof Edge
                return nodeAtIndex.createLink(hash, level + 1, key, val);
            }
        }
    }

    @Override
    public Node removeLink(String key, int hash, int level) {
        for (;;) {
            int index = Base10ToBaseX.getBaseXValueOnAtLevel(Base10ToBaseX.Base.BASE8, hash, level);
            Node returnVal = array.get(index);
            if (returnVal == null) {
                return null;
            } else if (returnVal instanceof Vertex) {
                synchronized (returnVal) {
                    Vertex node = (Vertex) returnVal;
                    if (node.next == null) {
                        if (node.key.equals(key)) {
                            if (array.compareAndSet(index, node, null)) {
                                return node;
                            }
                            continue; // The vertex may have been changed to an Edge
                        }
                        return null; // Nothing found: a different key occupies this slot.
                    } else {
                        if (node.key.equals(key)) { // Removing the first node in the chain
                            if (array.compareAndSet(index, node, node.next)) {
                                return node;
                            }
                            continue; // The vertex may have been changed to an Edge, so try again.
                        }
                        Vertex prevV = node; // prevV is needed to unlink the matching node from its predecessor
                        node = node.next;
                        for (; node != null; prevV = node, node = node.next) {
                            if (node.key.equals(key)) {
                                prevV.next = node.next; // Removing a node other than the first in the chain
                                return node;
                            }
                        }
                        return null; // Nothing found in the chain.
                    }
                }
            } else { // instanceof Edge
                return returnVal.removeLink(key, hash, level + 1);
            }
        }
    }
}

class Base10ToBaseX {
    public static enum Base {
        /**
         * A Java int is always 32 bits, so it can be split into
         * groups of 1, 2, 4, 8 or 16 bits.
         */
        BASE2(1, 1, 32), BASE4(3, 2, 16), BASE8(7, 3, 11) /* OCTAL */, /* BASE10(3,2), */
        BASE16(15, 4, 8) {
            public String getFormattedValue(int val) {
                switch (val) {
                    case 10: return "A";
                    case 11: return "B";
                    case 12: return "C";
                    case 13: return "D";
                    case 14: return "E";
                    case 15: return "F";
                    default: return "" + val;
                }
            }
        },
        /* BASE32(31,5,1), */
        BASE256(255, 8, 4), /* BASE512(511,9), */
        Base65536(65535, 16, 2);

        private int LEVEL_0_MASK;
        private int LEVEL_1_ROTATION;
        private int MAX_ROTATION;

        Base(int levelZeroMask, int levelOneRotation, int maxPossibleRotation) {
            this.LEVEL_0_MASK = levelZeroMask;
            this.LEVEL_1_ROTATION = levelOneRotation;
            this.MAX_ROTATION = maxPossibleRotation;
        }

        int getLevelZeroMask() { return LEVEL_0_MASK; }
        int getLevelOneRotation() { return LEVEL_1_ROTATION; }
        int getMaxRotation() { return MAX_ROTATION; }
        String getFormattedValue(int val) { return "" + val; }
    }

    // Returns the digit of `on` (in the given base) used at `level`, counting from the least significant digit.
    public static int getBaseXValueOnAtLevel(Base base, int on, int level) {
        if (level > base.getMaxRotation() || level < 1) {
            return 0; // INVALID input
        }
        int rotation = base.getLevelOneRotation();
        int mask = base.getLevelZeroMask();
        if (level > 1) {
            rotation = (level - 1) * rotation;
            mask = mask << rotation;
        } else {
            rotation = 0;
        }
        return (on & mask) >>> rotation;
    }
}
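The indexing rule behind getBaseXValueOnAtLevel with BASE8 can be isolated: level 1 uses hash bits 0 to 2, level 2 uses bits 3 to 5, and so on, so each Edge branches 8 ways and the path to a key is the sequence of its hash's 3-bit groups. The sketch below restates that rule with a shift-then-mask; the class HashTrieIndex and method slot are hypothetical names, and unlike the original (which returns 0) it rejects levels that run past the 32 bits of an int.

```java
// Hypothetical restatement of the BASE8 indexing rule used by the TrieMap above.
final class HashTrieIndex {
    static final int BITS_PER_LEVEL = 3;               // 2^3 = 8 children per Edge
    static final int MASK = (1 << BITS_PER_LEVEL) - 1; // 0b111

    // Child slot (0..7) for `hash` at a 1-based `level`. The unsigned shift (>>>)
    // keeps negative hash codes from sign-extending into the selected bits.
    static int slot(int hash, int level) {
        int shift = (level - 1) * BITS_PER_LEVEL;
        if (level < 1 || shift >= Integer.SIZE) {
            throw new IllegalArgumentException("level out of range: " + level);
        }
        return (hash >>> shift) & MASK;
    }
}
```

Two keys share a slot at level n only if their hashes agree on the first n groups of 3 bits, which is why createLink pushes a colliding Vertex one level deeper until the groups differ, and chains keys whose full hash codes are equal.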
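You can try the Completely Java library; it features a PatriciaTrie implementation. The API is small and easy to get started with, and it's available in the Maven central repository.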