Tries
A trie is a variant of an n-ary-tree, in which characters are stored at each node. Each path down the tree may represent a word.
Tries are a relatively advance topic, make sure you come to this section once you feel a bit comfortable with Trees, Graphs and Recursion. Just be aware is quite important to be aware of this, as they come in interviews relatively often.
The * nodes (also known as "null nodes") are used to indicate completed words, just bear in mind these nodes are not always used. Sometimes people use nul pointers or boolean flags. In the example we have the completed words: many, my, lie, a.
A node in the trie dictionary could range from 1 to the alphabet size + 1 children (or 0 if boolean flags are being used instead of a * node).
Tries are commonly used to store an entire language for quick prefix lookups. While a Hashmap can tell us quickly if a string is a valid word, it can't tell us if a string is a valid prefix of a word.
π Problem: Autocompletion
Given a list of words:
['dog', 'dark', 'cat', 'door', 'dodge']
And the input:
input = 'do'
Print or return back a list with all the words that start of the input string.
output = ['dog', 'door', 'dodge']
π Bad Solution
public ArrayList badSolution(ArrayList<String> list, String inputWord) {
ArrayList<String> result = new ArrayList();
for (String s : list) {
if (s.contains(inputWord)) result.add(s);
}
return result;
}
Majority of people come up with a solution like this, which runs in O(N * K), where N is the size of the input list and K is the size of the string. This is not very efficient at all. Let's have a look at how we could optimise it using Tries.
π Solution
In order to solve this we will first need to assemble a Trie
Here is the implementation of the TrieNode and Trie classes:
Node
public class TrieNode {
private char character;
private HashMap<Character, TrieNode> children = new HashMap<>();
private boolean isWord;
TrieNode() {}
TrieNode(char c) {
this.character = c;
}
public char getCharacter() {
return character;
}
HashMap<Character, TrieNode> getChildren() {
return children;
}
boolean isWord() {
return isWord;
}
void setWord(boolean word) {
isWord = word;
}
}
Don't let this scare you, its mainly boilerplate code - with getters and setters.
As you can see, the node has a HashMap, the key is each Character to which this node points to.
Trie
Every time we input a word into the trie, we perform a check in order to see if the char already exist, if it does, we check with the next character until there is a new one, then we create a new node. Once we know the word has no more chars left, we toggle the flat of the last node to true - indicating the word has ended.
public class Trie {
TrieNode root;
public Trie() {
root = new TrieNode();
}
public void insert(String word) {
TrieNode current = root;
for(int i=0; i<word.length(); i++){
char c = word.charAt(i);
TrieNode t;
if (current.getChildren().containsKey(c)){
t = current.getChildren().get(c);
} else {
t = new TrieNode(c);
current.getChildren().put(c, t);
}
current = t;
if(i==word.length()-1)
t.setWord(true);
}
}
}
Solving the problem
In order to solve this problem, we need to first navigate to the node in which the prefix ends. From there we will be able to a variation of a Depth First Search, in which we will keep adding to the prefix the current character and traversing to each node, until we know the word ends, then we print the full word.
public void printWordsWithPrefix(String prefix) {
TrieNode t = traverseToNode(prefix);
printWordsFromNode(t, prefix);
}
private TrieNode traverseToNode(String inputWord) {
TrieNode current = root;
TrieNode t = null;
// traverse to the last node of the word
for (int i = 0; i < inputWord.length(); i++) {
char c = inputWord.charAt(i);
if (current.getChildren().containsKey(c)) {
t = current.getChildren().get(c);
current = t;
} else {
return null;
}
}
return t;
}
private void printWordsFromNode(TrieNode t, String prefix) {
if (t.isWord()) {
System.out.println(prefix);
}
// Depth First Search until we reach the last node of each word, then we can print it
for (int i = 0; i < t.getChildren().keySet().size(); i++) {
char c = (char) t.getChildren().keySet().toArray()[i];
TrieNode children = t.getChildren().get(c);
printWordsFromNode(children, prefix + c);
}
}