Introduction to Tries

HOLIDAYCOLLAB2017

BLOG 2

Hi Guys! After weeks of being unwell, I am finally back to doing what I love. Blogging here, of course! But, this time again I am not the author. This is a continuation of my holiday collab 2017 blogs. In the last blog, Roksi introduced us to Binary trees.

Author of this blog is : Mayak Patel  (pic below)

(Find him on Instagram : @you._know.who)

mayank.jpg

Without further ado, let’s dive into Mayank’s introduction to Tries write-up!


Introduction

Hello code-heads!Welcome back to the blog. As the title goes, this blog is about a data structure called “Trie”. A Trie data structure turns out to be handy in many situations, some of which will be discussed as you read further. There are many Algorithms as well as data structures for searching particular words in a given text. There are some of these which are included in the standard libraries, but there are many which are NOT. Trie is a good example of one that is NOT included in standard libraries of any language.

OKAY, BUT WHY DO WE NEED TRIES ?

Let’s dive right in to the terminologies of a Trie data structure. Suppose you have a Word and and a Dictionary. A dictionary can be thought of a collection of many words. Now, if we want to know, if a single word is present in the dictionary or not, Tries are helpful. The problem we are addressing here can be solved in many ways, like using HashSets, Hash Tables etc. Now the question that comes up is,

Why do we need to learn about and use a completely new data structure (Trie), when there exist multiple solutions that we already know of?

Here’s why :

  1. The time taken to insert a new word to the dictionary in O(length), where length refers to the length of the new word. This time is a lot better than that of a Set, and a a little bit better than that of a hash table.
  2. Sets and the hash tables can only find in a dictionary words that match exactly with the single word that we are finding; but tries allow us to find words that have a single character different, a prefix in common, a character missing, etc. Interesting right ?

BUT, ARE THEY EVEN USED IN REALITY ?

The answer is YES ! Below are some of the applications of the Trie data structure in the Industry:

  1. A web browser : When you start typing a url, say google.com, the browser can auto complete your text, or can show possibilities of text that you could be writing. This can be done using a Trie very fast !
  2. Word search based on a prefix : Suppose, we want to query our dictionary for all the words starting with the prefix “Ma”, it can be very easily done with the help of tries.
  3. Do you know how an orthographic corrector can check that every word that you type is in a dictionary? Again a trie.
  4. Suggested corrections of the words that are present in the text but not in the dictionary.

What is a TRIE ?

So, what exactly is a Trie?

Now, that we have had a good introduction about tries answering how they are used, how are they better than sets and hash tables in some scenarios etc, maybe you would want to get into the details of  what tries are, starting with why tries have this name. The word ‘Trie’ is an infix of the word “retrieval”, because Trie can find a single word in a dictionary with only a prefix of the word. The main idea behind the Trie data structure is the following :

  1. Trie is a tree where each vertex represents a single word or a prefix.
  2. The root represents an empty string (“”), the vertices that are direct sons of the root represent prefixes of length 1, the vertices that are 2 edges of distance from the root represent prefixes of length 2, the vertices that are 3 edges of distance from the root represent prefixes of length 3 and so on. In other words, a vertex that are k edges of distance of the root have an associated prefix of length k.
  3. The next figure shows a trie with the words “tree”, “trie”, “algo”, “all”, and “also.”
trie-01.jpg
Fig 1. TRIE representation example

Representing a TRIE

The representation of a Trie is very similar to that of a Tree upto an extent. Just like trees, tries are also collections of nodes.

So, what sets Tries apart from Trees?

The answer is the Structure of the nodes!

In a tree the nodes generally contain data and the links to the children nodes. But, each Trie node contains a list of characters, each of which can have multiple children. For implementing a Trie, we need to code a structure with some fields that indicate the values stored at each vertex. As we want to know the number of words that match with a given string, every vertex should have a field to indicate that this vertex represents a complete word or only a prefix (for simplicity, a complete word is considered also a prefix) and how many words in the dictionary are represented by that prefix (there can be repeated words in the dictionary). This task can be done with only one integer field word.

Because we want to know the number of words that have prefix like a given string, we need another integer field prefixes that indicates how many words have the prefix of the vertex. Also, each vertex must have references to all his possible sons (26 references). Knowing all these details, our structure should have the following members:

structure Trie
   integer word;
   integer prefixes;
   reference edges[26];

Basic operations on a Trie

Now that we know learned about how a Trie structure can be implemented, let us have a look at the operations we perform on a trie.

There are 3 of such basic functions:

  1. Adding a new word to the dictionary: This function will accept a string argument(word) and add that word to the dictionary.

    trie02.png
    Fig 2. addWord operation
  2. Counting the number of words starting with a given prefix: This function will accept a String prefix and return an Integer denoting the number of words in the dictionary that start with the given prefix.
  3. Counting the number of words exactly matching with a given string: This function will accept a String word and return an Integer denoting the number of words in the dictionary that match exactly with the word.

    trie03.png
    Fig 3. countWords and countPrefixes operations

COMPLEXITY ANALYSIS

Trie is an efficient information retrieval data structure. The complexity for searching can be brought down to optimal limit using tries. Using Trie, we can search a key in O(M) time, where M is the maximum key length.

But this optimal search complexity comes with a penalty. The penalty is on Trie storage requirements. The storage requirements are high because every node in a Trie consists of multiple branches (26), and so on.


Conclusion

Hope this blog has given you an idea about what a Trie is and why do we even need a Trie. By now, you must also have an idea about how a Trie is implemented and what are the various operations that can be performed. Listed next are the links to some of the practice problems that might help you master the application of tries and hence enable you to use them in solving real-world problems. Have fun! That’s all for now.


FURTHER REFERENCES

  1. Interactive application to learn about Trie
  2. Top coder notes on Trie
  3. Solve problems on Trie
  4. one more Trie problem

Let us know what you think. Was this helpful?

Don’t forget to comment below and encourage us, if it was helpful to you. 🙂


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s