HOLIDAYCOLLAB2017
BLOG 1
This is the first blog in my #holidaycollab series.
Author of this blog is Roksi Freeman (pic below).
(Find her on Instagram: @byproxyroksi)
It’s the season of trees! What better time to discuss the tree data structures than now, right? ;-). So, here’s Roksi bringing to us the geeky joy of this festive season with her blog write-up below on Binary Search Trees! Hope you guys enjoy reading as well as learning!
“Not only is that terrible in general, but you KNOW Billy’s going to open the root present first, and then everyone is going to have to wait while the heap is rebuilt.” credit: xkcd.com
Hello and happy holidays! What a great time to talk about the most festive data structure there is, trees! Today the festive trees of choice are binary search trees (BST), also called sorted binary search trees. Binary search trees are a data structure representing a boolean function, which is useful in computer science, or just for fun.
Background
Data trees originally came from the basic idea of Shannon expansion, in which a function is split into two sub-functions by assigning a variable. Binary search trees showed up in 1960, invented by computer scientists Windley, Booth, Colin, Douglass & Hibbard. At that time they were invented, binary search trees were used for sorting, searching and file maintenance. The invention of binary search trees was a collaborative effort over the course of around three years between 1959 and 1962, and now they’re an important data structure in computer science useful for search, insertion, and deletion in enormous sets of data. Other than the published papers, there isn’t that much out there about how binary search trees were invented, so I’ve cited some of the early work at the end of this article for those who want to get into the weeds with BSTs.
Binary Search Tree
A binary search tree is a rooted, sorted binary tree. General binary trees consist of a root node at the top, followed by “children” nodes in the following layers. Terminal nodes at the bottom layer are called leaves. With a binary search tree, the values of the right subtree of each node must be greater than the root, the left must be lesser than the root. While binary search trees can have at most two children, a node with no children can still be called a binary tree. The layers of a tree are zero indexed, starting with layer 0, followed by layer 1 and 2 and so on, making the maximum nodes you can have at the level i, 2^i. A binary search tree has multiple states; proper, balanced, complete and perfect. A totally useless binary search tree can have only one root node and be improper. A binary tree can also only have one child and be improper, but it must be left-leaning.
Proper | Balanced | Complete | Perfect |
Every node has 0 or 2 children | Sub-trees of any node has a height difference no greater than 1 | Every layer full, and as left as possible | Every layer is full, including the leaves |
Usage
Binary trees are useful if you want to store a modifiable (and potentially tremendous) collection of data in a computer memory, and be able to search it quickly as well as efficiently insert and remove elements. For comparison, consider arrays and linked lists. You may have been coding and noticed that in a for loop if you want to add data to an array, you either have to copy an array and rebuild it to add on, or you can create an empty array with x spaces to insert elements, instead of building one cell at a time. Efficiency-wise, it’s better to create a large empty array than to copy and rebuild the same one over and over again. The reason to use a binary search tree is efficiency, just like most searches are faster with a sorted array than a default unsorted array. The next logical step is to explain a binary search tree as a linked list, which is also a BST, but not a perfect one or even a really good one.
Analysis of various operations
Comparision with other data structures such as arrays and linked lists.
The table below lists the various complexities for different operations of data structures:
Array | Linked list | Sorted array | BST | |
Searching | O(n) | O(n) | O(log n) | O(log n) |
Insertion | O(1) | O(1) | O(n) | O(log n) |
Deletion | O(n) | O(n) | O(n) | O(log n) |
While a linked list is technically a binary tree, it’s not efficient to search. To minimize the time to search a tree it should be dense, each node having two children, so you want it as close to perfect as possible because a perfect binary tree stores the nodes at a minimum height, which allows for minimized traversal time. Height is key here. Binary search trees are measured by height and density, so the above list has a height of 7. With a linked list it takes up to O(n) time to search, in this case 7, whereas if you had a denser tree, the height could be as low as 3 with for a tree with 7 nodes.
The dense tree has a search time from O(log n) (Fig 2.), since like a binary search each traversal discards half the search space, to at worst O(n)(Fig 1.), which is much preferable if you don’t want a search to take ages. A binary search tree should be balanced to limit the amount you have to traverse because you have to traverse the height of the tree. In intro to computer science when the professor talks about binary searches, what makes a binary search tree more efficient is that the left child is always lesser than the root and the right child is always greater, so when searching you know that you don’t have to go down the right subtree if your value is smaller than the root. So in a binary search tree, whatever selection you make at the time is guaranteed to be the efficient path to your search. Unlike a greedy algorithm that always chooses the best option at the time, but may not be the most efficient choice long term.
A binary search tree may not be able to clean your room, but when it comes to search algorithms they are pretty spiffy. Donald Knuth, a seminal voice in computer science and a wiz on binary trees, says in his 17th Annual Christmas Tree Lecture, that trees are the most interesting thing to come out of computer science in the last 25 years.
Binary trees of all varieties are also useful for doodling in math class as Vi Hart could tell you. Or, in the spirit of the holidays, say Santa wants to look up child #345,575,423 to pull their naughty/nice records. An array containing hundreds of millions of records might take hundreds of seconds to search, so instead Santa could implement a data structure like a binary search tree and he wouldn’t have to waste so much time going over his list (twice)
Caveats
Binary search trees are awesome and infinitely useful, but not flawless. It’s less efficient if you don’t have a perfect and balanced tree. The shape of the tree also depends on the order of insertion and can be degenerated.
In a binary search tree you can search quite easily, and although you can make deletions easily you have to rebuild the tree at the point after the deletion, so, in the xkcd comic above, opening the root present first is a deletion, so the whole tree has to be rebuilt. This wouldn’t be seen as a problem for many younger siblings the world over, but thankfully most kid siblings don’t know about data structures and binary search trees. There are self-balancing trees which help avoid this issue, but that’s more a question of implementation.
In Conclusion
Hopefully, the awesome people reading about data structures over the holidays have enjoyed making the festivities a little bit more technical. I for one, have had a major breakthrough with the comparison of data trees and holiday decorations, and I’m sure my relatives will also be rolling their eyes at me with whatever math inspired holiday themes to come up next year. Cheers!
Further References
Here are some references if you want to get deeper into the weeds with data structures and binary search trees:
[1] MIT OCW’s materials for Introduction to Algorithms
[2] Nick Parlante on Binary Search Trees
[3] Hibbard, Thomas N. “Some combinatorial properties of certain trees with applications to searching and sorting.” Journal of the ACM (JACM) 9.1 (1962): 13-28.
[4] Douglas, Alexander S. “Techniques for the recording of, and reference to data in a computer.” The Computer Journal 2.1 (1959): 1-9.
[5] Windley, Peter F. “Trees, forests and rearranging.” The Computer Journal 3.2 (1960): 84-88.
What did you think of this blog? Let us know at decodergirlblog@gmail.com
Happy Holidays from both of us!
Thank you Roksi! You are awesome! 🙂
what did you think? Was this useful? Let us know. Happy holidays! 🙂
LikeLike