Greedy Algorithms

What are Greedy Algorithms?

Greedy algorithms are a class of algorithms that solve a given problem with a “heuristic” – your best guess! If we think of the solution as a sequence of steps, a greedy algorithm makes the locally optimal choice at each step, in the hope of arriving at a globally optimal solution to the problem.

Also, as they are heuristics, there is no guarantee they always produce an optimal solution. There may be corner cases or input sets on which a given greedy algorithm fails to do so.

My familiarity with this class of algorithms comes mainly from my work as a graduate student, where I implemented various routing algorithms. I worked on prototyping a novel hop-by-hop routing protocol called GSTAR for the Future Internet Architecture project. We also tried to extend the unicast protocol to do efficient multicast – using a greedy approach!

So, without further ado, let’s start decoding the basics of this algorithmic paradigm.

Components of Greedy Approach

There are two parts to a greedy algorithm:

  1. Optimal substructure: the optimal solution to a problem of size ‘n’ depends on optimal solutions to sub-problems of size k, where k < n.
  2. Greedy choice property: a problem is said to satisfy the greedy choice property if a global optimum can be reached by making locally optimal choices.

Note – when a problem has optimal substructure with overlapping sub-problems, we use the dynamic programming strategy instead (to be covered in another blog).

Fig 1. Components of a Greedy algorithm

Using Greedy

In general, if you have an optimization problem at hand – that is, a problem involving finding a minimum, maximum, shortest path, etc. – a greedy strategy could be used.

Internet Routing !

I wanted to add here that this is also the most popular algorithmic paradigm in designing routing protocols. Routing often requires us to find optimal paths for a packet in a network [4] (a connected graph with routers as the nodes and each edge assigned a weight that might represent delay or the number of hops between routers). Be it Prim’s or Kruskal’s algorithm [4] to construct a minimum spanning tree, or Dijkstra’s algorithm to find the shortest path, they all rely on the greedy approach of picking a minimum-cost edge at each step, adding nodes to the spanning tree or the shortest-path set in an attempt to find a globally optimal solution.
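Dijkstra’s algorithm is a nice illustration of this greedy pattern. Below is a minimal Python sketch (the function name, graph shape, and example network are my own choices for illustration, not from any particular protocol): at each step it “settles” the unvisited node with the smallest known distance – the locally optimal choice.

```python
import heapq

def dijkstra(graph, source):
    """Greedy shortest paths: repeatedly settle the unvisited node
    with the smallest tentative distance (the locally optimal choice)."""
    dist = {source: 0}
    visited = set()
    heap = [(0, source)]              # (distance so far, node)
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)                # u's distance is now final
        for v, w in graph.get(u, []):
            if v not in visited and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# A tiny "router network": weights could be link delays.
net = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(net, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```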

Next, we will look at a few such greedy problems here on the blog and understand the greedy strategies used to solve them. We understand better when we solve problems, so let’s start with a popular and well-known one:

1. Coin change problem

Problem :

Input : A coin denomination set D with denominations d1 > d2 > … > dm, and a sum “S”.

Output : The minimum number of coins required to form the sum “S” (assume you have as many coins of each denomination as you would like); num[k] denotes the number of coins of denomination dk used.

Algorithm :

  • Repeat the steps below until the remaining sum S = 0:
  1. Find the largest coin denomination (say dk) such that dk ≤ S.
  2. Use ⌊S/dk⌋ coins of denomination dk; update num[k].
  3. Update S = S mod dk.


for k = 1 to m do {
    num[k] = ⌊S/d[k]⌋ // count as many as you need of this denomination
    S = S mod d[k]    // the remainder is the next sub-problem to solve
}
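The steps above can be sketched as runnable Python (the function and variable names are my own, not part of the original pseudocode):

```python
def greedy_change(denominations, total):
    """Greedy coin change: always take the largest coin that still fits.
    Returns {denomination: count} for the coins actually used."""
    counts = {}
    for d in sorted(denominations, reverse=True):
        if total == 0:
            break
        count, total = divmod(total, d)  # num[k] = S // d[k]; S = S mod d[k]
        if count:
            counts[d] = count
    return counts

us_cents = [1, 5, 10, 25, 50]
print(greedy_change(us_cents, 30))  # {25: 1, 5: 1} – two coins
```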

Greedy choice :

  • At each step we make the locally optimal choice: pick the largest denomination dk from set D such that dk ≤ S and add it to the solution set (without worrying about the global optimum).
  • The sum S is then reduced by dk, and the next locally optimal choice is made for the sum (S – dk), and so on at each step, until the remaining sum is zero.

Optimal Substructure :

  • Let “P” denote the original problem of making the sum S using the minimum number of coins from the denomination set d1, d2, …, dm.
  • Suppose the greedy solution starts with denomination dk, dk ≤ S. Then we have the sub-problem P’ of making change for (S – dk).
  • Let “O” be an optimal solution to P and let O’ be an optimal solution to P’. Then clearly cost(O) = cost(O’) + 1 (where cost = minimum number of coins), indicating that the globally optimal solution depends on optimal solutions to sub-problems.
  • Thus, the optimal substructure property clearly follows.

Example : Consider the U.S. coin denominations: 1 cent, 5 cents, 10 cents, 25 cents, 50 cents. Find the minimum number of coins required to make a sum of 30.

Answer : Minimum number of coins needed to make 30 = 2 coins 
(i.e., 25 cents and 5 cents)

But, this heuristic fails …

Now, let’s consider a case where this fails. Suppose the supported coin denominations are only {20, 15, 5} and we have to find the minimum number of coins required to make a sum of 30.

The greedy algorithm described here returns 20, 5, 5 – 3 coins – whereas the actual minimum is 15, 15 – 2 coins. So, no, it does not give the optimal solution in all cases, but it works for many; for example, the algorithm outlined here works for the United States coin denominations.
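To see the failure concretely, here is a self-contained sketch comparing the greedy count against the true minimum computed by dynamic programming (both function names are mine, for illustration):

```python
def greedy_count(denoms, total):
    """Number of coins the largest-coin-first heuristic uses."""
    coins = 0
    for d in sorted(denoms, reverse=True):
        used, total = divmod(total, d)
        coins += used
    return coins if total == 0 else None   # None: greedy got stuck

def optimal_count(denoms, total):
    """True minimum number of coins, via dynamic programming."""
    INF = float("inf")
    best = [0] + [INF] * total             # best[s] = fewest coins for sum s
    for s in range(1, total + 1):
        best[s] = min((best[s - d] + 1 for d in denoms if d <= s), default=INF)
    return best[total]

print(greedy_count([20, 15, 5], 30))   # 3  (20 + 5 + 5)
print(optimal_count([20, 15, 5], 30))  # 2  (15 + 15)
```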

We will see how we can handle the failure cases with another way of solving this problem, a little later on this website. So, do check back.

Up next on the blog

In next week’s blog, we will look at two more popular problems solved using the greedy approach:

  1. Fractional Knapsack Problem
  2. Activity Selection Problem

Until then, keep decoding! Come back next week.

Caveat :

Although induction [3] helps prove a greedy strategy correct, and this class of algorithms seems intuitive and simple to code, they may not be as straightforward for a beginner.

So, don’t beat yourself up if you are unable to devise a solution to a popular greedy problem in, say, 30 minutes or even a day. It takes time. Keep learning and applying – eventually we will get there. 🙂

Further Reading

Here are some useful links for further reading :

  1. Algorithmic Toolbox course (Coursera)
  2. Internet routing (Stanford)
  3. Greedy proof by induction (Stanford)
  4. Greedy algorithms for graphs (MIT)
  5. Solve some greedy problems



Implementing BFS and DFS


Hola! Did the post on graph search bring you here? Or are you here to check what we will be decoding today? Well, as the title goes, this post is about how we implement the BFS and DFS algorithms and the data structures we need to support them.

Before we begin, just a heads-up: there ain’t no code here! This blog is about understanding how we could implement the algorithms. Once the idea is clear, I am sure you will write the code in a jiffy in the programming language of your choice! Say what? 😉

The Data Structures

In the previous post on graph search, we looked at what these algorithms are. Here we will learn how we could implement them. This requires an understanding of two simple data structures – queues and stacks. Queues and stacks can be visualized as containers that provide storage and retrieval of objects in a particular order, and this order decides the usage scenario. The operations of adding and retrieving are called “enqueue” and “dequeue” for a queue, and “push” and “pop” for a stack.

Have you been in a queue at the D.M.V. or at the grocery billing counter? You must have noticed that the first one in the queue is the first one to get serviced and leave the queue. Essentially, a queue lets you insert and retrieve in FIFO [first in, first out] order. With “stacks” this is LIFO [last in, first out]. Imagine a stack of plates at a salad counter: at your turn, you pick the topmost plate (which was placed last onto the stack). [Obvio! :P] So now let’s move on to the actual agenda here – implementing BFS/DFS. Shall we?
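In Python, for instance, `collections.deque` gives an efficient queue and a plain list works as a stack (the example values here are mine):

```python
from collections import deque

queue = deque()
for person in ["first", "second", "third"]:
    queue.append(person)      # enqueue at the back
front = queue.popleft()       # dequeue from the front
print(front)                  # prints "first" – FIFO: first in, first out

stack = []
for plate in ["bottom", "middle", "top"]:
    stack.append(plate)       # push onto the top
top = stack.pop()             # pop from the top
print(top)                    # prints "top" – LIFO: last in, first out
```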

Queues and BFS

If we want to implement BFS, we need to inspect all the directly connected neighbors first, then the second-level neighbors, and so on.

Algorithm for BFS implementation

The BFS algorithm can be implemented using a “queue” data structure to hold the to-be-visited nodes at each level as we explore. There have been discussions about whether you can implement it without a queue. Short answer: not really – you would need to emulate queue behavior using stacks or some other data structure (which means you are still using a queue :P).

One thing to note is that nodes are marked visited as they are explored, to avoid getting into an infinite loop on cyclic graphs and to prevent visiting a node multiple times when it is reachable through more than one path. (In a tree, neither can happen – every node is reachable by a unique path from the start node of a connected graph.)


Begin : Set all nodes to "not visited";
q = new Queue();
q.enqueue(starting node);            // from where BFS begins
while ( q ≠ empty ) do
    x = q.dequeue();
    if ( x has not been visited )
        visited[x] = true;           // mark node x as visited
        for ( every edge (x, y) )    // we are using all edges!
            if ( y has not been visited )
                q.enqueue(y);        // use the edge (x, y)!
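This pseudocode maps almost line for line onto Python, using `collections.deque` as the queue (the adjacency-list shape and the example graph are my own: starting at E, the edges assumed are E–C, E–D, C–A, D–B):

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first traversal; returns nodes in the order visited."""
    visited, order = set(), []
    q = deque([start])                 # q.enqueue(starting node)
    while q:                           # while q is not empty
        x = q.popleft()                # x = q.dequeue()
        if x not in visited:
            visited.add(x)             # mark x as visited
            order.append(x)
            for y in graph.get(x, []): # every edge (x, y)
                if y not in visited:
                    q.append(y)        # q.enqueue(y)
    return order

g = {"E": ["C", "D"], "C": ["A"], "D": ["B"]}
print(bfs(g, "E"))  # ['E', 'C', 'D', 'A', 'B']
```

Note how the first-level neighbors C and D come out before the second-level neighbors A and B – exactly the level-by-level order BFS promises.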

For more clarity (see the visualization aid here), the operations on the queue while performing BFS are depicted in Fig 1 below with an example graph. The yellow arrow at (2) indicates the direction of insertion into the queue. The blue arrow at (5) shows the direction of dequeuing. Marked in yellow boxes are the nodes to be explored/operated on next.

Starting at node E, C and D are first-level neighbors, while A and B are second-level neighbors.

Fig 1. BFS – Queue operations on a graph.

Stacks and DFS

If we want to implement DFS, we choose one neighbor, then its neighbor, and so on until we have reached the end of a chain of unvisited nodes. Check this link out for better visualization (dfs visual).

DFS can simply be implemented using a stack data structure to store the “to-be-visited-nodes” (instead of queue as in BFS).

The DFS algorithm has both an iterative and a recursive approach available for its implementation – both are actively used.

Pseudo code for iterative approach:

Begin : Set all nodes to "not visited";
s = new Stack();                     // use a stack structure for DFS
s.push(initial node);                // push() stores a node to be visited
while ( s ≠ empty ) do
    x = s.pop();                     // pop() removes a node from the stack to explore
    if ( x has not been visited )
        visited[x] = true;           // same as BFS – mark visited nodes
        for ( every edge (x, y) )    // all edges need exploring, like BFS
            if ( y has not been visited )
                s.push(y);           // any new node is pushed onto the stack
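Here is the same iterative DFS as a runnable Python sketch (names and the example graph are mine) – identical to a BFS implementation except that a stack replaces the queue:

```python
def dfs_iterative(graph, start):
    """Depth-first traversal using an explicit stack; returns visit order."""
    visited, order = set(), []
    stack = [start]                    # s.push(initial node)
    while stack:                       # while s is not empty
        x = stack.pop()                # x = s.pop()
        if x not in visited:
            visited.add(x)             # mark x as visited
            order.append(x)
            for y in graph.get(x, []): # every edge (x, y)
                if y not in visited:
                    stack.append(y)    # s.push(y)
    return order

g = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}
print(dfs_iterative(g, "A"))  # ['A', 'C', 'E', 'B', 'D']
```

Note the order: the last neighbor pushed is explored first – exactly the LIFO behavior we wanted, driving the search deep before it backtracks.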

Recursive Implementation

Oops, what is recursion? Go here.

For this implementation we replace the “stack” data structure with the program stack, by using a recursive function call (a call to self). The pseudo-code is below. The step dfs(G, y) is the recursive call – it repeats until the last reachable node is visited.

Pseudo code for recursive approach

dfs(Graph G, node x)
    Mark x "visited"
    for (every edge (x,y) in G): 
       if (y has not been visited):
           dfs(G,y) //recursive call   
End procedure
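The recursive pseudo-code above translates directly to Python (the `visited` parameter is my addition to keep the sketch self-contained; the call stack plays the role of the explicit stack):

```python
def dfs(graph, x, visited=None):
    """Recursive DFS; the program's call stack replaces the explicit stack."""
    if visited is None:
        visited = set()
    visited.add(x)                 # mark x "visited"
    for y in graph.get(x, []):     # every edge (x, y) in G
        if y not in visited:
            dfs(graph, y, visited) # recursive call
    return visited

g = {"A": ["B", "C"], "B": ["D"], "C": []}
print(sorted(dfs(g, "A")))  # ['A', 'B', 'C', 'D']
```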

Stack data structure versus program stack

Now, coming to the meme in the blog: do you stick to recursion, which uses stack memory, or re-write an iterative DFS that uses the stack data structure? I bring this up for discussion here, as I think it helps in understanding the relation between memory and the programs we write. (No clue about memory – stack, heap, data segment, text, etc.? Please go here right away!)

With a recursive function we utilize the system stack for its operations.

  1. It’s important to note that depending on the depth of the graph we can run out of stack space – cough *stack overflow* cough (welcome to embedded systems).
  2. Additionally, we may incur extra cost pushing and popping the program counter – system-stack-related overhead.
  3. There’s also the notion of cache-friendly coding – temporal and spatial locality (loops may be the better idea).

On the other hand, the recursive approach results in clean and simple code.

I think it really needs to be decided on a case-by-case basis – judge the depth of recursion, decide whether cache optimization is the need of the hour – in short, have the answer to “what are the constraints you have to work with?”

Essentially: think before you implement!

*Note : The animation links in this blog post are not created by me. They are available online for reference.

Well, why don’t you try implementing it now ? k-bye.

Graph Search

I was revisiting graph traversal and search algorithms, specifically “breadth first search” and “depth first search”. So, this blog will be about these. I will try to decode what they refer to. So hop on.

Why graphs ?

I have been interested in wireless networking and routing through my recent graduate studies. Graphs are critical to understanding  and analyzing how one can efficiently route packets on the internet. Hence, my interest in this topic !

Further, graphs can be used to solve a range of problems: finding the shortest path between nodes in a network (internet routing), finding efficient flights between cities (yes! Expedia et al.), finding your friends and their friends with common likes on social media (Facebook yo!), and so on…

[Note : The moment you hear connections and/or paths, “graphs” should pop into your mind.]

What is a graph / data structures used to represent it?

Read here


BFS and DFS provide a way to explore a graph starting at any given node. As the names indicate, one search or traversal goes breadth-first and the other goes depth-first. Okay, don’t break your head over what breadth and depth mean. Shown below is how we would start exploring the neighbor nodes starting at a node in an “acyclic graph“. [A] and [B] show an acyclic graph being explored with DFS and BFS.

In the snapshot, the starting node is marked “A” in orange; nodes explored in the first run are marked blue, in the second run green, and in the third yellow.


Fig [1] . BFS and DFS on a connected acyclic graph (tree)

Yes, this can be extended to cyclic graphs as below. How we begin exploring the graph is indicated by the arrows, starting at point A.

BFS : A, B, D, G, C, E, F

DFS : A, B, C, D, E, F, G


Fig [2] . BFS and DFS on a connected cyclic undirected graph

Essentially, in DFS we pick a directly connected neighbor, explore its neighbor, and so on till the end of the chain of unvisited nodes, then move on to the next directly connected neighbor, and so on. Think of it like exploring a path in a maze to its full length, to see if it can get you out. We can find the longest/most expensive path using this technique. It is also used in topological sorting – sorting based on hierarchy (e.g., directories).

With BFS, we explore the directly connected neighbors first, then pick one neighbor at a time and explore its first neighbors (imagine a ripple in water); we continue exploring the way ripples spread. This technique helps us find the “shortest path” between two nodes in a constant-weight graph.
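Here is a sketch of using BFS to recover an actual shortest path in an unweighted graph, by remembering which node first discovered each node (the names and example graph are mine):

```python
from collections import deque

def shortest_path(graph, start, goal):
    """BFS shortest path (fewest edges) in an unweighted graph."""
    parent = {start: None}        # doubles as the 'visited' set
    q = deque([start])
    while q:
        x = q.popleft()
        if x == goal:             # first time we reach goal = shortest
            path = []
            while x is not None:  # walk parent links back to start
                path.append(x)
                x = parent[x]
            return path[::-1]
        for y in graph.get(x, []):
            if y not in parent:
                parent[y] = x
                q.append(y)
    return None                   # goal unreachable

g = {"A": ["B", "D"], "B": ["C"], "D": ["C"], "C": ["F"]}
print(shortest_path(g, "A", "F"))  # ['A', 'B', 'C', 'F']
```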

Data Structure used for DFS and BFS?

Read here.

Minimum Spanning Tree problem

Now, who has heard of a “Minimum spanning tree” ?

A spanning tree is a sub-graph that contains all vertices and is a “tree” (yes, no cycles!).

A minimum spanning tree is a spanning tree of a connected, undirected graph that has the least cost of all spanning trees. Wondering about the directed-graph MST? It’s called the “arborescence problem”. (Let’s ignore it for now.)

So, what do we mean by cost? Cost is the weight associated with each edge between two vertices. For example, cost can be distance in miles when you are finding the shortest path with multiple stops on a map, the price of plane tickets between nodes (cities), the time delay incurred between nodes (routers) in internet routing, and so on.

So, if the weights of all edges are constant or equal (i.e., the cost of adding any edge to the spanning tree is the same), any spanning tree is a minimum spanning tree.

Now, to find a minimum spanning tree in an undirected, equal-weighted graph, we just need to traverse all nodes of the graph once. Hence, in this case the graph traversal algorithms discussed above – BFS and DFS – take the same time to find a minimum spanning tree: linear time, O(|V| + |E|), where |V| is the number of vertices and |E| is the number of edges in the graph. Essentially, they achieve the same thing – exploring all nodes in a graph.
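As a sketch of this idea: the edges BFS uses to first discover each node already form a spanning tree, collected in O(|V| + |E|) time (the function name and example graph are mine for illustration):

```python
from collections import deque

def bfs_spanning_tree(graph, start):
    """Collect the edge that first discovers each node; in an
    equal-weight connected graph this is a minimum spanning tree."""
    tree, visited = [], {start}
    q = deque([start])
    while q:
        x = q.popleft()
        for y in graph.get(x, []):
            if y not in visited:
                visited.add(y)
                tree.append((x, y))   # discovery edge for y
                q.append(y)
    return tree

# A 3-node cycle: the spanning tree keeps 2 of the 3 edges.
g = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(bfs_spanning_tree(g, "A"))  # [('A', 'B'), ('A', 'C')]
```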

However, which one is to be used where would depend on “how” you go about exploring the nodes of the graph and “what” is it that you are looking to solve. BFS and DFS algorithms differ in the way they explore nodes and thus how they are implemented. More info available here

Make sure to read the blogs linked in here for a better understanding.

*** That’s all folks ***