In this blog we will try to touch upon the basics of graph that one needs to know. Starting from what are edges or vertices, types of graphs there are – to – what are the data structures available to represent graphs in computer memory. So, let’s get started !

## So, what is a graph, it’s nodes and edges ?

Graph is a data structure that allows representation of data as relations between nodes/vertices using edges. A,B,C,D,E refer to nodes ; AB, AC, AD,DE,EC are the edges. Of course, understanding why would one use a graph is useful. So, some examples what nodes and edges can refer to pertaining to different use-cases are listed below –

a. Nodes can be routers and edges be connections between them with weight representing the delay incurred for sending a packet over that edge. This representation can then be used to find best routing path for a packet – one that takes shortest time

b. Nodes can be people on social network and edges can represent connection between people with weight say representing the number of common interests- this can be used to check match between profiles on social networks , suggesting friends and so on.

c. Nodes can be destinations points (places) and edges can be roads connecting them – used for finding navigation path and so on.

Various types of graphs are also shown in Fig [1] below.

[B] shows cyclic and acyclic graphs. Cycle refers to loop in the graph – if you can start and end at the same node (any node)- graph is “cyclic” else “acyclic”.

[C] shows undirected and directed graphs. Edges are bi-directional in undirected (Facebook) but uni-directional in directed graphs (web crawling!), the direction indicated by the arrow tip.

[D] shows disconnected and connected graphs. If there is route or set of edges to connect every node to every other node- then the graph is connected else it is disconnected.

[E] is a weighted graph – each of the edges have a “weight” or “cost” associated with them as represented by numbers on the edges.

We also have another way of classifying graphs –

### Sparse and Dense graphs

If the number of edges in the graph E is of the order of the number of vertices (v) E=~v, then graph is said to be sparse. If E=~(v^2) , then the graph is said to be dense – ie., a highly connected graph. (Keep this in mind) – We need this further down in this blog when we want to pick one data structure over the other to represent graph.

*Yes! have not included the multi-graphs , a graph that can have multiple edges /relations connecting two nodes.(Let’s keep it at basics ! :)).

Okay, now that we have seen various types of graphs, let’s check how we could represent them in computer memory.

## Data structures

We will be looking at the two most popular representation schemes- the adjacency matrix and adjacency lists. These are the data structures that help us work on a graph in our programs.

### [1] Adjacency Matrix

Now, who is familiar with matrix – put your hands up ! Easiest way to represent graphs is perhaps the adjacency matrix. So,Yes! all you need is a vertex array of length “v” and a 2-D matrix that is of size v x v , where v refers to the number of vertices in the graph. Shown below in Fig[2] is a graph [A] and its adjacency matrix representation [B]. Row i corresponding to node i (i derived from vertex array) has the information about presence of edges between this node and other nodes (say node j) (where 0<=i<=v-1 ; 0<=j<=v-1). Example : Row zero (i=0) and column j in Fig [2]shows the connections of node 0 (mapped to vertex A in this case) with other nodes (0<=j<=v-1) in the graph.

##### Fig [2]. (Undirected, unweighted)Graph and it’s adjacency matrix

Some observations to be noted are :

For **undirected graphs **(Fig [2]) , the adjacency matrix is symmetric. The value corresponding (row,column) = (i,j) is equal to value at (row,column)=(j,i) – which means that presence of edge **E(i,j)** indicates presence of edge **E(j,i)** (example – presence of AB indicated presence of BA). However, for **directed graphs**, this **may not be true**.

Marked in yellow blocks as “1”, indicates presence of an edge between the two nodes in **unweighted **graph (Fig [2]) and “0” indicates no edge is present. In **weighted graphs**, yellow blocks will contain the weight corresponding to the edge between the nodes instead of “1”.

### Good , Bad and the Worst

If you want to check if an edge is present between nodes i and j , we just look up **Matrix(i,j)**. If we store the vertices and their corresponding index number in a hash table/look-up table, then this operation takes : [O(1) to look up first vertex index i + O(1) to look up second vertex index j+ O(1) to look up Edge-Matrix(i,j) ] time** ~= O(1) (great!)** . However, if you need to check how many nodes a given node (say node i) is connected to, then we need to go to row (i) in the matrix and traverse all “v” entries – takes O(v) time (not so good for large graphs).

The memory taken is of the order O(v^2) (sucks for larger graphs!) . Also, for sparse (low connected graphs), a lot of vertices wont be connected and will still need an entry in the matrix (look at the entries marked “zero” in Fig[2]). (sucks again!).

Let’s now see if we can do something about this space and wastage issue. *Tada* !! Time to explore adjacency lists.

### [2] Adjacency Lists

As the name indicates we will be using **lists**** **(~~Matrix~~!), to hold graph connectivity information. All we need to store is a linked list of directly connected neighbors pertaining to each node/vertex (Linked lists are programming 101 – please check here). So, we will have up to (max of v-1) list elements corresponding to each node. Refer Fig [3] below.

##### Fig [3]. (Undirected, unweighted)Graph and it’s adjacency list

As seen, since we are storing information related to** connected neighbors** , we are eliminating the need to store anything for unconnected neighbors (**score-1** over Adjacency matrix : think of a sparsely connected graph with large number of nodes to understand this). The number of elements in the list “L” associated with a node can be at most O|v-1|. To check how many nodes a given node is directly connected to will only take O(L), where L is the number of directly connected neighbors (0<=L<=|v|) – not ~~O(v)~~ as in case of matrix! However, to check if an edge exists between two nodes, would take worst case **O(L)** – (Oops ! that was O(1) in case of adjacency matrix).

Also, comparing Fig [2] and Fig [3], we can identify the memory savings. These savings are comparable when we are working with sparsely connected large graphs. However, for densely connected graphs, lists would get bigger (~O(v) length per node) which implies space requirements will be similar to adjacency matrix. In that case, we would choose adjacency matrix- just because of O(1) edge look up! Come on- O(1) beats everything !

Before we wrap up- here’s some food for thought !

If there are millions of users on Facebook – are we gonna matrix index it – how do we manage millions of connections?

Can we do something better? Read here

– Info on the meme : here

*** That’s all folks ***