Linked Lists

Data structures Series  – Blog 02


Linked Lists

Welcome back! Let’s continue with the data structures theme. Here’s the blog on the next data structure – that would solve the problem we had with arrays. Essentially, these two problems :

a. Arrays had fixed size , that needed to be specified when we define them. Resizing is an issue. So, developer might allocate a worst case array size, that may not be efficient if the number of elements to be held varies.

b. When we want to rearrange data in an array or insert data in between or front of an array, it  gets expensive.

Linked lists are the data structures which will help us in addressing these issues. Arrays and linked lists both essentially provide a way to store data for a client. Although linked lists to be discussed today, are good for cases arrays may not be great, they do come with their own limitations. We will cover it all in this blog, keep on reading.


Basic Stuff

If you have ever wondered why people often quiz you about linked lists? Wonder no more. It’s  because these are the most basic data structures that form the foundation for understanding more advanced data structures ( such as: binary trees, queues) etc.

Let’s suppose you need to write some basic C program to read all words (strings) contained in an input file. The first question that we would have is – How would you store the words read by the program? If we used a string array here, what would be the size to be allocated for the array? The number of words in the file is not known.

In this case, if we used a linked list, we can read words without wasting memory or be running out of memory (as would be the case, if we used arrays). This is because, as the name implies, linked lists are nothing but a list of elements linked together using a pointer parameter, in addition to the data that needs to be held – words in this case. So, the items to be inserted are just linked to the list, dynamically as and when required.

Now that we have a use case in mind, let’s learn about the internals of linked lists, operations supported and some overheads associated with their usage.


Internals

Linked list data structure are similar to arrays, but use an extra parameter – a pointer to link the items together. With this additional overhead of pointer per entry of data to be held, we can overcome the disadvantages of arrays and use memory efficiently. Linked lists use a different strategy for allocating memory for elements as compared to arrays. They allocate memory per element, when a new element needs to be inserted, not as a block as in case of arrays.

As shown below, a set of integer elements are organized in a linked list as shown on the right. We can note that there is an extra address field, that links the items together in the list. Values {2,3,5,7,8} are the data elements. We can clear;y see there is an overhead per element, to hold the pointer to next data element, in case of linked lists. But, even with this, we may be more memory efficient than arrays, when we perform operations such as always insert to the top of the list or storing elements in memory when the number of elements is not known apriori.

arr_ll
Fig 1. Arrays and linked lists elements in memory

We define a linked list element/node and a create a linked list, as below :

/* define node element for a Singly linked list*/
typedef struct Node{
int data;
Node* next;
}Node;

/* function to create and return linked list node */
Node* create_node(int data){
Node* n = (Node*) malloc(sizeof(Node));
n->data = data; // store the data
n->next = NULL; // initialize next address to NULL
return n;
}

/* Example code to create a linked list of nodes 
*  Creates a linked list with three data elements/nodes 
*  Linked list : n1->n2->n3->NULL */ 

Node* create_test_list(){

Node* n1 = create_node(10);
Node* n2 = create_node(20);
Node* n3 = create_node(30);

// update pointers
n1->next = n2; // n1 holds pointer to n2
n2->next = n3; //n2  holds pointer to n3

// n3 is the last node of this list and 
// next shall be stored as NULL to imply this.

}

Operations supported

We can implement various operations on linked lists such as insert(), delete(), lookup(), hasloop(), removeloop()  and so on. It would be a good idea to write code for these operations, to understand linked lists better. Check [1] and [2] under references below to get started.


Doubly Linked List

These are similar to singly linked lists discussed above. But, instead of holding pointer only to the next element, there is an additional pointer per node to point to the previous element as well. Why do we need them? Wait for next week’s blog 🙂 ll1.png

/* Singly Linked list node */
typedef struct Node{
int data;
Node* next;
} Node;

/* Doubly Linked list node */
typedef struct d_Node{
int data;
Node* next ; 
Node* previous; // extra pointer in doubly linked list
} d_Node;


Drawbacks

What do you think would be the drawback of a linked list, other than the storage of extra parameter – i.e., a pointer per element. Yes! It is the insert at the nth location and also lookup nth element operations. In arrays, we had an O(1) look up and also O(1) updates.

But, with linked lists, we have to iterate all the n-1 nodes, to find the place to insert nth node to the list. Also, if we want to look-up nth node, we need to traverse n-1 nodes, before we get the value of nth node- resulting in O(n) cost.

Visual is always better to understand. Please check the picture below.

ll1
Fig 2. Insert at nth position, operation on arrays and linked list

 


Further Reading

  1. Linked list problems (Stanford)
  2. Solve some problems related to linked lists

Next, we shall explore the Binary Search Tree data structure which is realized using a doubly linked list discussed here. As always, write to me at decodergirlblog@gmail.com if you have any queries or suggestions. I would love to hear from you. Until then, keep decoding and enjoying your coffee ! 😉

 

Did you know?

More coffee you donate, higher is your coder karma! Oh! there shall also be more blogs published here. Thank you for being awesome! 😀

$1.00

Arrays

Data structures Series  – Blog 01

Hi Guys! It’s September and as promised in my last post, a data structure blog is ready and waiting to be read.*** Drum-rolls please ***  Here’s my first blog in the Data Structure series. 😉


Data Structures

Data structures refer to data representation or data organization for holding program data in the computer memory aiding with the processing or operations to be performed on the data.


Arrays

Arrays are basic data structures in which :

a. All values held by a given array are of the same type – homogeneous

b. Values are allocated contiguous locations in computer memory


Arrays in Memory

I believe it may be a good idea for a programmer to know about behind the scenes of the code written, for efficient programming – that is how code interacts with Memory.  This block is mainly intended at visualizing how arrays are stored in computer memory.

Declaration and usage of arrays need us to know the element_type , name_for_array and array_size :

element_type indicates the data_type (such as int , char, float etc) of the element_value that shall be stored in a given array.

name_of_array – serves as constant pointer to first element of the array when we need to look up or change element_values in the array.

array_size shall indicate the number of elements held by the given array.

Fig 1 shows the side by side view of memory representation of two arrays A and T representing a 1-D and a multidimensional array respectively.

Real world data representation shows how we visualize the data that is to be stored. Declaration shows the code used to declare an array to hold this data and “corresponding memory layout” shows how it is actually stored in computer memory.  In case of 1-D arrays , this is pretty straight forward, how you see is also how  the computer represents data. However, in case of multi-dimensional arrays on the right side , note that how we see data and how it is stored varies. Data is stored row wise , each row  translated to a 1-D representation starting with 1-D array for elements of row 0 {10,20,30}, followed by row 1 {40,50,60} and so on.

For multi-dimensional arrays can be stored as row-major (as depicted) or column major. In case of column major, each column shall be represented like a 1-D array at a time, instead of row.

arr_addressing.png
Fig 1. Memory Layout of arrays

Now that we have an idea of how arrays are represented in the memory, we can take a look at operations on arrays next.


Array operations

1-D Arrays

  • Various supported declarations and initialization of arrays in C/C++ is as shown below.
/* Global arrays that are initialized without size being specified */ 
int c[]={2,3,4};

/* Global arrays without explicit initialization are 
   set to zero by default */
int d[10];

//local declarations of arrays
int main(){
 /* Array size variables must be declared as constants */ 
    const int a_size = 5, b_size = 3, c_size= 3;

 /* array 'a' is initialized with element values */ 
    int a[a_size] = {3,4,5,1,6}; 

 /* array 'b' is initialized without any values using {}   
    all elements shall be by default initialized to zero*/ 
    int b[b_size]={};
}
  • Array elements are referenced by an index. The value of index ranges from 0 to n-1 for an array of size ‘n’. Highlighted in red shows how an integer array can be passed to a function either as an integer pointer or array with dimension or no dimension specified as shown below.
/* array as function param can be passed as a pointer */
void print_arr(int* arr, int n){

/* Given array size, we can look up array elements using index ,    
in this case i , that takes values from 0 to n-1*/ 
for (int i=0; i<n;i++){
 std::cout<<arr[i]<<std::endl; 
}
}

/* another way to pass array to functions 
   using dimension-less array representation */
void print_arr(int arr[], int n){
}

/* array can also be passed as a sized array */
void print_arr(int arr[5], int n){
// Note: in case you pass some other dimension array, 
   no bound check is performed. So, no error reported */
// check the section below under "Fact, Did you know" :)
}

Multi-dimensional Arrays

/* declaring multi-dimensional array of size 2x2 */
/* each row is like a 1-D array, here 2 rows are 
   declared one after the other*/
int arr[2][2]={{3,4}{5,6}};

Assigning values to various indices and look up of element value at a given index is O(1) operation.

However, if you need to rearrange or move around data, it can get expensive. For example, as shown in Fig 2, if we want to move the last element to start of the array of size ‘n’, we will need n-1 copy operations to move the elements downwards and then place the element to the beginning of the array. We can clearly see this can get expensive when the number of elements get higher and if this operation is to be performed frequently.

Also, in case of arrays, another problem is static allocation. If we do not have an idea of  how much space is to be allocated, we may end up allocating more than required and wasting space or less than required – in which case we would need reallocation.

There is another data structure that takes care of these problems – namely “linked lists” (coming up next week ! 🙂 )

arr_expensive.png
Fig 2. operations on arrays to rearrange data

Relation of arrays and pointers

If you are working on cpp or Java, worry not my friend ! 😉 it’s all taken care of, for you. While C programmers, take some time to wrap your head around this. It is simple once you understand ! 🙂

// quick pointer refresher 
For a given pointer : int* ptr ;
ptr is a pointer to an integer data

ptr holds the address 
*ptr gives the value at the address held by ptr // de-reference

// for pointer to pointer case 
int** p_to_p;

*p_to_p holds the address
*(*p_to_p) gives value at address // de-reference

Let’s consider a 2-D array :

int a[m][n];
//2D array is an array of 1-D arrays per row
// 1D array name is nothing but a pointer to an integer
//2D array name is hence a pointer to pointer to an integer
  • A[0] is address of row 0
  • Value of element in column 0 of row 0, A[0][0], can be accessed using *(A[0]+0) , ie., A[0]
  • element 1 of row 0, A[0][1] is accessed using *(A[0]+1)
  • Similarly, A[1] is address of row 1
  • Value of element 0 of row 1, A[1][0], can be accessed using *(A[1]+0) , ie., A[1]
  • value of element 1 of row 1, A[1][1], can be accessed using *(A[1]+1)
*/ So, value of c'th element on r'th row , value can be 
   looked up using pointer as below.*/
   *(A[r]+c);    ... (i) 

/* Further, element A[r] can be indexed using an offset from A[0] as A[0]+r, 
   so A[r] value can be accessed using *(A+r).
   substituting in (i), we can look up value A[r][c] using*/
    *(*(A+r)+c);   ... (ii)

Summarizing above :

  • De-reference of A  (ie., *A) gives address of first element of ROW 0 or A[0]. A[0] is a pointer to an integer
  • De-reference of A[0]  gives first entry of A or A[0][0]. ie., **A = A[0][0]
  • In general , we can look up value at column ‘c’, of row ‘r’ using  *(*(A+r)+c))

Do read up further about array operation using pointers if you are interested.

Static and Dynamic arrays

Arrays can also be further split into two types : fixed size arrays  and dynamic arrays that can be resized. In Cpp we can use vectors in case we need dynamic arrays.


Application

If you are designing something and you have a checklist that has requirements below :

  • Efficient element look up
  • Don’t need to often  rearrange values

Arrays would be a good data structure to use, for they support O(1) look-ups.

One quick use case I can think of is to hold look up tables of fixed size which are containers for some constants to be used in computations – arrays are your BAE ! 😀

Also, check adjacency matrix required here.


Fact , Did you know?

When you access an array element of invalid index, C/C++ compilers don’t complaint. The behavior is “undefined”. They may look up some random location, if the address is in valid data region of the memory and return some junk value.  Oh well ! Talk about silly coding mistakes, undefined compiler behaviors – resulting in hard bugs to debug in production, sigh!

C++ supports std::vector which is a template class. It supports dynamic resizing of arrays. This class supports at() member function for an element lookup, which shall perform guaranteed bound check. But, it also supports operator [] to lookup elements, that is retained same as in C arrays and performs NO BOUND check.

/* declare a vector type and add only two elements */
std::vector<int> arr;
arr.push_back(1);
arr.push_back(2);

/* Try to access out of bound index 
   only valid access indices are 0 and 1 */
std::cout<<arr.at(2)<<std::endl; // error thrown
std::cout<<arr[2]<<std::endl; // NO ERROR

Related Reading

I am adding some links to my previous blogs as well as other material below for further reading.  Do check them out 🙂

  1. Interaction of program with memory
  2. Data structures for graph
  3. Char arrays
  4. Arraylists in Java 
  5. Solve programming problems related to Arrays

Until next time – keep decoding !

If this was helpful or you have some feedback, do write to me at decodergirlblog@gmail.com 🙂 OR :


Did you know?

More coffee you donate, higher is your coder karma! Oh! there shall also be more blogs published here. Thank you for being awesome! 😀

$1.00

Tuples in Python

This blog assumes you know about Python “lists“.

Today’s tidbit is about Tuples in Python. Why? : Because tuples are special. They are immutable but their values can change !  Wondering what does this mean? Then, read on.


What are tuples?

Tuples refer to a sequence of immutable Python objects. They are similar to “lists” except the objects are not mutable. Tuples use parentheses, lists use square braces.

Tuples can store heterogeneous data together. For example : if you want to share a ip address of a host or domain name along with port for access

machine1=(“10.1.1.2”, ’80’);

More details here , under server_address parameter.

You use tuples when you don’t want to edit the values once defined, as in this case.

If you want to hold list of heterogeneous items that need to be modified, you would use a list.


Immutable Property

To get a good understanding, I would suggest that you check this : Data model in Python 3. I am adding an example code snippet, so it gives you a clear picture what it means to be “immutable” in Python.

Since tuples are immutable objects in Python, when you have a tuple1+=(4,) kind of operation – it essentially means a new object shall be created. This is demonstrated by the difference in the ids of the first tuple1 instance and tuple1  post the addition of  new elements : as marked in RED below. Whereas, with a list, even after addition of new elements, the id remains the same- because they are “mutable” data structure in Python. 

tuple_test.png
Snapshot 1. Immutable property of Tuples

Immutable Tuples and mutable objects

Quoting from the data model page :

The value of an immutable container object that contains a reference to a mutable object can change when the latter’s value is changed; however the container is still considered immutable, because the collection of objects it contains cannot be changed. So, immutability is not strictly the same as having an unchangeable value, it is more subtle.) 

Essentially, when tuples hold a “list” object, Tuples are the immutable containers to mutable object whose value can change. This means values contained in the Tuple cannot change, while the data contained in the list object can still change.

An example would most definitely add clarity. Here goes an example :

Marked with a red line is the list that is initialized and the one that is held as one of the objects in the Tuple  (container) tuple_with_list. Then, marked in yellow are the modifications to the list, addition of two new values – 4 and 5. So, essentially , a mutable object’s value can be updated even when contained within an immutable container (Tuple). This, as expected updates the value of the tuple.  And, as highlighted with a blue line, the id of the Tuple will remain the same before and after updates to the list object.

t_w_l.png
Snapshot 2. Tuple with a List object

An object’s mutability is determined by its type; for instance, numbers, strings and tuples are immutable, while dictionaries and lists are mutable.


Are tuples better than Lists?

What do you think ? Do let me know in the comments below. 😉

I will add my take soon. Do check back. Until then , keep decoding! 🙂

Sets in Python

Today’s tidbit blog is about #sets – Sets data structure in Python. Let’s cover what sets refer to, why we need them, how are they used. There’s this really nice course on Python in case you are interested,  on edx : here.


What are sets in python?

Sets data structure in python very much refer to the set theory sets you learn about in high school math. You can do operations such as union, intersection, difference and issubset.

One question I had when I explored sets data structure was how is it different from  lists. What do they help with?

They are like lists but contain no duplicates and are unordered. Hence, any application where you only care about holding  a list of elements for fast look-up and with no duplicates, “sets” could be used – example usage- to extract all unique words from a document. The elements that sets hold are immutable.


Operations

Common set operations and the associated cost on set A with ‘m’ elements and set B with ‘n’ elements would be:

  1. union: O(m+n)
  2. intersection: O(min(m,n))
  3. difference: O(m)
  4. issubset: O(m)

Internals of Sets

Sets in python are implemented like dictionaries except the values are set to NULL – hence the fast look-up – O(1) .

Where as with lists, they are internally operate like re-sizable arrays – also support indexed lookup (like arrays do with a[1], a[2] etc). But, for that we will need to know the index of  the element.

With sets the name of the element serves as an index for lookup, a O(1) average case cost for membership lookup.

Source code for internal implementation of sets is available here. Some of the applications of the sets would be for membership validations, removal of duplicates from sequence and mathematical operations such as intersection,union,difference and symmetric difference (Symmetric difference between two sets includes elements present only in either – essentially excludes intersection – get all unique elements after merge of two sets).


“Frozen” Sets

Sets can hold immutable objects : meaning the objects inside a set cannot be edited. Sets cannot hold lists as an object/entry in the set. However, the set by itself is mutable.

Frozenset is implementation of sets which are not mutable. Once initialized cannot be edited. We could use them in situations where we want look up data which need not be updated once initialized. If we compare frozenset to tuples, frozenset still have property of sets – that they don’t hold duplicates (where as TUPLES can hold DUPLICATES).

Example to clarify above:

[1] Sets with lists as object (mutable objects) : not allowed

names = set(([“Bob”,”Alice”], [“Tom”, “Henry”, “Sam”]))

[2] Valid mutable set : allowed

names = set([“Bob”, “Alice”,”Tom”])

names.add(“Henry”)  <<–(this operation is valid on set)

[3] Frozen sets : not mutable

names = frozenset([“Bob”, “Alice”,”Tom”])

names.add(“Henry”)  <<–(this operation is NOT valid on frozen set)


That’s all I wanted to talk about in this post. Until next time – keep decoding ~

Graph Basics

graph_basics2

In this blog we will try to touch upon the basics of graph that one needs to know. Starting from what are edges or vertices, types of graphs there are – to – what are the data structures available to represent graphs in computer memory. So, let’s get started ! Continue reading “Graph Basics”