Arrays

Data structures Series  – Blog 01

Hi Guys! It’s September and as promised in my last post, a data structure blog is ready and waiting to be read.*** Drum-rolls please ***  Here’s my first blog in the Data Structure series. 😉


Data Structures

Data structures refer to data representation or data organization for holding program data in the computer memory aiding with the processing or operations to be performed on the data.


Arrays

Arrays are basic data structures in which :

a. All values held by a given array are of the same type – homogeneous

b. Values are allocated contiguous locations in computer memory


Arrays in Memory

I believe it may be a good idea for a programmer to know about behind the scenes of the code written, for efficient programming – that is how code interacts with Memory.  This block is mainly intended at visualizing how arrays are stored in computer memory.

Declaration and usage of arrays need us to know the element_type , name_for_array and array_size :

element_type indicates the data_type (such as int , char, float etc) of the element_value that shall be stored in a given array.

name_of_array – serves as constant pointer to first element of the array when we need to look up or change element_values in the array.

array_size shall indicate the number of elements held by the given array.

Fig 1 shows the side by side view of memory representation of two arrays A and T representing a 1-D and a multidimensional array respectively.

Real world data representation shows how we visualize the data that is to be stored. Declaration shows the code used to declare an array to hold this data and “corresponding memory layout” shows how it is actually stored in computer memory.  In case of 1-D arrays , this is pretty straight forward, how you see is also how  the computer represents data. However, in case of multi-dimensional arrays on the right side , note that how we see data and how it is stored varies. Data is stored row wise , each row  translated to a 1-D representation starting with 1-D array for elements of row 0 {10,20,30}, followed by row 1 {40,50,60} and so on.

For multi-dimensional arrays can be stored as row-major (as depicted) or column major. In case of column major, each column shall be represented like a 1-D array at a time, instead of row.

arr_addressing.png
Fig 1. Memory Layout of arrays

Now that we have an idea of how arrays are represented in the memory, we can take a look at operations on arrays next.


Array operations

1-D Arrays

  • Various supported declarations and initialization of arrays in C/C++ is as shown below.
/* Global arrays that are initialized without size being specified */ 
int c[]={2,3,4};

/* Global arrays without explicit initialization are 
   set to zero by default */
int d[10];

//local declarations of arrays
int main(){
 /* Array size variables must be declared as constants */ 
    const int a_size = 5, b_size = 3, c_size= 3;

 /* array 'a' is initialized with element values */ 
    int a[a_size] = {3,4,5,1,6}; 

 /* array 'b' is initialized without any values using {}   
    all elements shall be by default initialized to zero*/ 
    int b[b_size]={};
}
  • Array elements are referenced by an index. The value of index ranges from 0 to n-1 for an array of size ‘n’. Highlighted in red shows how an integer array can be passed to a function either as an integer pointer or array with dimension or no dimension specified as shown below.
/* array as function param can be passed as a pointer */
void print_arr(int* arr, int n){

/* Given array size, we can look up array elements using index ,    
in this case i , that takes values from 0 to n-1*/ 
for (int i=0; i<n;i++){
 std::cout<<arr[i]<<std::endl; 
}
}

/* another way to pass array to functions 
   using dimension-less array representation */
void print_arr(int arr[], int n){
}

/* array can also be passed as a sized array */
void print_arr(int arr[5], int n){
// Note: in case you pass some other dimension array, 
   no bound check is performed. So, no error reported */
// check the section below under "Fact, Did you know" :)
}

Multi-dimensional Arrays

/* declaring multi-dimensional array of size 2x2 */
/* each row is like a 1-D array, here 2 rows are 
   declared one after the other*/
int arr[2][2]={{3,4}{5,6}};

Assigning values to various indices and look up of element value at a given index is O(1) operation.

However, if you need to rearrange or move around data, it can get expensive. For example, as shown in Fig 2, if we want to move the last element to start of the array of size ‘n’, we will need n-1 copy operations to move the elements downwards and then place the element to the beginning of the array. We can clearly see this can get expensive when the number of elements get higher and if this operation is to be performed frequently.

Also, in case of arrays, another problem is static allocation. If we do not have an idea of  how much space is to be allocated, we may end up allocating more than required and wasting space or less than required – in which case we would need reallocation.

There is another data structure that takes care of these problems – namely “linked lists” (coming up next week ! 🙂 )

arr_expensive.png
Fig 2. operations on arrays to rearrange data

Relation of arrays and pointers

If you are working on cpp or Java, worry not my friend ! 😉 it’s all taken care of, for you. While C programmers, take some time to wrap your head around this. It is simple once you understand ! 🙂

// quick pointer refresher 
For a given pointer : int* ptr ;
ptr is a pointer to an integer data

ptr holds the address 
*ptr gives the value at the address held by ptr // de-reference

// for pointer to pointer case 
int** p_to_p;

*p_to_p holds the address
*(*p_to_p) gives value at address // de-reference

Let’s consider a 2-D array :

int a[m][n];
//2D array is an array of 1-D arrays per row
// 1D array name is nothing but a pointer to an integer
//2D array name is hence a pointer to pointer to an integer
  • A[0] is address of row 0
  • Value of element in column 0 of row 0, A[0][0], can be accessed using *(A[0]+0) , ie., A[0]
  • element 1 of row 0, A[0][1] is accessed using *(A[0]+1)
  • Similarly, A[1] is address of row 1
  • Value of element 0 of row 1, A[1][0], can be accessed using *(A[1]+0) , ie., A[1]
  • value of element 1 of row 1, A[1][1], can be accessed using *(A[1]+1)
*/ So, value of c'th element on r'th row , value can be 
   looked up using pointer as below.*/
   *(A[r]+c);    ... (i) 

/* Further, element A[r] can be indexed using an offset from A[0] as A[0]+r, 
   so A[r] value can be accessed using *(A+r).
   substituting in (i), we can look up value A[r][c] using*/
    *(*(A+r)+c);   ... (ii)

Summarizing above :

  • De-reference of A  (ie., *A) gives address of first element of ROW 0 or A[0]. A[0] is a pointer to an integer
  • De-reference of A[0]  gives first entry of A or A[0][0]. ie., **A = A[0][0]
  • In general , we can look up value at column ‘c’, of row ‘r’ using  *(*(A+r)+c))

Do read up further about array operation using pointers if you are interested.

Static and Dynamic arrays

Arrays can also be further split into two types : fixed size arrays  and dynamic arrays that can be resized. In Cpp we can use vectors in case we need dynamic arrays.


Application

If you are designing something and you have a checklist that has requirements below :

  • Efficient element look up
  • Don’t need to often  rearrange values

Arrays would be a good data structure to use, for they support O(1) look-ups.

One quick use case I can think of is to hold look up tables of fixed size which are containers for some constants to be used in computations – arrays are your BAE ! 😀

Also, check adjacency matrix required here.


Fact , Did you know?

When you access an array element of invalid index, C/C++ compilers don’t complaint. The behavior is “undefined”. They may look up some random location, if the address is in valid data region of the memory and return some junk value.  Oh well ! Talk about silly coding mistakes, undefined compiler behaviors – resulting in hard bugs to debug in production, sigh!

C++ supports std::vector which is a template class. It supports dynamic resizing of arrays. This class supports at() member function for an element lookup, which shall perform guaranteed bound check. But, it also supports operator [] to lookup elements, that is retained same as in C arrays and performs NO BOUND check.

/* declare a vector type and add only two elements */
std::vector<int> arr;
arr.push_back(1);
arr.push_back(2);

/* Try to access out of bound index 
   only valid access indices are 0 and 1 */
std::cout<<arr.at(2)<<std::endl; // error thrown
std::cout<<arr[2]<<std::endl; // NO ERROR

Related Reading

I am adding some links to my previous blogs as well as other material below for further reading.  Do check them out 🙂

  1. Interaction of program with memory
  2. Data structures for graph
  3. Char arrays
  4. Arraylists in Java 
  5. Solve programming problems related to Arrays

Until next time – keep decoding !

If this was helpful or you have some feedback, do write to me at decodergirlblog@gmail.com 🙂 OR :


Did you know?

More coffee you donate, higher is your coder karma! Oh! there shall also be more blogs published here. Thank you for being awesome! 😀

$1.00

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s