Byte Ordering

What is Endianness?

Endianness or Byte ordering refers to two different ways of storing multi-byte data in memory by the processor. In the embedded world where the data memory is shared and there is some hardware writing data to this block for some other piece of hardware to consume, it becomes important to understand and agree upon the endianness.

Network and Host Byte order

Yes, same applies to over the network data transfer protocols, in case you are familiar with them. The network uses big endian format while the end device may be little or big endian – hence we have the htons() and ntohs() socket level apis made available for usage. htons refers to “host to network” while ntohs refers to “network to host”  byte order conversions.


Examples

Consider a 16-bit integer in hex format 0x1234. This is represented in a “little endian” system with lower order byte first “34”. And,is represented in a “big endian” system  with higher order byte “12” first as summarized in the table below.

Address Little endian Big endian 
0x4000 34 12
0x4001 12 34

Example 32-bit data : 0xABCD1234

Address Little endian Big endian 
0x4000 34 AB
0x4001 12 CD
0x4002 CD 12
0x4003 AB 34

Detecting  the Byte order

Now that we know what big and little endian is, let’s see how we could potentially identify the endianess of a system.

Let’s assume we have a integer variable with value set to 1 – represented on a 16-bit system as 0x0001. If the first byte is 1 (that is data at lower address) then, it a Little endian system else it is a Big endian system. To check if the system is little endian we could write the sample program below.

boolean isLittleEndian()

{

int i = 0x0001;  // 16-bit variable

char* a = (char*) &i; // size of char value is a byte (8 bits)

return (*a)?1:0; // terenary operator (condition?value_if_true value_if_false)

}

If above returns true – the system is little endian else the system is big endian.


Conversion

As I had briefly mentioned above, the socket libraries support api’s for conversions such as htons() and ntohs(). Now, let’s see how we could implement our own functions for this.

htons() is used to convert from host to network order [hton] for short variable [s] : that is little endian to big endian.

 As seen in the example below, all we have to do is swap the bytes passed. Do check the post about bit level operations to understand the trick used here.

Address Little endian Big endian 
0x4000 34 12
0x4001 12 34

uint16 htons(uint16 input)

{

//reuse the endian check function

if(isLittleEndian())

{

//host is little endian – conversion needed

//use bit manipulation tricks to swap the bytes in-place as below

return ((input&0xFF00>>8)|(input&0x00FF<<8));

}

else

{

//this means it is in big endian format

// no conversion needed

return input;

}

}


That’s all in this tidbit.

Until next tidbit, keep decoding ! Happy programming !

Sets in Python

Today’s tidbit blog is about #sets – Sets data structure in Python. Let’s cover what sets refer to, why we need them, how are they used. There’s this really nice course on Python in case you are interested,  on edx : here.


What are sets in python?

Sets data structure in python very much refer to the set theory sets you learn about in high school math. You can do operations such as union, intersection, difference and issubset.

One question I had when I explored sets data structure was how is it different from  lists. What do they help with?

They are like lists but contain no duplicates and are unordered. Hence, any application where you only care about holding  a list of elements for fast look-up and with no duplicates, “sets” could be used – example usage- to extract all unique words from a document. The elements that sets hold are immutable.


Operations

Common set operations and the associated cost on set A with ‘m’ elements and set B with ‘n’ elements would be:

  1. union: O(m+n)
  2. intersection: O(min(m,n))
  3. difference: O(m)
  4. issubset: O(m)

Internals of Sets

Sets in python are implemented like dictionaries except the values are set to NULL – hence the fast look-up – O(1) .

Where as with lists, they are internally operate like re-sizable arrays – also support indexed lookup (like arrays do with a[1], a[2] etc). But, for that we will need to know the index of  the element.

With sets the name of the element serves as an index for lookup, a O(1) average case cost for membership lookup.

Source code for internal implementation of sets is available here. Some of the applications of the sets would be for membership validations, removal of duplicates from sequence and mathematical operations such as intersection,union,difference and symmetric difference (Symmetric difference between two sets includes elements present only in either – essentially excludes intersection – get all unique elements after merge of two sets).


“Frozen” Sets

Sets can hold immutable objects : meaning the objects inside a set cannot be edited. Sets cannot hold lists as an object/entry in the set. However, the set by itself is mutable.

Frozenset is implementation of sets which are not mutable. Once initialized cannot be edited. We could use them in situations where we want look up data which need not be updated once initialized. If we compare frozenset to tuples, frozenset still have property of sets – that they don’t hold duplicates (where as TUPLES can hold DUPLICATES).

Example to clarify above:

[1] Sets with lists as object (mutable objects) : not allowed

names = set(([“Bob”,”Alice”], [“Tom”, “Henry”, “Sam”]))

[2] Valid mutable set : allowed

names = set([“Bob”, “Alice”,”Tom”])

names.add(“Henry”)  <<–(this operation is valid on set)

[3] Frozen sets : not mutable

names = frozenset([“Bob”, “Alice”,”Tom”])

names.add(“Henry”)  <<–(this operation is NOT valid on frozen set)


That’s all I wanted to talk about in this post. Until next time – keep decoding ~

Functions and Stack

What is stack?

Stack is part of computer memory in the RAM that is used for memory allocation and management. The advantage of using this type of allocation would be that the memory management is handled for us by the system automatically as opposed to using heap allocations.

We must note that the stack allocated is of limited size and we can easily overwhelm the stack if we don’t take precautions to use the stack efficiently. Specially in an embedded environment, a memory limited system, it is  of utmost importance that we write stack friendly code. If you don’t want to be the reckless coder human-1 in the pic below, let’s dive into the details of what call stack is and how the system manages it for us.

stackof.png

Actual details of the stack depends on the compiler, programming language , operating system and available instruction set. In high level languages, details of the usage of the stack are abstracted from the programmer who has access only to functions and not the memory. However, programmer needs to understand the ways to using stack efficiently.


Call stack in a system

In the previous post on program and memory we had seen that the variables local to function are allocated on the stack memory. System essentially sets aside a scratch space for every function or thread under execution on the stack memory referred to as the call frame or stack frame.The stack space used for this is referred to as the call stack or program stack. The “call stack” essentially holds stack frames of the all the active functions of the system : functions for which return has not been hit.

When a function is called, system performs below sequence of operations :

  1. Take back up of any registers required (ex: EAX, ECX, EDX)
  2. Arguments evaluated to values (parameters passed)
  3. Control flow jumps to body of the function
  4. Code execution begins
  5. Once “return” is encountered, control returns back to the function call

During this process, each stack frame in the call stack is used to hold:

  1. Parameter passed to the function
  2. The return address
  3. Local variables of a function under execution
  4. Evaluation stack in case of complex processing in the function
  5. Pointer to current context
  6. Exception handlers – in cases where try and catch may be used for some exception handling.
st1.png
Fig (1) Stack frame

And above set n Fig(1)  forms a “stack frame” associated with a “function call”. Please note that the items listed need not be in the same order.

System Stack Management

As the data structure we had looked at in older blog , the call stack also supports push and pop operations. It can grow downwards from higher address towards lower address or vice versa depending on the system implementation. The system uses two pointers for stack management :

a. Extended stack pointer (ESP) and

b. Extended Base Stack Pointer (EBP) or the frame pointer

Additionally, EIP is the Extended Instruction pointer, that points to next instruction to be executed in the program memory. This value is copied on to stack – to keep track of the return address.

ESP is like a mutable register shared between all function invocations. ESP always points to the top of the stack – so ESP essentially keeps changing with added function calls.

EBP is maintained by each stack frame. It is set equal to current ESP before allocating the stack frame for any new function call. As the name implies, base pointer is used like a fixed point from which data within stack frame can be accessed using an offset.Once the function execution is complete, the stack pointer that has progressed is set back to EBP in the current stack frame and the memory that was allocated for the function execution is considered available (note that memory is not cleaned per se, so it may contain garbage values from previous allocations). I assume an example would clarify the statements here.


Example stack operation

Let’s consider an example program below. With a gcc compiler on a Linux system, we would see the stack usage depicted in Fig (2) below.

int main() {

int localVar1;

int a=10,b=20;

localVar1 = computeSomething(a,b);

printf(“Result is %d”, localVar1);

return;

}

int computeSomething(int c, int d)

{

int localVar2; // local variable in the function

localVar2 = c;

int result = (c+d)-localVar2;

//or perform some other operation on arguments c ,d  and return

return result;

}

Stack allocation associated with the code block above is depicted in the left part, while corresponding stack de-allocation is depicted on the right. Blocks colored light grey are part of caller’s stack frame while the blocks colored in yellow are part of callee’s stack frame.

PART I : ALLOCATION

(also called stack management : procedure prolog)

[1] Assume the green line at the bottom is the current stack pointer location when main() starts execution. In the example below main() is the caller function and computeSomething() is the callee.

[2] When main() invokes computeSomething(), new stack frame is to be allocated for the function execution. The parameters passed to the function are ‘pushed’ on to stack from the last to first argument, to begin with. It is also responsibility of ‘caller’ to take a back up of the registers that may be edited by callee execution- namely EAX, EDX and ECX.

[3] Also, before starting execution of the function, the next instruction to be executed (EIP) at that point is saved as return address on the stack.

[4] EBP in computeSomething()’s stack frame is set to be main()’s EBP. Then, the execution scope shifts to computeSomething(). Stack frame contents like the local variables, arguments passed to function are accessed using an address offset from the EBP in the corresponding frame. i.e., EBP is fixed while ESP keeps moving to point at the top of the stack, remember?

[5]Before starting execution of computeSomething(), a back-up of registers that callee is responsible for, is taken and stored in it’s stack frame- namely EBX, ESI and EDI.

[6] Once in the callee scope, local variables are allocated, there may or may not be temporary storage used – say some complicated expressions are to be evaluated.

[7] If the return value of the callee is greater than 4 bytes, an extra variable pointer is passed to the function from the caller.

Ex: computeSomething(&extra,a,b) (instead of computeSomething(a,b).

st3.png
Fig (2) Stack allocation and de-allocation with function calls

PART II : DE-ALLOCATION

(also called stack management : procedure epilog)

[1] Once computeSomething()’s execution is complete and return is hit, the stack pointer- ESP is set to EBP of the frame- memory is not freed as such- so will contain garbage.  The registers (EBX, ESI, EDI) are reverted back.

[2] The execution comes back to caller’s context () – return address points to the next instruction to be executed after the function. Registers EAX,EDX and ECX are reverted to what they were before function call.

[3]If return is hit in the main() as well, then ESP is reset to EBP in the stack frame and the ESP returns to where it started – the GREEN line where stack pointer was when we started execution of main().

Please note that stack cannot grow infinitely. Stack is limited memory.

And with that, we are done analyzing how stack can grow or shrink with function calls and execution!


Stack overflow

Now that we have looked at the operation of call stack, it would be useful to look at how we could overwhelm it through our code (depicted in Fig(3) below). If stack pointer crosses the stack size limit marked with a red line- we hit the “stack overflow” issue. Example scenarios how this can happen are listed  below:

  • Trying to allocate data structure more than the allocated stack size.
  • Recursive function with no exit clause – stack frames are allocated for every instance of recursion and eventually exceeds stack size.
  • Infinite loop with some memory allocation within. In this case the local variable allocations add to stack frame and eventually exceed stack limit (not shown below).
st4.png
Fig(3) Stack overflow scenarios

Comparing with Heap allocations

In multi-thread environment, each thread is given a stack while all threads share heap space. Stack allocation and de-allocation are considered inexpensive as compared to allocations on the heap. This is because, with stack allocation and de-allocation is achieved by just moving a pointer , by virtue of LIFO operation of the stack data structure and requirements of memory management between function calls.  Whereas, with heap the allocation and de-allocation do not follow a given pattern and require book keeping and much more than just moving pointer to adjust.


Above is true for stack based programming that utilizes system stack for operations. So, if you are thinking , can there be stack-less implementations ? check this : StacklessPython

In any case the key to efficient programming is to essentially understand how system operates internally, which is what we tried today. Happy coding!