Today’s random blog is about #sets – Sets data structure in Python. Let’s cover what sets refer to, why we need them, how are they used. There’s this really nice course on Python in case you are interested,  on edx : here.


What are sets in python?

Sets data structure in python very much refer to the set theory sets you learn about in high school math. You can do operations such as union, intersection, difference and issubset.

One question I had when I explored sets data structure was how is it different from  lists. What do they help with?

They are like lists but contain no duplicates and are unordered. Hence, any application where you only care about holding  a list of elements for fast look-up and with no duplicates, “sets” could be used – example usage- to extract all unique words from a document. The elements that sets hold are immutable.


Operations

Common set operations and the associated cost on set A with ‘m’ elements and set B with ‘n’ elements would be:

  1. union: O(m+n)
  2. intersection: O(min(m,n))
  3. difference: O(m)
  4. issubset: O(m)

Internals of Sets

Sets in python are implemented like dictionaries except the values are set to NULL – hence the fast look-up – O(1) .

Where as with lists, they are internally operate like re-sizable arrays – also support indexed lookup (like arrays do with a[1], a[2] etc). But, for that we will need to know the index of  the element.

With sets the name of the element serves as an index for lookup, a O(1) average case cost for membership lookup.

Source code for internal implementation of sets is available here. Some of the applications of the sets would be for membership validations, removal of duplicates from sequence and mathematical operations such as intersection,union,difference and symmetric difference (Symmetric difference between two sets includes elements present only in either – essentially excludes intersection – get all unique elements after merge of two sets).


“Frozen” Sets

Sets can hold immutable objects : meaning the objects inside a set cannot be edited. Sets cannot hold lists as an object/entry in the set. However, the set by itself is mutable.

Frozenset is implementation of sets which are not mutable. Once initialized cannot be edited. We could use them in situations where we want look up data which need not be updated once initialized. If we compare frozenset to tuples, frozenset still have property of sets – that they don’t hold duplicates (where as TUPLES can hold DUPLICATES).

Example to clarify above:

[1] Sets with lists as object (mutable objects) : not allowed

names = set(([“Bob”,”Alice”], [“Tom”, “Henry”, “Sam”]))

[2] Valid mutable set : allowed

names = set([“Bob”, “Alice”,”Tom”])

names.add(“Henry”)  <<–(this operation is valid on set)

[3] Frozen sets : not mutable

names = frozenset([“Bob”, “Alice”,”Tom”])

names.add(“Henry”)  <<–(this operation is NOT valid on frozen set)


That’s all I wanted to talk about in this post. Until next time – keep decoding ~