Python collections

  • Collections module consists of advanced data structures to handle the data in an efficient manner.
  • It providers alternative data structure to python built-in data types like dict, list, set, and tuple.

namedtuple

  • Named tuples are basically easy-to-create, lightweight object types.
  • Named tuple instances can be referenced using object-like variable dereferencing or the standard tuple syntax.
  • They can be used similarly to struct or other common record types, except that they are immutable.
  • It also improves the readability
  • Let's look at the below example
from collections import namedtuple

Point = namedtuple('Point', 'x y')

p1 = Point(1.0, 5.0)
p2 = Point(2.5, 1.5)
x, y = p1.x, p1.y
print(x, y)
# output: 1.0 5.0
a, b = p2[0], p2[1]
print(a, b)
# output: 2.5 1.5

deque

  • deque - doubly ended queue
  • deque is preferred over a list in the cases where we need quicker append and pop operations from both the ends of the container
  • O(1) - time complexity for append and pop operations for deque.
  • O(n) - time complexity for append and pop operations for list

deque - syntax

from collections import deque

queue = deque(['apple','banana', 500])
print(queue)
# output: deque(['apple', 'banana', 500])

deque - add items

  • append() - method adds the item to the right end of the deque.
  • appendleft() - method adds the item to the left end of the deque.
  • insert(i, item) - method adds the item at index position i
  • extend(iterable) - method adds the multiple elements to the right end of the deque.
  • extendleft(iterable) - method adds the multiple elements to the left end of the deque.
from collections import deque

queue = deque(['apple','banana'])

left_fruit = "Watermelon"
right_fruit = "Orange"
# append left
queue.appendleft(left_fruit)
print(queue)
# output: deque(['Watermelon', 'apple', 'banana'])

# append right
queue.append(right_fruit)
print(queue)
# output: deque(['Watermelon', 'apple', 'banana', 'Orange'])

# insert at index 2
queue.insert(2, "Grapes")
print(queue)
# output: deque(['Watermelon', 'apple', 'Grapes', 'banana', 'Orange'])

# add multiple elements to the right end
queue = deque([1,2,3])
queue.extend((4,5,6))
print(queue)
# output: deque([1, 2, 3, 4, 5, 6])

# add multiple elements to the left end
queue = deque([1,2,3])
queue.extendleft((4,5,6))
print(queue)
# output: deque([6, 5, 4, 1, 2, 3])

deque - remove items

  • pop() - method removes the item to the right end of the deque.
  • popleft() - method removes the item to the left end of the deque.
  • remove(item) - method removes the first occurence of the item
from collections import deque

queue = deque(['Watermelon', 'apple', 'banana', 'Orange'])

# remove item from left
queue.popleft()
print(queue)
# output: deque(['apple', 'banana', 'Orange'])

# remove item from right
queue.pop()
print(queue)
# output: deque(['apple', 'banana'])

# remove given item
cities = deque(['Hyderabad', 'London', 'Singapore', 'Tokyo'])
item = "Hyderabad"
cities.remove(item)
print(cities)
# output: deque(['London', 'Singapore', 'Tokyo'])

deque - access items

  • index(item, start, end) - method returns the first index of the item
    • item - item to search
    • start - start index
    • end - end index
  • count(item) - method returns number of occurences of the item
  • reverse() - method used to reverse the order of the elements.
  • rotate(increment) - rotates the deque by the number specified in arguments. If the number specified is negative, rotation occurs to the left.
from collections import deque

queue = deque([1,2,3,4,5,6,4,2,6])

# get index of an element
print(queue.index(4))
# output: 3
print(queue.index(4, 4, 8))
# output: 6

# count occurences
print(queue.count(4))
# output: 2

# reverse deque
queue.reverse()
print(queue)
# output: deque([6, 2, 4, 6, 5, 4, 3, 2, 1])

# rotate the deque
queue.rotate(2)
print(queue)
# output: deque([2, 1, 6, 2, 4, 6, 5, 4, 3])

ChainMap

  • ChainMap combines multiple dictionaries into one.
from collections import ChainMap

fruites = {"banana": 100, "apple": 200}
vegetables = {"Brinjol": 50, "Cauliflower": 40}

grocery = ChainMap(fruites, vegetables)

print(grocery)
# output: ChainMap({'banana': 100, 'apple': 200}, {'Brinjol': 50, 'Cauliflower': 40})

ChainMap - new_child(item)

  • Add a new dict to existing ChainMap
from collections import ChainMap

fruites = {"banana": 100, "apple": 200}
vegetables = {"Brinjol": 50, "Cauliflower": 40}

grocery = ChainMap(fruites, vegetables)

bakery = {"Cake": 10, "Bun": 15}
# add "bakery" to grocery chainmap
new_grocery = grocery.new_child(bakery)
print(new_grocery)
# output: ChainMap({'Cake': 10, 'Bun': 15}, {'banana': 100, 'apple': 200}, {'Brinjol': 50, 'Cauliflower': 40})
print(grocery)
# output: ChainMap({'banana': 100, 'apple': 200}, {'Brinjol': 50, 'Cauliflower': 40})

ChainMap - parents

  • It returns the all dicts inside the ChainMap except the first one.
from collections import ChainMap

a = {"name": "John"}
b = {"name": "Anji"}
c = {"name": "Chunn"}

cmap = ChainMap(a, b, c)

print(cmap.parents)
# output: ChainMap({'name': 'Anji'}, {'name': 'Chunn'})

ChainMap - maps

  • It returns the list all maps in ChainMap
from collections import ChainMap

fruites = {"name": "Mango"}
vegtables = {"name": "Califlower"}

chain_map = ChainMap(fruites, vegtables)
print(chain_map.maps)
# output: [{'name': 'Mango'}, {'name': 'Califlower'}]

Counter

  • A Counter is a dict subclass for counting hashable objects.
  • It is a collection where elements are stored as dictionary keys and their counts are stored as dictionary values.
  • Counts are allowed to be any integer value including zero or negative counts.

Counter - initialization

from collections import Counter

empty_counter = Counter()
print(empty_counter)
# output: Counter()
counter_from_iterable = Counter('Google')
print(counter_from_iterable)
# output: Counter({'o': 2, 'G': 1, 'g': 1, 'l': 1, 'e': 1})
counter_from_map = Counter({'red': 4, 'blue': 2})
print(counter_from_map)
# output: Counter({'red': 4, 'blue': 2})
counter_from_kwargs = Counter(cats=4, dogs=8)
print(counter_from_kwargs)
# output: Counter({'dogs': 8, 'cats': 4})

Counter - elements()

  • Get all elements from the Counter object.
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)
elements = c.elements()
print(elements)
# output: <itertools.chain object at 0x1047e7250>
print(list(elements))
# output: ['a', 'a', 'a', 'a', 'b', 'b']

Counter - most_common()

  • Return a list of the n most common elements and their counts from the most common to the least.
  • If n is omitted or None, most_common() returns all elements in the counter.
from collections import Counter

c = Counter('abracadabra')
common = c.most_common(3)
print(common)
# output: [('a', 5), ('b', 2), ('r', 2)]

Counter - subtract()

  • Elements are subtracted from an iterable or from another mapping (or counter).
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)
c.subtract(d)
print(c)
# output: Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

Counter - total()

  • Compute the sum of the counts.
  • Note: it only works on python version 3.10 or later
from collections import Counter

c = Counter(a=10, b=5, c=0)
print(c.total())
# output: 15

Counter - arithmetic operations

from collections import Counter

a = Counter(a=10, b=5, c=1)
b = Counter(a=7, b=3, x=100)

print(a + b)
# output: Counter({'x': 100, 'a': 17, 'b': 8, 'c': 1})
print(a - b)
# output: Counter({'a': 3, 'b': 2, 'c': 1})
print(a & b)
# output: Counter({'a': 7, 'b': 3})
print(a | b)
# output: Counter({'x': 100, 'a': 10, 'b': 5, 'c': 1})
print(a == b)
# output: False

OrderedDict

  • It's a sub class of dict
  • Unlike dict it maintains the order of the elements inserted.
  • It supports all methods as dict
from collections import OrderedDict

d = OrderedDict()
d["apple"] = 100
d["grapes"] = 200
d["oranges"] = 300

print(d)
# output: OrderedDict([('apple', 100), ('grapes', 200), ('oranges', 300)])

defaultdict

defaultdict - syntax

defaultdict(default_factory)
  • defaultdict dict is a subclass of dict
  • Both, defaultdict and dict works very similar except default dict did not throw KeyError unless default_factory is None
  • default_factory() function returns the default value set by the user.

defaultdict - basic usage

from collections import defaultdict

d = defaultdict(lambda: "key does not exist")
d["name"] = "Anji"
print(d["name"])
# output: Anji
d["name"] = "John"
print(d["name"])
# output: John
print(d["unknown"])
# output: key does not exist

defaultdict - using list as default_factory

from collections import defaultdict

d = defaultdict(list)
names = ["Anji", "Jessie", "Sudha"]

for name in names:
    d["names"].append(name)

print(d)

# output: defaultdict(list, {'names': ['Anji', 'Jessie', 'Sudha']})

UserDict

  • Prior to the introduction of dict, the UserDict class was used to create dictionary-like sub-classes that obtained new behaviors by overriding existing methods or adding new ones.
  • It has no added advantage over dict

UserList

  • Python supports a List like a container called UserList present in the collections module. This class acts as a wrapper class around the List objects.
  • It has no added advantage over list

UserString

  • The class, UserString acts as a wrapper around string objects.
  • The need for this class has been partially supplanted by the ability to subclass directly from str;
  • however, this class can be easier to work with because the underlying string is accessible as an attribute.
  • It has no added advantage over string

References: