11  Data types

11.1 Introduction

Data type refers to object types that store information and provide some operations on that information.

Compare a function and a number, both are objects but a function stores code and numbers stores numeric data primarily. That is why, function, class etc. are not referred to as data types like int or float which are numeric data types.

Data types are the most critical part of any language. They are used to store, access and operate upon information within code.

Numbers and text are the most fundamental data types.

Some languages like C, distinguish between characters and strings, where strings are treated as sequence of characters. Python has just strings for text, which are a sequence of characters, and can be a sequence of single character.

Then there are collections which provide ways to combine objects to create more complex data. Data types like list, dictionary etc. are provided in higher level languages. Every high level language has its own implementations and syntax with differences, but underlying design principles are the same.

Data Structures and Algorithms part of the book gives a high level background of how the data designs have evolved.

11.1.1 Overview

Below tree provides an overview of data types implemented in Python with categories.

  • The nodes with red text are the data types which are builtin in Python and are covered in the book
  • The nodes with blue text are data types available by loading from standard library

Standard library is discussed in architecture part of the book: Section 18.2

Collection is generally used for collection of objects, and it can be compounded, collection of collection of objects.

Sequence type is a collection of objects with preserved order. In base Python, strings, tuples and lists are sequence type.

Sequences can be mutable or immutable.

Iterable is any collection of objects from which objects can be retrieved one at a time and hence can be looped through. str, tuple, list, dict, set are all iterables.

Sequence is a subset of collections. All sequences are iterables.

Methods available in any sequence type can be categorized as below.

  • common methods for sequence types
    • sequence level operations (e.g. length of a sequence)
    • element level operations (e.g. indexing and slicing)
    • methods available for iterables (can be used in loops)
  • if mutable then methods for mutable sequence types
  • methods specific to a sequence type

Iterable methods are related to use in looping and are discussed in respective sections.

Based on this categorization, methods are discussed for respective sequence types, string, list and tuple.

All this information is end product of developments in the field of data structures. The DSA part of the book has more information on this, post reading which context of why Python data types behave the way they do will be more clear.

11.1.2 Objectives

Focus on understanding the underlying concepts and where to look for information in a structured way.

  • Creation and syntax
  • Operations: understand how operations are structured across data types
  • Indexing and slicing for sequence types
  • Implications: understand the usage and implications using example use cases given
    • Mutable vs Immutable data types
  • Choosing data types: understand when to use which data type

11.2 None

None signifies absence of value. A variable might be defined but not bound to any object. None is a placeholder to signify this state.

It is specially useful in conditionals, to avoid error when checking if a value has been assigned to a variable. This is covered in chapter on conditionals (Section 11.10.1.2.3, Section 15.1.5).

11.2.1 Example

some_var = None
print(f'{some_var=}, {type(some_var)=}')
>>>  some_var=None, type(some_var)=<class 'NoneType'>

11.3 Numeric types

11.3.1 Summary

Boolean Integers Rationals Real Complex
bool int fractions.Fraction float decimal.Decimal complex
  • numeric data types are immutable
  • for basic calculations int and float are sufficient
    • others are listed for completeness
  • boolean and comparison operations are discussed separately

Immutable implies that once an object is created, its value cannot be modified.

For variable assignment this implies that if a variable is storing some number and is assigned another number, a new object is created in background. This does not have any significant impact in case of numeric data types.

Mutability is discussed in more detail at the end of this chapter.

11.3.2 Specifications

Numbers, integers or floats, can be typed as done in regular math. There are some special syntax available for code readability.


Underscores can be used for better code readability. During execution they are treated normally.

one_million_int = 1_000_000
one_million_float = 1_000_000.00
print(f'{one_million_int = }')
>>>  one_million_int = 1000000
print(f'{one_million_float = }')
>>>  one_million_float = 1000000.0

Scientific notation can be used with floats. e has to be preceded by a number.

x = [1e-2, 3.314e+5]
print(x)
>>>  [0.01, 331400.0]

11.3.3 Operations

Regular math operations can done using the symbols provided as listed below. Other functions commonly used are

  • round(x[, n]) is a builtin function provided
  • Standard library has more options for numeric operations
addition substraction multiplication division exponents floored division modulo
+ - * / ** // %
  • using division always returns float
  • operations with int and float return float

11.3.3.1 Increment/Decrement

Incrementing and decrementing a value is provided through operators += and -=.

  • x += n is same as x = x + n
  • x -= n is same as x = x - n

Python uses a special syntax for these common operations and can be extended to below operations.

  • x *= n is same as x = x * n
  • x /= n is same as x = x / n
  • x **= n is same as x = x ** n
Caution for float

There are some issues and limitations with floating point arithmetic using float.

It is recommended to go through them at python documentation on limitations of using float type.

11.3.4 Examples

11.3.4.1 Example 1

Below is a basic example of assigning int and float. Note that if decimal is present then, even if number is integer, it is stored as float.

num1 = 10; num2 = 10.0
print(f'{num1 = }, {type(num1) = }')
>>>  num1 = 10, type(num1) = <class 'int'>
print(f'{num2 = }, {type(num2) = }')
>>>  num2 = 10.0, type(num2) = <class 'float'>

11.3.4.2 Example 2

Operations with int and float return float.

num1 = .25; num2 = 100
num3 = num2 * num1
print(f'{num1 = }, {type(num1) = }')
>>>  num1 = 0.25, type(num1) = <class 'float'>
print(f'{num2 = }, {type(num2) = }')
>>>  num2 = 100, type(num2) = <class 'int'>
print(f'{num3 = }, {type(num3) = }')
>>>  num3 = 25.0, type(num3) = <class 'float'>

11.3.4.3 Example 3

Objects of type int, within certain range (-5 to 256), are not duplicated for performance reasons.

some_int_1 = 10; some_int_2 = 10
some_int_1 is some_int_2
>>>  True

The basic idea is to intern for memory optimizations. Sometimes useful for strings, string interning. This causes surprizes such as this example.

11.3.4.4 Example 4

Numeric data types are immutable. In the example below, when some_int is assigned a new value, a new object is created in memory and bound to some_int.

some_int = 10
print(hex(id(some_int)), f'{some_int=}')
>>>  0x75bcb47e0210 some_int=10
some_int += 1
print(hex(id(some_int)), f'{some_int=}')
>>>  0x75bcb47e0230 some_int=11

11.4 String

11.4.1 Overview

In Python, a string (str type object) is an immutable sequence of unicode code points. More generally speaking it is an immutable sequence of characters, numbers and symbols.

11.4.2 Specifications

11.4.2.1 Overview

  • Strings can be created using
    • single quotes: 'some string'
    • double quotes: "some string"
    • multi-line strings
      • triple single quotes: '''some string'''
      • triple double quotes: """some string"""
    • raw strings just need a preceding r character for any method
      • r"string with \", r'string with \'
  • Special string types
    • multiline strings
    • raw strings
    • formatted string literals
  • print changes the way results are displayed

11.4.2.2 Basic strings

string_1 = 'using single quotes'
string_2 = "using double quotes"
string_3 = "including \"double quotes\" using double quotes"
string_4 = 'including "double quotes" using single quotes'

11.4.2.3 Multiline strings

Multiline strings can be created using triple quotes (single/double). A physical new line within a string is not included in the string. The spaces and tabs on a line are included, see string_2 below.

string_1 = """This is a multiline string
with no tabs using triple double quotes"""
string_2 = '''This is a multiline string
              with tabs using triple single quotes'''
print(string_1)
>>>  This is a multiline string
>>>  with no tabs using triple double quotes
print(string_2)
>>>  This is a multiline string
>>>                with tabs using triple single quotes

11.4.2.4 Using backslash (\)

Backslash can be used to insert some special character sequences in a string, which are used by the print and similar functions which can parse such special character sequences.

Examples:

  • newline: \n
  • tabs: \t
  • escape quote symbol: \' or \"

Note in below examples, when variables are output without print function, special character sequences like newline and tab are not parsed and shown as is.

string_1 = "Line 1\nLine 2"
string_2 = "text 1\ttext 2"
string_1; print(string_1)
>>>  'Line 1\nLine 2'
>>>  Line 1
>>>  Line 2
string_2; print(string_2)
>>>  'text 1\ttext 2'
>>>  text 1 text 2

11.4.2.5 Raw strings

Raw strings do not escape backslash (\). To create a raw string prepend string with r or R character.

One typical use case is to store windows path which have backslashes. Note in below example since \u has special meaning it gives error while creating the path which contains such sequence of characters.

string_1 = "C:\user\name"
>>>  (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \uXXXX escape (<string>, line 1)
string_2 = r"C:\User\name"
print(string_2)
>>>  C:\User\name

11.4.2.6 Formatted strings

Formatted strings are used for mixing hard coded text and variable values with formatting.

  • old syntax
    • "text with {0[:fs]} and {1}".format(var1, var2)
  • new f-string syntax (Python version >= 3.6)
    • f'text with {var1[:fs]} and {var2[:fs]}'

where fs to be read as format specifier

This is specially useful in controlling the format of output message from the code. Message could be an error, warning or a regular informative message.

There are a lot of options to play around which can be found at link.

11.4.2.6.1 Examples
11.4.2.6.1.1 Ex 1

Using old format style.

user_name = "First Last"
user_age = 20
my_string = "Name: {0}\nAge: {1}".format(user_name, user_age)
print(my_string)
>>>  Name: First Last
>>>  Age: 20
11.4.2.6.1.2 Ex 2

Using old format style with format specifier.

user_name = "First Last"
user_age = 20
user_balance = 1000001
my_string = "Name: {0:^30}\nAge: {1:^30}\nBalance: {2:,.2f}".format(\
  user_name, user_age, user_balance)
print(my_string)
>>>  Name:           First Last          
>>>  Age:               20              
>>>  Balance: 1,000,001.00
11.4.2.6.1.3 Ex 3

Using new f-string with format specifier.

user_name = "First Last"
user_age = 20
user_balance = 1000001
my_string = f"Name: {user_name:>15}\nAge: {user_age:>16}\
    \nBalance: {user_balance:>12.2f}"
print(my_string)
>>>  Name:      First Last
>>>  Age:               20    
>>>  Balance:   1000001.00

11.4.3 Operations

Below are the 2 major categories of operations a string supports.

  • common operations on sequence types
  • operations specific to strings

Common sequence operations like indexing and slicing are provided with examples below, but are same for all sequence types like tuple and list.

11.4.3.1 Sequence

  • operations on sequence itself
    • length (len(s))
    • concatenate (s1 + s2), for same type sequences
    • repeat (s*n or n*s) where n is the number of repeats
    • comparisons, for same type sequences
  • operations on items in sequence
    • retrieve by position: index/slice (s[i[, j[, k=1]]])
    • min/max
    • check element’s
      • existence: e in s, e not in s
      • index: s.index(e)
      • count: s.count(e)
11.4.3.1.1 Index & Slice

Indexing refers to retrieving elements by position. Slicing refers to extracting subset of elements of a sequence.

  • indexing starts at 0 and ends at n - 1
  • negative indices are allowed
  • usage
    • s[i]: return item at index i
    • s[i:j]: return items from index i to j-1
      • returns j - i items
    • s[i:j:k]: return items from index i to j-1 with step k
      • k=1 by default

11.4.3.1.2 Examples
string_1 = "abcdefgh"

Enumerate function is used to get pairs of index and elements of a sequence.

print([*enumerate(string_1)])

[(0, ‘a’), (1, ‘b’), (2, ‘c’), (3, ‘d’), (4, ‘e’), (5, ‘f’), (6, ‘g’), (7, ‘h’)]

  • select c to f
string_1[2:6]
>>>  'cdef'
  • select second last - g
string_1[-2]
>>>  'g'

[(0, ‘a’), (1, ‘b’), (2, ‘c’), (3, ‘d’), (4, ‘e’), (5, ‘f’), (6, ‘g’), (7, ‘h’)]

  • select last 3 elements
string_1[-3:]
>>>  'fgh'

[(0, ‘a’), (1, ‘b’), (2, ‘c’), (3, ‘d’), (4, ‘e’), (5, ‘f’), (6, ‘g’), (7, ‘h’)]

  • select d onwards
string_1[3:]
>>>  'defgh'
  • select up till d
string_1[:4]
>>>  'abcd'
string_1[:3] + string_1[3:]
>>>  'abcdefgh'

11.4.3.2 String specific

  • There are a lot of default operations (link)
    • case, find/replace, checks, strip, split, …
    • usually easy to use, look at reference as needed
  • regular expressions
    • searching and matching patterns in strings
    • depends on module re in standard library
    • advanced topic, avoid at this stage

11.4.3.3 Arithmetic Operators

The + and * operations can be used with strings, other operators will give error.

Note that operations work with compatible type of objects.

  • +: concatenate strings
    • works with str type objects, i.e. strings
  • *: repeat a string
    • works with a str and an int
11.4.3.3.1 Example 1
some_str_1 = "some string 1"; some_str_2 = "some string 2"
concat_1_2 = some_str_1 + " " + some_str_2
print(concat_1_2)
>>>  some string 1 some string 2
11.4.3.3.2 Example 2
some_str = "xyz"
print(some_str*5)
>>>  xyzxyzxyzxyzxyz
11.4.3.3.3 Example 3
some_str_1 = "some string 1"; some_str_2 = "some string 2"
concat_1_2 = some_str_1 * some_str_2
>>>  Error: TypeError: can't multiply sequence by non-int of type 'str'

11.5 Tuple

11.5.1 Overview

Tuple is an immutable collection of ordered, heterogeneous objects with below features

  • sequence type (collection of ordered objects)
    • this helps enable support for indexing and slicing
    • the position of data has meaning
    • useful in passing and operating on set of objects within code
  • heterogeneous: can contain any type of object
    • more efficient if items are homogeneous
  • immutable
    • modify in-place operations are not supported
      • a new object is created in memory on modification if elements are immutable
    • adding/deleting/changing elements is not provided by default
  • Python Tutorial: Gentle introduction to tuples
  • Python library reference: Detailed documentation on sequences

11.5.2 Specifications

11.5.2.1 Creation syntax

  • using commas
    • comma decides the tuple
    • parenthesis are just for code readability
  • using tuple constructor: tuple()
  • using unpacking (Python special syntax)
  • using comprehension (covered in Python special features)

11.5.2.2 Creation by use case

  • using elements
    • 0 item: () or tuple()
    • 1 item: i, or (i,)
      • (i) will give error, comma is needed
    • more than 1 item: i1, i2, i3 or (i1, i2, i3)
  • using elements from another iterable[s]:
    • using tuple constructor: tuple(iterable[s])
    • using unpacking
      • t = *l,, t = (*l,), t = (*s, *l)
      • t = (*l) will give error, comma is needed

11.5.3 Operations

Tuple has access to common operations on sequence types with no additional methods.

  • operations on sequence itself
    • length (len(s))
    • concatenate (s1 + s2), for same type sequences
    • repeat (s*n or n*s) where n is the number of repeats
    • comparisons, for same type sequences
  • operations on items in sequence
    • retrieve by position: index/slice (s[i[, j[, k=1]]])
    • min/max
    • check element’s
      • existence: e in s, e not in s
      • index: s.index(e)
      • count: s.count(e)

11.6 List

11.6.1 Overview

List is a mutable collection of ordered, heterogeneous objects with following features.

  • sequence type
    • this helps enable support for indexing and slicing
    • the position of data has meaning
    • useful in passing and operating on set of objects within code
  • heterogeneous: can contain any type of object
    • more efficient if items are homogeneous
  • mutable
    • adding/deleting/changing elements is provided by default
    • modify in-place operations are supported
  • Python Tutorial: Gentle introduction to list: part 1, part 2
  • Python library reference: Detailed documentation on sequences

11.6.2 Specifications

  • empty list: [] or list()
  • using elements: [i1, i2, ...], [i1]
  • using elements from iterable[s]:
    • using list constructor: list(iterable)
    • using unpacking
      • [*t], [*t,], [*s, *t, *l]
    • using comprehension (covered in Python special features)

11.6.3 Operations

There are 2 sets of operations a list supports.

  • common operations on sequence types
  • operations on mutable sequence types

11.6.3.1 Sequence operations

  • operations on sequence itself
    • length (len(s))
    • concatenate (s1 + s2), for same type sequences
    • repeat (s*n or n*s) where n is the number of repeats
    • comparisons, for same type sequences
  • operations on items in sequence
    • retrieve by position: index/slice (s[i[, j[, k=1]]])
    • min/max
    • check element’s
      • existence: e in s, e not in s
      • index: s.index(e)
      • count: s.count(e)

11.6.3.2 Mutable sequence operations

  • most of the operations are in-place
    • implies no new object creation on modification
  • operations on sequence itself
    • copy (shallow), extend, repeat, reverse
  • operations on items
    • delete, replace, append, clear all, remove, insert, pop
  • link

11.7 Range

Range is a special iterable to generate a sequence of integers with following characteristics.

  • immutable sequence type
  • cannot see all elements at a time
    • have to be unpacked into a list or tuple
  • syntax for creation
    • range(stop)
    • range(start, stop[, step])
      • start is included
      • stop is excluded
  • primarily used for loops which is discussed at Section 12.3.1

11.7.1 Examples

print(range(10))
>>>  range(0, 10)
print(list(range(10)))
>>>  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print([*range(11)])
>>>  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print([*range(1, 11)])
>>>  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print([*range(-1, -11)])
>>>  []
print([*range(-1, -11, -1)])
>>>  [-1, -2, -3, -4, -5, -6, -7, -8, -9, -10]

11.8 Dictionary

11.8.1 Overview

A dictionary is a mutable mapping type collection of heterogeneous objects mapped to keys that are hashable and unique objects.

  • in newer versions (>3.9) the order is guaranteed

  • collection of {key: value} pairs where

    • key can be any hashable object
      • strings, numeric data types can be used
      • immutable type which contain only immutable objects
      • tuples with immutable objects can be used
      • lists, dictionary cannot be used
    • value can be any Python object

A dictionary is useful when a collection of objects is needed with the option to do quick searches based on keys rather than index, unlike sequences.

Concept of hashable objects is introduced in DSA (Section 21.4.4).

11.8.2 Specifications

  • using key value pairs separated by commas

    • d = {"key1": value1, "key2": value2, ...}
  • using type constructor

    • d = dict([("key1", value1), ("key2", value2), ...])
    • d = dict(key1=value1, key2=value2)
  • create empty dictionary

    • d = {}
    • d = dict()
  • using comprehensions (covered in Python special features)

  • if a key is passed multiple times, final value exists

11.8.3 Operations

  • operations on dictionary itself
  • operations on keys and values

11.8.3.1 Operations on dictionary

  • length: len(d)
  • clear: d.clear()
  • shallow copy: d.copy()
  • update from another dictionary: d.update([other])

11.8.3.2 Operations on keys and values

  • check keys: key in d / key not in d
  • view all keys/values: d.keys() / d.values() / d.items()
  • get all keys/values as list of tuples: list(enumerate(d))
  • get all keys as list: list(d)
  • get all keys as list reversed: list(reversed(d))
  • get value, error if key not present: d[key]
  • set value, inserts key if key not present: d[key] = value
  • del key/value, deletes last entry and returns deleted key, value: d.popitem()
  • if key not present return default if defined else error
    • get value: d.get(key[, default])
    • del key/value and return deleted value: d.pop(key[, default])

11.9 Set

Set is a collection of unique objects with operations related to math sets available, e.g. union, intersection.

In other words, set is a special dictionary with keys only.

In Python specifically, set is an unordered collection of hashable objects. In newer versions (>3.9) the order is guaranteed.

Sets are commonly used for

  • membership testing: search by value
  • removing duplicates from a collection

11.9.1 Specifications

A set can be created using curly braces with the exception of empty set.

  • create a set with valid keys
    • curly braces
    • set constructor
some_set = {key1, key2, ...}
some_set = set(iterable)
  • empty set can be created using set() constructor
    • using {} creates an empty dictionary

11.10 Boolean data type

Boolean data type, True and False, is the fundamental unit for implementing boolean conditional expressions.

Boolean comparison operator are used to create elementary conditions. Combination operators allow for building larger conditions by combining multiple conditions.

Conditional control flow blocks, if and match, use conditions.

The idea is based on boolean math. It is essential in controlling the flow of the program based on state of one or more objects in the program.

  • bool in Python, based on usage, can refer to
    • data type
    • function
  • bool type is used to represent boolean values
  • bool type inherits from int type
  • bool data type can take value from 2 built-in constants, True and False
    • underlying int values are int(True) = 1 and int(False) = 0 respectively
  • can be stored in variables like other objects
    • useful in conditional blocks

Below are some examples for familiarity.

  • basics
bool(True), int(True), type(True)
>>>  (True, 1, <class 'bool'>)
bool(False), int(False), type(False)
>>>  (False, 0, <class 'bool'>)
  • storing in variables
    • it is useful to name boolean variables like is_<some check>
is_int = True
is_int, bool(is_int), int(is_int), type(is_int)
>>>  (True, True, 1, <class 'bool'>)

11.10.1 Boolean comparison operators

Boolean comparison operators are used for object comparisons and return True or False, if used with compatible object types.

They are mostly used to create boolean conditions which are used in conditional blocks, if and match..case.

When used with sequence types (string, tuple, list)

  • equality operators (==, !=) return True if all elements are equal (not equal) in content
  • inequality operators only test minimum and maximum as appropriate
  • membership testing check for existence of elements and is most useful

11.10.1.1 Options and syntax

Operation

Python Operator

Comments

basic value comparisons

==, !=, <, >, <=, >=

  • compares values
  • different types ok, but must be compatible

membership testing

in, not in

  • used with sequence types

object id comparison

is, is not

  • compares memory address
  • works on all type

11.10.1.2 Examples

11.10.1.2.1 Numeric
num_1 = 10; num_2 = 15; num_3 = 10.0
num_1 == num_2, num_1 < num_2, num_1 <= num_3
>>>  (False, True, True)
  • can be stored in variables
cnd = num_1 > num_3
print(f"{cnd = }, {type(cnd) = }")
>>>  cnd = False, type(cnd) = <class 'bool'>
11.10.1.2.2 Sequence type

Membership testing is more useful for sequence types. Below examples illustrate the usage.

11.10.1.2.2.1 Strings
  • test character in a string
some_string = "abcd"
some_chr_1 = "a"
some_chr_2 = "e"
some_chr_1 in some_string
>>>  True
some_chr_2 in some_string
>>>  False
  • test a small string in a longer string
some_long_str = "A reasonably long string"
some_short_str = "long"
some_short_str in some_long_str
>>>  True
11.10.1.2.2.2 Tuples and lists
some_list = [1, 2, 3, 4, (1, 2, 3)]
some_tuple = 1, 2, 3
num_1 = 3; num_2 = 5
num_1 in some_list
>>>  True
some_tuple in some_list
>>>  True
num_2 in some_tuple
>>>  False
  • store a boolean operation in a variable
cnd = some_tuple in some_list
>>>  cnd = True, type(cnd) = <class 'bool'>
11.10.1.2.3 None type

Since there is a single instance of None type object created in a Python session, object id comparison is useful. None is used to signify if a variable is defined but not assigned a value yet.

Below example illustrates the object id comparison for None type in isolation.

some_var = None
some_var is None
>>>  True

11.10.2 Boolean combination operators

Boolean combination operators use boolean math to provide means of combining multiple comparison operations and conditions to form larger conditions.

Operation

Python Operator

Comments

not

not x

  • inverts the bool value
  • if x is false, then True, else False

and

x and y

  • test if both conditions are True

or

x or y

  • test if any condition is True

()

(cnd1 and cnd2) or (cnd3)

  • used to group conditions
  • conditions inside () are evaluated first
  • basic examples of combining conditions
    • a == 10
    • a >= 5 and a <= 10
    • (a > 0 and a < 10) or (a >= 10 and a < 25)
  • order of precedence used for evaluation
    1. ()
    2. comparison operators have same priority (==, !=, <, >, <=, >=)
    3. not > and > or
  • chained comparisons
    • are automatically converted to paired and comparisons
    • example: a < b < c is same as a < b and b < c
    • this is specific to Python
  • for conditions with too many nested combinations it is recommended to use ()
    • best for code readability
    • avoid errors due to precedence order
  • can be stored in variables like other objects
    • and used later in control flow

and and or operators are based on logic gates in boolean math. Truth tables, given below, summarize results for logic gates. 0 and 1 are used instead of False and True for better readability.

x

y

x and y

x or y

1

1

1

1

1

0

0

1

0

1

0

1

0

0

0

0

There are some additional features which Python provides related to boolean data type and are discussed in the Python special features chapter (Section 15.1). They are left from this section to keep the complexity low at this stage.

11.11 Generic concepts

11.11.1 Iterable unpacking

Iterable unpacking is a special feature in newer versions of Python. Some features were introduced in Python version 2, more features added using PEP-3132: Extended Iterable Unpacking in version 3.

  • * unpacks remaining items
  • returns a list
  • advantages
    • better code readability
    • easier than using indexing
    • faster
  • gives error if there is a mismatch in number of items and variables

11.11.1.1 Examples

  • Unpack and assign elements of an iterable to variables
    • get first and remaining items of an iterable
some_list = [1, 2, 3, 4]; some_tuple = (1, 2, 3, 4)
first_item = some_list[0]
end_items = some_list[1:]
print(f'{first_item = }, {end_items = }')
>>>  first_item = 1, end_items = [2, 3, 4]
first_item, *end_items = some_list
print(f'{first_item = }, {end_items = }')
>>>  first_item = 1, end_items = [2, 3, 4]
  • Unpack and assign elements of an iterable to variables
    • get last and remaining items of an iterable
some_list = [1, 2, 3, 4]; some_tuple = (1, 2, 3, 4)
begin_items = some_tuple[0:-1]
last_item = some_tuple[-1]
print(f'{begin_items = }, {last_item = }')
>>>  begin_items = (1, 2, 3), last_item = 4
*begin_items, last_item = some_tuple
print(f'{begin_items = }, {last_item = }')
>>>  begin_items = [1, 2, 3], last_item = 4
  • Unpack and assign elements of an iterable to variables
    • get first two, last and remaining middle items of an iterable
some_list = [1, 2, 3, 4, 5, 6, 7]; some_tuple = (1, 2, 3, 4, 5, 6, 7)
first_item, second_item, *remaining_items, last_item = some_tuple
print(f'{first_item = }, {second_item = }')
>>>  first_item = 1, second_item = 2
print(f'{remaining_items = }, {last_item = }')
>>>  remaining_items = [3, 4, 5, 6], last_item = 7
  • Combine iterables into another
some_list_1 = [1, 2, 3]; some_tuple_1 = (4, 5);
some_tuple_2 = (*some_list_1, *some_tuple_1)
>>>  some_tuple_2=(1, 2, 3, 4, 5)
  • ** for mapping types - dictionary
some_dict_1 = {"key1": "value1", "key2": "value2.1"}
some_dict_2 = {"key2": "value2.2", "key3": "value3"}
some_dict_3 = {**some_dict_1, **some_dict_2}
>>>  some_dict_3={'key1': 'value1', 'key2': 'value2.2', 'key3': 'value3'}

11.11.2 Implications of mutability

  • Modify in-place: any modification to the object does not lead to creation of a new object

  • Copy on modify: create a new copy of object if modified, opposite of modify in-place

  • Mutable \(\implies\) modify in-place

    • Lists and dictionaries can be modified without creation of new object on RAM
  • Immutable \(\implies\) copy on modify

    • Strings and tuples create new objects on RAM if modified

Modify in-place means any modification to the object does not lead to creation of a new object. For e.g. strings and tuples create new objects on RAM if modified, whereas lists and dictionaries can be modified without creation of new object on RAM.

  • Implications
    • flexibility
    • efficiency (in terms of speed and memory)
  • Immutable objects (strings, tuples) are
    • efficient for constant data
    • less flexible
  • Mutable objects (lists, dictionaries) are
    • less efficient for constant data
    • flexible, support in-place modification
    • can be more efficient if data keeps changing over time

11.11.2.1 Changing elements

Since strings and tuples are immutable, elements cannot be assigned new values though indexing. This is unlike mutable types where this is allowed, e.g. lists.

some_string = "abcdee"
some_string[-1] = "f"
>>>  Error: TypeError: 'str' object does not support item assignment
some_tuple = (0, 1, 1)
some_tuple[2] = 2
>>>  Error: TypeError: 'tuple' object does not support item assignment
some_list = [0, 1, 5]
some_list[2] = 2
print(some_list)
>>>  [0, 1, 2]

Strings have internal methods that can change elements, but then they follow copy-on-modify.

some_string_1 = "abc"
some_string_2 = some_string_1.replace("a", "b")
print(f'{some_string_1=}, {some_string_2=}\n\
    {some_string_1 is some_string_2 = }')
>>>  some_string_1='abc', some_string_2='bbc'
>>>      some_string_1 is some_string_2 = False
some_string = "abc"; some_string_orig_id = hex(id(some_string))
>>>  some_string='abc', some_string_orig_id = '0x75bcb4720df0'
>>>  hex(id(some_string.replace("a", "b"))) = '0x75bca8b839f0'
>>>  some_string='abc', some_string_orig_id = '0x75bcb4720df0'
some_string = some_string.replace("a", "b")
>>>  some_string='bbc'
>>>  hex(id(some_string)) = '0x75bca8b934b0'
>>>  some_string_orig_id = '0x75bcb4720df0'

11.11.2.2 Propagation of changes

Mutable types like lists or dictionaries, when passed around through variable assignment, changes are propagated.

  • when you make changes to original list they propagate
    • this is required in many cases
    • but can also lead to a bug
    • be aware of the concept
  • to avoid default behavior when needed use
    • constructor list(iterable)
    • unpacking [*iterable]
    • loops
some_list_1 = [1, 2, "a", "b"]
some_list_2 = some_list_1
print(f'{some_list_1=}, {some_list_2=}\n{some_list_2 is some_list_1 = }')
>>>  some_list_1=[1, 2, 'a', 'b'], some_list_2=[1, 2, 'a', 'b']
>>>  some_list_2 is some_list_1 = True
some_list_1[2] = "abc"
print(f'{some_list_1=}, {some_list_2=}\n{some_list_2 is some_list_1 = }')
>>>  some_list_1=[1, 2, 'abc', 'b'], some_list_2=[1, 2, 'abc', 'b']
>>>  some_list_2 is some_list_1 = True
some_list_2[-1] = "xyz"
print(f'{some_list_1=}, {some_list_2=}\n{some_list_2 is some_list_1 = }')
>>>  some_list_1=[1, 2, 'abc', 'xyz'], some_list_2=[1, 2, 'abc', 'xyz']
>>>  some_list_2 is some_list_1 = True
  • using constructor or unpacking does not pass the object itself
some_list_1 = [1, 2, "a", "b"]
some_list_2 = list(some_list_1)
>>>  some_list_1=[1, 2, 'a', 'b'], some_list_2=[1, 2, 'a', 'b']
>>>  some_list_2 is some_list_1 = False
some_list_1[2] = "abc"
>>>  some_list_1=[1, 2, 'abc', 'b'], some_list_2=[1, 2, 'a', 'b']
>>>  some_list_2 is some_list_1 = False
  • using constructor or unpacking does not pass the object itself
some_list_1 = [1, 2, "a", "b"]
some_list_2 = [*some_list_1]
print(f'{some_list_1=}, {some_list_2=}\n{some_list_2 is some_list_1 = }')
>>>  some_list_1=[1, 2, 'a', 'b'], some_list_2=[1, 2, 'a', 'b']
>>>  some_list_2 is some_list_1 = False
some_list_1[2] = "abc"
print(f'{some_list_1=}, {some_list_2=}\n{some_list_2 is some_list_1 = }')
>>>  some_list_1=[1, 2, 'abc', 'b'], some_list_2=[1, 2, 'a', 'b']
>>>  some_list_2 is some_list_1 = False

11.11.2.3 Mutable in immutable

  • tuple is immutable in terms of its element objects

  • the contained object remains mutable if it is mutable

some_list = [1, 2, 3, 4, 5]
some_tuple = (some_list, "some other object")
print(f'{some_list=}\n{some_tuple=}\n{some_list is some_tuple[0] = }')
>>>  some_list=[1, 2, 3, 4, 5]
>>>  some_tuple=([1, 2, 3, 4, 5], 'some other object')
>>>  some_list is some_tuple[0] = True
some_list.pop()
>>>  5
print(f'{some_list=}\n{some_tuple=}\n{some_list is some_tuple[0] = }')
>>>  some_list=[1, 2, 3, 4]
>>>  some_tuple=([1, 2, 3, 4], 'some other object')
>>>  some_list is some_tuple[0] = True
  • if only contents are needed and propagation is to be avoided use unpacking or constructor
some_list = [1, 2, 3, 4, 5]
some_tuple_1 = *some_list,; some_tuple_2 = tuple(some_list)
print(f'{some_list=}\n{some_tuple_1=}\n{some_tuple_2=}')
>>>  some_list=[1, 2, 3, 4, 5]
>>>  some_tuple_1=(1, 2, 3, 4, 5)
>>>  some_tuple_2=(1, 2, 3, 4, 5)
print(f'{some_list is some_tuple_1 = }')
>>>  some_list is some_tuple_1 = False
print(f'{some_list is some_tuple_2 = }')
>>>  some_list is some_tuple_2 = False
some_list.pop()
>>>  5
print(f'{some_list=}\n{some_tuple_1=}\n{some_tuple_2=}')
>>>  some_list=[1, 2, 3, 4]
>>>  some_tuple_1=(1, 2, 3, 4, 5)
>>>  some_tuple_2=(1, 2, 3, 4, 5)
print(f'{some_list is some_tuple_1 = }')
>>>  some_list is some_tuple_1 = False
print(f'{some_list is some_tuple_2 = }')
>>>  some_list is some_tuple_2 = False

11.11.2.4 Shallow vs deep copy

Shallow copy creates a new object for the collection being copied but does not create new objects, if items in the collection are themselves collection. This can have side effects.

Regular copy method available in all collections (string, tuple, list, dictionary) makes a shallow copy.

This works fine if elements of the collection are immutable objects like numbers, strings or tuples, but if there are mutable types like list or dict then propagation will occur.

This might not be desirable at times, so a deep copy is needed which creates new objects for all elements of the original container going through nested structure of the collection recursively.

The standard library has copy module which has deepcopy function to achieve this.

Below example illustrates the point. It is recommended to do experiments to understand the concept.

some_list_1 is a list containing a list, tuple and a dictionary.

some_list_2 is a regular copy, so is different object from some_list_1, but elements point to the same underlying some_list, some_tuple and some_dict

some_list_3 is a regular copy from copy module, so behaves similar to some_list_2.

some_list_4 is a deep copy from copy module, so elements are different objects as well.

import copy
some_list = [1,2,3]; some_tuple = (4, 5); some_dict = {"six": 6, "seven": 7}
some_list_1 = [some_list, some_tuple, some_dict]
some_list_2 = some_list_1.copy()
some_list_3 = copy.copy(some_list_1)
some_list_4 = copy.deepcopy(some_list_1)
>>>  some_list_2 is some_list_1 = False
>>>  some_list_3 is some_list_1 = False
>>>  some_list_4 is some_list_1 = False
>>>  some_list_2[0] is some_list_1[0] = True
>>>  some_list_3[0] is some_list_1[0] = True
>>>  some_list_4[0] is some_list_1[0] = False