17 Modules & Packages
Python repl does not offer saving and repeating code. All the work is gone once the repl is closed. Jupyter notebooks are good for interactive tasks but not useful for large programs.
Scripts provide additional functionality.
- Save and run large programs from command line
- Save and re use definitions, like variables, functions and classes, related to a task
- e.g. standard library, external modules and packages
When requirements of a project increase, so does the size of the program. Size grows quickly from hundreds to thousands of line. Putting all the code in a single script is difficult to manage. That is why breaking up the code into files and cross referencing objects across files is required to manage the coding project efficiently.
Modules and packages help organize large coding projects. The knowledge of how modules and packages work in Python also help in using scripts more efficiently.
Python documentation on modules and packages
- Python tutorial: Modules & Packages
- Python reference:
__main__
- Python reference: Import system
- Namespace packages
17.1 Module
In Python a module can refer to several different things.
Module commonly is referred to a file type module
folder type modules are commonly referred to as packages explicitlymodule
is also atype
orclass
Module also refers to Python object once the module file or a package is loaded using
import
which is instance object oftype
module
Commonly module refers to a file with .py
extension.
There are 3 cases when module type object is created using import in Python
- file: a file or script containing Python code
- package: a folder containing sub-folders and scripts
- regular package: declared using
__init__.py
file - namespace package
- regular package: declared using
For example, if there is a file abc.py
with some code and import abc
is run on Python repl. abc
is a module object. abc.py
is also a file type module.
A regular package, commonly referred to as package, is also a module type object once loaded into memory by Python interpreter, the only difference is it is a folder containing __init__.py
, even if __init__.py
is empty.
File type and regular packages are covered here. Namespace packages are a newer feature and generally not required, hence are not covered. Information about namespace packages can be found at Python documentation and PEP 420.
17.2 Regular package
Any folder with a __init__.py
file, even if empty, is treated as a regular package.
Packages provide a way to organize and exchange code as they can contain packages (sub-packages) and file type modules.
Packages are generally used to provide some functionality using functions and class definitions, which can be used anywhere in the code.
Optionally, a regular package can also contain a __main__.py
file, this is run when the package is run as the top level environment. This is discussed in detail later in Section 17.4.1.
One of the key features of regular packages is that they allow cross referencing between modules in sub packages using relative imports.
17.3 Naming convention and rules
Convention: lower snake case
Rule: names containing special characters other than _
give error, essentially same rules as for variable names
17.3.1 Pep-8 Conventions
Object Type |
Convention |
Example |
---|---|---|
Packages |
|
|
Modules |
|
|
17.4 Usage
There are 2 main uses of modules and packages.
- Execution as top level environment
- Python
'__main__'
system manages this
- Python
- Organizing and cross referencing objects
- Python
import
system manages this
- Python
17.4.1 __main__
In Python, the system of __main__
manages the run time behavior of modules and packages, when they are run as top level environment. It helps in differentiating how a module or package is intended to be used.
To execute a Python module or package as top level environment from command line, there are 2 main ways
<path to python executable> <path to file>
- can take relative path
- for a regular package
- only
__main__.py
is executed if package name is used __main__.py
or__init__.py
can be given explicitly
- only
<path to python executable> -m <name without extension>
- cannot take relative paths
- command has to be executed from the directory containing the file or package
- if the file is in a directory below the shell directory then dot notation can be used
- e.g.
python3 -m sub_dir.file_1
- e.g.
- in case of a regular package,
__init.py
and__main__.py
both are executed
Since using the first approach, a Python file can be executed from any directory it is the preferred approach.
Note that, while using python3 -m <module or package name>
, if module name is abc.py
the above command needs only abc
without the .py
extension.
The recommended approach for executing one script from another is to keep the files in same folder and then use import
.
For example, file_1.py
has to execute file_2.py
. Place file_2.py
in same folder as file_1
then use import file_2
in file_1.py
. Whenever file_1.py
is executed it will execute file_2.py
. Note that if there is a main block present in file_2
, it will not be executed.
17.4.1.1 Modules
If a module is run as top level environment, the __name__
attribute of the module object is set to '__main__'
by the Python interpreter in background, this is used to isolate code which is run only when module is run directly
- the code is isolated using
if __name__ = '__main__'
block - the code is not run with
import
Another important thing to note is that function and class definitions are bound to the module they are declared in. So it does not matter where you call the function from, the global scope for a function is the module’s global scope in which it is declared.
Following coding exercise will demonstrate these fundamentals.
- create a Python file with following content and save with some name, e.g.
sample_mod.py
print("running sample module")
print("__name__ attribute of the current module = " + __name__)
if __name__ == "__main__":
print("running main block of sample module")
- from command line goto the folder containing
sample_mod.py
and execute the command
python3 -m sample_mod
- notice that print statement of the
if
block is executed because__name__
attribute of the module is set to__main__
- notice that print statement of the
- now start Python repl by entering the command
python3
in repl enterimport sample_mod
- notice that main block is not run because name of the loaded module is set to
sample_mod
- import system is discussed in later section
- notice that main block is not run because name of the loaded module is set to
17.4.1.2 Regular packages
Any folder containing __init__.py
is treated as a regular package by Python. Packages can additionally contain __main__.py
.
For packages, instead of a if __name__ = '__main__'
block, __main__.py
is used. If if __name__ = '__main__'
block is used in __init__.py
, it will give error when the package is called directly as the top level environment.
Executing with -m
switch
If __main__.py
is present in the regular package it is run along with the __init__.py
when the package is run as the top level environment using the -m
switch. If __main__
is not present, error occurs. Again this helps isolate code which needs to run only when the package is run directly as top level environment. When calling the same package using import
, __main__.py
is not run.
Executing without -m
switch
If a regular package is run without -m
switch, only __main__.py
is run else error is given.
Regular packages in general are supposed to be used with import for reusing object definitions rather than executing code. __main__.py
is for special cases, e.g. testing.
This is simple to test.
- In a folder in system create below folder structure
mkdir reg_pkg_1 reg_pkg_2
cd reg_pkg_1; touch __init__.py __main__.py
cd ../reg_pkg_2; touch __init__.py; cd ..
- write contents to the files
# reg_pkg_1.__init__.py
print("running regular package 1")
# reg_pkg_1.__main__.py
print("running main from regular package 1")
# reg_pkg_2.__init__.py
print("running regular package 2")
if __name__ == "__main__":
print("running main block of regular package 2")
- experiment
- run each package as top level environment with and without
-m
flag - import each package from repl
- run each package as top level environment with and without
- note that running
reg_pkg_2
as top level environment will give error because it is a regular package andif __name__ = '__main__'
does not work
17.4.2 import
system
Import system in Python helps manage using object definitions (variables, functions, classes) from other scripts.
Once imported, a module or package is a module type object with attributes. Python adds some special attributes to module type objects.
<module name>.__name__
: module’s name or__main__
if run directly<module name>.__file__
: absolute path to the module location<module name>.__package__
: is populated if it is a regular package or a part of one, else is empty string<module name>.__path__
: exists if module is a regular package or is part of one, else gives error
All this is supported using Python’s import system. Detailed documentation can be found at Python language reference: Import system.
17.4.2.1 Modules
Full module (file or package) can be imported using below statements.
import <module name>
import <module name> as <module alias>
17.4.2.2 Objects & sub-modules
While importing, only subset of a module or package can be imported. It can be specific objects or modules or sub-packages within a parent package.
from <module_name> import <object_name>
from <module name> import *
from <module name> import <object name>
from <module name> import <sub module name>
from <module name> import <object name> as <object alias>
from <module name> import <sub module name> as <module alias>
from <module name> import *
is not recommended when using other packages- it pollutes the namespace
- creates name clashes
<module name>.<obj name>
is better for code readability
It is provides information about source of the object used
17.4.2.3 Runtime behavior
Whenever a file or package type module is imported the following things happen
- Python interpreter checks if the module/package is already loaded into current namespace
- if the module/package is loaded
- nothing is done
- if the module/package is not loaded
- Python interpreter runs the code in module or
__init__.py
for a package- main code is not run
- all the exported objects are loaded
- Python interpreter runs the code in module or
- if the module/package is loaded
- When only a sub package is imported Python still loads the parent package
- therefore there is no efficiency gain if only a sub package is imported
17.5 Applications
A Python project might be intended to do one or both type of below tasks.
- Perform some task[s] by running a script as top level environment
- Create a package to define objects (functions, classes, data-objects, …) for re-use and sharing
Below are the use cases with recommendations.
- Adhoc and small projects: jupyter notebooks are sufficient most of the times.
- if code grows, definitions can be kept in modules and imported into notebook
- Adhoc and large projects: modules may be needed.
- If the size of code grows, it is useful to organize the code using regular packages.
- Recurring tasks: it makes sense to use modules which can be run from command line.
- If the size of code grows, it is useful to organize the code using regular packages.
17.5.1 Small projects
For small projects where the code can be well structured using 4-5 file type modules kept in a root directory.
As an example, root of the project can contain
main.py
to run the task from top level- it refers to function definitions and input objects from other modules as described below
- it is a convention to name this file as
app.py
ormain.py
category_1.py
,category_2.py
, … contain all functions for a category of tasksinputs.py
contains all inputs required for configuration
Folder structure will be flat in this case. Objects can be cross referenced without need of relative imports.
- project root
main.py
inputs.py
category_1.py
category_2.py
17.5.2 Large projects
For large projects it is advisable to use regular packages. This allows use of relative imports and using sub-packages, i.e. each sub directory is a regular package in itself with file type modules.
Packages allow categorizing and storing modules in folders and sub folders, then cross referencing objects as needed using relative imports.