2 Programming languages

2.1 Overview

Machines understand machine code which is in binary and is hard to work with. While writing instructions for the machine, it is easier to understand and work with natural language like english. Therefore, a mapping is needed to translate natural language like english to machine code.

The general idea of mapping natural language to machine code (binary) lead to evolution of programming languages.

There are multiple ways to solve the mapping problem based on the intended use case. Therefore there are numerous programming languages.

Typically a programming language refers to 2 distinct parts.

Specification: rules for writing code
- lexicon: allowed keywords, symbols, characters
- syntax: rules of arranging lexicons and creating new names (variables)
- semantics: rules to assign meaning to different combinations of lexicons and syntax
Implementation: A system application that implements the specification, like Compiler or Interpreter, to convert the natural language code into machine code and execute.
- Referred to as ‘language engine’ henceforth.

Some common type of language engines, applications to create this mapping or conversion are

Assembler
Compiler
Interpreter

Note that the implementation for a language, the engine, is itself a program in machine code stored on hardware. A programming language defines the mapping rules, specifications, which can be implemented using any of the above approaches, but usually a particular combination becomes more popular.

For example Python has many implementations. The official and most popular is CPython which is interpreter written mostly in C. There are other implementations as well, which have their own specifications which are similar but not the same.

Another example is C++, the specifications are maintained and revised by a central organization, The Standard C++ Foundation. Then there are various organizations that implement the compiler using the latest specifications. The GNU compiler collection (GCC), Microsoft Visual C++ (MSVC), CLang (LLVM) etc.

2.2 Types

Languages can be classified into below main categories in order of decreasing complexity and based on method of conversion.

low level languages
- assembly languages
high level languages
- compiled languages (compiler): C, C++, Go
- interpreted languages (interpreter): shell (bash, etc.)
  - hybrid: Python
    - JIT: Just in Time compilation, e.g. Java, JavaScript, C#

2.2.1 Assembly languages

Assembly languages have very limited set of specifications and are difficult to code in. They are machine specific and used to provide basic configuration to hardware, e.g. BIOS (basic input output setup) on computers. They are closest to the hardware and hence provide access to all components. They are very fast as they are directly translated to machine code using an assembler and specifications to be checked and converted are very limited.

Since there are very few mappings, every instruction has to be explicit, therefore the code written in assembly language is much longer compared to that in high level languages.

2.2.2 High level languages

High level languages is the next step in the mapping problem. They define their own set of specifications (lexicon, syntax and semantics) to provide a lot more features for writing programs, e.g. data types, control flow mechanisms, ways to define functions and objects, managing storing data on memory, managing input/output etc.

For example in C/C++ memory management and data types have to be dealt with by the programmer manually, but in Python a lot of these features are provided as default where programmer does not have to deal with them.

Since there are many pre-built instructions built in, the code written in high level languages tend to be shorter and easier to understand, compared to low level languages and machine language.

2.2.2.1 Compiled languages

Compiled languages solve the mapping problem using a compiler.

The compiler reads the whole program and converts it to machine code before execution. This process is also referred to as Ahead of time compilation (AOT).

The compiler is dependent on a few major factors

Hardware architecture (this is dependent on CPU primarily)
- Instruction set: x86, ARM64, …
- Processing size: 64 bit or 32 bit
Operating system (OS)

Therefore compiled languages like C/C++ have compilers available for most common architecture and OS combinations. The compiled machine code produced is different for different combinations.

Pros
- Fast
  - machine code is generated as standalone output
Cons
- Portability
  - have to be compiled for different OS + Hardware combinations
  - this is more of an inconvenience, as compilers for most popular combinations are available
- Difficulty
  - compared to hybrid interpreter languages which are higher level with more features towards ease of use
  - have higher development time, takes more time to code

How to write the compiler itself?

This seems to be a chicken and egg problem but it is not. Imagine there are no high level languages available and the first compiler for C language has to be written. The first compiler can be written in assembly language on a given system. This gives the machine code for C compiler. But now you can write compiler for any high level language in C and get the machine code for that compiler by compiling it with C compiler. From there on any compiler can be written in any compiled language and turned to machine code for needed systems.

2.2.2.2 Interpreted languages

The interpreter produces machine code at run time and can run code interactively. Interpreter takes care of system architecture and OS dependencies so can be run on any machine as long as the interpreter can be installed on that machine.

Interpreters are of different types depending on implementation for a language. In simplest form they execute one instruction at a time, like most shell programs do, e.g. bash or zsh.

2.2.2.2.1 Hybrid

Languages like Java, C# and CPython (Official and most used Python version) use a hybrid approach. The mapping is done in 2 main steps

compilation step: source code is checked and compiled into intermediate bytecode
interpreter step: the bytecode is interpreted into machine code and run at runtime

The run time behavior is still same as interpreter, in the sense that machine code is produced at run time rather than at compile time.

Pros
- Portability
  - same program can be run on different machines by just installing the language interpreter
- Ease of use
  - generally these languages are higher than compiled languages like C, C++
  - time to write code is reduced
Cons
- Slower
  - program is converted to machine code at run time
  - this depends on the use case, specially considering the increase in hardware capacity, for most of common tasks this can be ignored

2.2.2.2.1.1 JIT

Some implementations of some languages use just in time compilation (JIT) to optimize the interpreter.

The key idea is to compile certain pieces of code to optimized machine code at run time.

This is more of an optimization technique for increasing speed, but this is still slower compared to compiled languages.

Examples of language implementations that use JIT:

Java: Java Virtual Machine (JVM)
C#: Common language runtime (CLR)

2.3 Popular languages

Use Case	Python	R	JavaScript	C	C++	Java
Automation	x	x		x	x
Data analysis	x	x		x	x
Scientific Computing	x	x		x	x
Web dev	x		x	x	x	x
System App dev	x		x	x	x	x
Mobile App dev	x		x	x	x	x
Cloud dev	x		x	x	x	x
Network dev	x			x	x	x

2.3.1 Desktop & Mobile applications

Major operating systems create their own language and frameworks to develop applications to run on the os. This is changing now to target cross platform development, i.e. application built using a framework + language combination can be built for multiple devices and OS, just like compilers.

Frameworks are an extra layer of functionalities and ways of binding them for a specific hardware architecture + os combination. For example, for creating graphical user interfaces building blocks are prebuilt and can be glued into the main application program without having to write them from scratch.

The OS specific languages and platforms are now shifting to being cross platform.

Language	Device	OS	Framework
C#, Visual Basic, F#	Desktop	Windows (Microsoft)	.NET, XAMARIN
Swift	Desktop, Mobile	MacOS, IOS (Apple)	SwiftUI
C++	Cross-platform	Cross-platform	QT
Dart	Cross-platform	Cross-platform	Flutter
Java	Cross-platform	Cross-platform	Multiple frameworks
Kotlin	Cross-platform	Cross-platform	Multiple frameworks

2.3.2 Websites (web dev)

Web technologies have become popular with the creation and development of browsers and web networks.

Front end (GUI) is based on HTML, CSS and JavaScript
Backend functionalities: like databases, analytics can be developed in language of choice

In practice there are many frameworks available for development which provide standard templates of pre built code for different functionalities.

Name	Description
HTML	Markup language to structure user interface structure and text
CSS	Responsible for the user interface styling
JavaScript	Responsible for the functionality behind web pages, e.g. click button actions, form submissions, …
TypeScript	Stricter version of Javascript which is static typed

How do browsers run code?

Browsers are primarily running JavaScript using their own set of mapping engines.

Earlier the browser JavaScript engines were using interpreter approach but now have shifted to JIT approach like Java.

Google Chrome: V8 engine
Mozilla Firefox: SpiderMonkey
Apple Safari: JavaScriptCore
Microsoft Edge: Chakra

2.3.3 General purpose

2.3.3.1 C and C++

C and C++ are very popular compiled languages for writing high performance code. They provide access to most of the functionality hardware has to offer. Most of the critical applications like operating systems, compilers, virtual machines are written in C/C++.

These have the steepest learning curve as the programmer has to know the how the hardware works to use these languages and most of the interactions like managing data stored on RAM, directing CPU for parallel processing have to be coded by the programmer.

C is the older among the two with lesser features available. Everything has to be built from scratch or using some one else’s code.

C++ is a newer version of C with OOP, few changes in syntax and a lot more features. Also C++ is under active development.

Both are compiled languages and compilers are available from various sources, most common being GNU GCC, which is a collection of compilers for different hardware architecture and OS combinations.

Although these are not used directly in all use cases but much of the functionality under the hood uses these for performance reasons. For example, much of the libraries for scientific computing and data analysis in languages like Python and R are developed using C/C++ and a wrapper is implemented in Python or R. All major operating systems are written using C/C++ in part or in full.

2.3.3.2 Python

Python is a general purpose programming language. The official and most common version of Python that is used and covered in this course is implemented in C, and therefore also referred to as CPython.

Python is very popular language with applicability in most of the application areas. Also it has a lot of pre built code available for majority of the regular tasks.

2.4 Programming paradigms

A programming paradigm defines the approach or style of solving problems through programming.

Programming paradigms are grouped into 2 main categories.

Declarative: focusses on what to solve
- Logic programming
- Functional programming
- Data driven programming
Imperative: focusses on how to solve the problem
- Procedural programming
- Object oriented programming
- Structured programming

A language, through design of specifications and implementation, can support one or more of paradigms.

Almost all popular programming languages support multiple paradigms and hence the term multi-paradigm is used. e.g Python, C, C++, Java, Javascript etc.

For example, in Python, same problem can be solved using functional programming or OOP. Python provides all the options.

2.5 General structure

This is an attempt to structure all knowledge related to programming, including practical aspects.

Language specifications
- Building blocks: specifications for elements and blocks of code
- Architecture: specifications for writing programs
  - means and specifications to combine elements and blocks to build programs
  - management of execution of blocks (under the hood)
Design: knowledge of how to write good programs
Tools: needed to write, test, debug and run programs

Language specification

Building blocks

Basic specifications
- Lexicons
- Variables
- Comments
Data types and operations
Control flow blocks
Functions
OOP
Special features

Architecture

Environment = Namespace + Scope
Scripts and packages
External packages
Language engine
Debugging

Design

Components
- Design patterns
  - Map reduce
- DSA
- Regular expressions
Frameworks
Workflow

Tools

Command line interface
Version control system
Language installation
- Engine
- External packages
- Virtual environments
Editor

Almost all programming languages can be fitted to this generic structure/framework, therefore getting started with any language will become easier if you keep this structure in mind. This book is structured to follow this structure as well.

The wiki link provides a comparison of different languages where you can use this structure to look for relevant basic syntax.

The implementation and implications will differ for some pieces from language to language which you learn by experience in coding in it as use case arise. For example, in Python you do not need to worry about hardware management as it is taken care by the interpreter, but in C/C++ you have to manage them yourself. The scope rules are different from language to language.

2.6 Underlying Concepts

The 3 basic concepts (means of abstraction, encapsulation and combination) have been foundational in development of computer science from hardware to software. Once you start seeing this pattern in every aspect of definition of a programming language, you will understand and implement programming better.

Underlying concepts

Means of abstraction
- building blueprints of data, operations, functionality, …
Means of encapsulation
- hiding details of implementation
Means of combination
- using basic data structures to create compound data
- combine basic functions to build bigger functionality

2.6.1 Means of Abstraction

Abstraction in general means to isolate ideas or concepts from actual physical reality. In other words it is a process of generalizing physical phenomenon without worrying about the details around it. Below are examples to understand application of abstraction in programming.

Programming language is itself built on this idea where abstract specification is isolated from actual implementation. For example C++ language specification is provided and maintained by a group of people. The actual implementation of the compiler can be done by anyone using those specifications. Therefore there are many C++ compilers available like GCC, LLVM etc.

Data types have evolved by isolating the specification and implementation. Interfaces, or abstract data types (ADT), define general desired properties a certain data type should possess. Data structure with algorithms provide a concrete implementation for an interface. For example, dictionary is an ADT while Python dict is a data structure with algorithms for different operations. The performance of a dictionary in CPython might differ from a dictionary in C++ or even JPython (Implementation of Python VM using Java).

Functional programming uses this idea in many different ways. Example 1, map, filter and reduce are general concepts related to a collection of items, the actual output depends on the function supplied while using the concept. Example 2, factory functions provide a blueprint of functionality, the arguments supplied decide the actual function created.

Object oriented programming (OOP) is built around this idea where class is an abstract blueprint of an object and instance is the actual object which depends on the data it holds.

2.6.2 Means of Encapsulation

Encapsulation in general means hiding the details of implementation.

Programming language: hides the details of mapping problem, which is the actual implementation of the engine, and lets the programmer create new solutions using natural language, rules and syntax without worrying about details of implementation of compiler, hardware management etc.

Data types: Once a data type is implemented you can create multiple instances without worrying about the details of implementation. For e.g. most languages provide way of defining numbers and strings by directly entering them without worrying about implementation details of how and where to store the data.

Functional programming: Once a function is created you can use them as and when required without worrying about the implementation details during usage.

Object oriented programming (OOP): Classes provide a way to define blue prints of objects with certain attributes. After that multiple instances of the data type can be created and operated upon without worrying about the implementation details during usage.

2.6.3 Means of Combination

Combining in general means joining. In programming it means the same, ways to combine simple parts to make a more complex part and make it simple using encapsulation.

Programming language is built around the same concept, smaller pieces are the elements of specification like lexicons and syntax, then semantics and architecture defines ways to combine them to make more complex programs.

Data types: Data structures like tuple, list, dictionaries etc. provide means of combining different data types to create arbitrarily larger and more complex data.

Functional programming: Using the rules of scopes and namespaces functions can be joined and nested in many interesting ways to create more complex functions.

Object oriented programming (OOP): Class itself allows combining data and operations to create arbitrarily large and complex data types as needed. Further more rules of inheritance provide ways of relating class definitions to create and structure even more complex data types.