01 Introduction to Python

Why Python?

Python is the programming language of choice for many scientists, to a large degree because it offers a great deal of power to analyze and model scientific data with relatively little overhead in terms of learning, installation, or development time. It is a language you can pick up in a weekend and use for the rest of your life.

The Python Tutorial is a great place to start getting a feel for the language.

Some other tutorials made as notebooks:

IPython notebooks are an easy way both to get important work done in your everyday job and to communicate what you’ve done, how you’ve done it, and why it matters to your coworkers.

What You Need to Install

  • Python version 3.10+ (recommended; 3.10 is the default in Ubuntu 22.04);
  • Numpy, the core numerical extensions for linear algebra and multidimensional arrays;
  • Scipy, additional libraries for scientific programming;
  • Matplotlib, excellent plotting and graphing libraries;
  • Jupyter, web application that allows you to create documents with live code and explanatory text;
  • Seaborn, visualization library, which provides a high-level interface for drawing attractive statistical graphics;
  • Scikit-learn, machine learning library.

How to start

Pip (default)

Pip is the simplest way to install most common libraries, whether platform-dependent ones like numpy or scikit-learn, or pure Python ones like Jupyter.

!pip install numpy scipy matplotlib
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (1.26.0)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (1.11.3)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (3.8.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (0.12.0)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (2.8.2)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (3.1.1)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (1.1.1)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (23.2)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (4.43.0)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (10.0.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (1.4.5)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

Jupyter

The Jupyter Notebook is a web application that allows you to create and share documents that contain

  • live code (repl),
  • equations (powered by \(\LaTeX\) via MathJax),
  • visualizations and explanatory text.

Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

See more details here https://nbviewer.jupyter.org/github/ipython/ipython/blob/4.0.x/examples/IPython%20Kernel/Index.ipynb

I. Python Overview

Using Python as a Calculator

Many of the things I used to use a calculator for, I now use Python for:

2 + 3
5
(50 - 5 * 6) / 4
5.0
7 / 3
2.3333333333333335
a = 1 + 1j
b = 1 - 1j
a * b
(2+0j)

The decimal module provides decimal arithmetic that represents values such as 2.03 exactly (useful, e.g., for accounting).

from decimal import Decimal
a = Decimal('2.03')
b = Decimal('5')
c = a / b
c
Decimal('0.406')
c.as_integer_ratio()
(203, 500)
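
To see why Decimal matters for money-like quantities, the short sketch below compares it with ordinary floats. The variable names float_sum and decimal_sum are only illustrative.

```python
from decimal import Decimal

# Binary floats cannot store 0.1 or 0.2 exactly, so the sum is slightly off.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)                  # False

# Decimal values built from strings keep their decimal digits exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))     # True
```

Note that the Decimal values are created from strings; building them from floats would carry the float rounding error along.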

We’ve seen, however briefly, two different data types:

  • integers, also known as whole numbers to the non-programming world;
  • floating point numbers, also known (incorrectly) as decimal numbers to the rest of the world.

We now look at the import statement.

  • Python has a huge number of libraries included with the distribution.
  • Most of the variables and functions in these libraries are not accessible from a normal Python interactive session.
  • Instead, you have to import the name.

For example, there is a math module containing many useful functions. To access, say, the square root function, you can either first

from math import sqrt

and then

sqrt(81)
9.0

or you can simply import the math library itself

import math
math.sqrt(81)
9.0

You can define variables using the equals (=) sign:

width = 20
length = 30
area = length * width
print(area)
600

If you try to access a variable that you haven’t yet defined, you get an error:

volume
NameError: name 'volume' is not defined

and you need to define it:

depth = 10
volume = area * depth
volume
6000
  • You can name a variable almost anything you want.

  • A name must start with a letter or an underscore (“_”) and can contain letters, digits, and underscores. Letters are not limited to ASCII.

  • Certain words, however, are reserved for the language:

    False, None, True, and, as, assert, async, await, break, class, continue, def, del, elif, else, except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield

def печать(*аргументы, **ещё_аргументы):
    print(*аргументы, **ещё_аргументы)
печать('Hello, world!')
Hello, world!

Trying to define a variable using one of the reserved words will result in a syntax error:

return = 0
SyntaxError: invalid syntax (3966660672.py, line 1)
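
If you are unsure whether a name is reserved, the standard keyword module can tell you. The snippet below is a small sketch of how one might check.

```python
import keyword

# kwlist holds the reserved words of the running Python version.
print(keyword.kwlist)

# iskeyword() answers the question for a single name.
print(keyword.iskeyword("return"))   # True
print(keyword.iskeyword("print"))    # False: print is an ordinary function in Python 3
```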

Strings

Strings are sequences of characters, and can be defined using either single quotes

'Hello, World!'
'Hello, World!'

or double quotes

"Hello, World!"
'Hello, World!'

But not both at the same time, unless you want one of the symbols to be part of the string.

"He's a Rebel"
"He's a Rebel"
'She asked, "How are you today?"'
'She asked, "How are you today?"'

Just like the other two data objects we’re familiar with (ints and floats), you can assign a string to a variable

greeting = "Hello, World!"

The print function is often used for printing character strings:

print(greeting)
Hello, World!

But it can also print data types other than strings:

print("The area is", area)
The area is 600

In the above, the number 600 (stored in the variable “area”) is converted into a string before being printed out.

You can use the + operator to concatenate strings together, and the * operator to repeat a string:

statement = "Hello," + " " + "World!" * 2
print(statement)
Hello, World!World!

Don’t forget the space between the strings, if you want one there.

statement = "Hello, " + "World!"
print(statement)
Hello, World!

You can use + to concatenate multiple strings in a single statement:

print("This " + "is " + "a " + "longer " + "statement.")
This is a longer statement.

If you have a lot of words to concatenate together, there are other, more efficient ways to do this. But this is fine for linking a few strings together.
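
One of those other ways is the str.join method, which builds the result in a single pass. A minimal sketch, with the list name words chosen only for illustration:

```python
words = ["This", "is", "a", "longer", "statement."]

# The string before .join is placed between the items.
sentence = " ".join(words)
print(sentence)   # This is a longer statement.
```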

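Formatted string literals (f-strings) were introduced in Python 3.6. Expressions inside curly braces are evaluated and inserted into the string, optionally with a format specification after a colon:

from math import pi
pi
3.141592653589793
f'pi = {pi}'
'pi = 3.141592653589793'
f'pi = {pi:.2f}'
'pi = 3.14'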

Lists

Very often in a programming language, one wants to keep a group of similar items together. Python does this using a data type called lists.

days_of_the_week = ["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]

You can access members of the list using the index of that item:

days_of_the_week[2]
'Tuesday'
  • Python lists, like arrays in C but unlike those in Fortran, use 0 as the index of the first element.
  • Thus, in this example, element 0 is “Sunday”, element 1 is “Monday”, and so on.
  • If you need to access the nth element from the end of the list, you can use a negative index.
  • For example, the -1 element of a list is the last element, and the -2 element is the one before it:
days_of_the_week[-2]
'Friday'

You can add additional items to the list using the .append() command:

languages = ["Fortran","C","C++"]
languages.append("Python")
print(languages)
['Fortran', 'C', 'C++', 'Python']

The range() command is a convenient way to make sequences of numbers:

range(10)
range(0, 10)
  • Please note that, unlike in Python 2, the range() command in Python 3 does not build a list; it returns a lazy range object that produces its numbers on demand.
  • To create a list you can do:
list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  • Note that range(n) starts at 0 and gives the sequential list of integers less than n.
  • If you want to start at a different number, use range(start, stop).
list(range(2, 8))
[2, 3, 4, 5, 6, 7]

The lists created above with range have a step of 1 between elements. You can also give a fixed step size via a third argument:

evens = range(0, 20, 2)
list(evens)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
evens[3]
6
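
Because a range object computes its elements on demand, even a very long range is cheap to create and query. The sketch below illustrates this; the bound 10**9 is an arbitrary choice for the example.

```python
# No list of half a billion numbers is ever built here.
evens = range(0, 10**9, 2)

print(len(evens))           # 500000000
print(evens[3])             # 6
print(123_456 in evens)     # True, checked arithmetically
print(list(evens[:5]))      # [0, 2, 4, 6, 8]
```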

Lists do not have to hold the same data type. For example,

["Today", 7, 99.3, ""]
['Today', 7, 99.3, '']
  • However, it’s good (but not essential) to use lists for similar objects that are somehow logically connected.
  • If you want to group different data types together into a composite data object, it’s best to use tuples, which we will learn about below.

You can find out how long a list is using the len() command:

len(evens)
10
sum(evens)
90
help(len)
Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.
len?
Signature: len(obj, /)
Docstring: Return the number of items in a container.
Type:      builtin_function_or_method

Iteration, Indentation, and Blocks

  • One of the most useful things you can do with lists is to iterate through them, i.e. to go through each element one at a time.
  • To do this in Python, we use the for statement:
days_of_the_week
['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
for day in days_of_the_week:
    print(day)
print('***')
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
***
  • Python uses a colon (“:”), followed by indentation level to define code blocks.
  • Everything at a higher level of indentation is taken to be in the same block.
  • In the above example the block was only a single line, but we could have had longer blocks as well:
for day in days_of_the_week:
    statement = "Today is " + day + days_of_the_week[0]
    print(statement)
Today is SundaySunday
Today is MondaySunday
Today is TuesdaySunday
Today is WednesdaySunday
Today is ThursdaySunday
Today is FridaySunday
Today is SaturdaySunday
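
To make the role of indentation concrete, here is a small sketch in which two lines belong to the loop body and one line runs only after the loop has finished. The names total and square are illustrative.

```python
total = 0
for i in range(1, 5):
    square = i * i      # indented: part of the loop body
    total += square     # indented: still part of the loop body
print(total)            # not indented: runs once, after the loop -> 30
```

Indenting the print call would move it into the loop, and it would then print a running total on every pass.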

The range() command is particularly useful with the for statement to execute loops of a specified length:

range(10)
range(0, 10)
for i in range(10):
    print("The square of ", i," is ", i * i)
The square of  0  is  0
The square of  1  is  1
The square of  2  is  4
The square of  3  is  9
The square of  4  is  16
The square of  5  is  25
The square of  6  is  36
The square of  7  is  49
The square of  8  is  64
The square of  9  is  81

Slicing

days_of_the_week
['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
days_of_the_week[1:6]
['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
  • Lists and strings have something in common that you might not suspect: they can both be treated as sequences.
  • You already know that you can iterate through the elements of a list.
  • You can also iterate through the letters in a string:
for letter in "Sunday":
    print(letter)
S
u
n
d
a
y

This is only occasionally useful. Slightly more useful is the slicing operation, which you can also use on any sequence. We already know that we can use indexing to get the first element of a list:

days_of_the_week[0]
'Sunday'

If we want the list containing the first four elements of a list, we can do this via

days_of_the_week[0:4]
['Sunday', 'Monday', 'Tuesday', 'Wednesday']

or simply

days_of_the_week[:4]
['Sunday', 'Monday', 'Tuesday', 'Wednesday']

If we want the last items of the list, we can do this with negative slicing:

days_of_the_week[-2:]
['Friday', 'Saturday']

which is somewhat logically consistent with negative indices accessing the last elements of the list.

A slice can also take a third value, the step:

days_of_the_week[1:4:2]
['Monday', 'Wednesday']

You can do:

workdays = days_of_the_week[1:6]
print(workdays)
['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

Since strings are sequences, you can also do this to them:

day = "Sunday"
abbreviation = day[:3]
print(abbreviation)
Sun

Booleans and Truth Testing

  • We invariably need some concept of conditions in programming to control branching behavior, to allow a program to react differently to different situations.
  • If it’s Monday, I’ll go to work, but if it’s Sunday, I’ll sleep in.
  • To do this in Python, we use a combination of boolean variables, which evaluate to either True or False, and if statements, which control branching based on boolean values.

For example, the comparison operators (==, !=, <=, <, ...) produce boolean values that an if statement can test:

day = "Sunday"

if day == "Sunday":
    print("Sleep in")
elif day == 'Saturday':
    print('Go to gym')
Sleep in
Let’s take the snippet apart to see what happened. First, note the statement

day == "Sunday"
True
if day == "Sunday":
    print("Sleep in")
else:
    print("Go to work")
Sleep in
day
'Sunday'

Python 3.10 and later also provide the match statement for this kind of branching:

match day:
    case 'Sunday':
        print("Sleep in")
    case 'Saturday':
        print('Go to gym')
    case _:
        print('Go to work')
Sleep in

You can compare values of many data types in Python, including integers with floats:

1 == 2
False
50 == 2 * 25
True
3 < 3.14159
True
1 == 1.0
True

The other comparison operators work the same way:

0 != 0
False
1 <= 2
True
1 >= 1
True

We can do boolean tests on lists as well:

[1, 2, 3] == [1, 2, 4]
False
[1, 2, 3] < [1, 2, 4]
True
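
Lists are compared element by element, from left to right, and the first difference decides the result. The sketch below shows a few cases, including one where the elements cannot be ordered at all; the flag name ordering_failed is illustrative.

```python
print([1, 2, 3] < [1, 2, 4])   # True: the first difference is 3 < 4
print([1, 2] < [1, 2, 0])      # True: a shorter list that is a prefix sorts first
print([2] < [1, 99, 99])       # False: 2 > 1 decides, later items are ignored

# Ordering an int against a str is not defined and raises TypeError.
ordering_failed = False
try:
    [1, 2] < [1, "two"]
except TypeError:
    ordering_failed = True
print(ordering_failed)         # True
```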

If statements can have elif parts (“else if”), in addition to if/else parts. For example:

if day == "Sunday":
    print("Sleep in")
elif day == "Saturday":
    print("Do cycling")
else:
    print("Go to work")
Sleep in

Of course we can combine if statements with for loops, to make a snippet that is almost interesting:

for day in days_of_the_week:
    statement = "Today is " + day
    print(statement)
    if day == "Sunday":
        print("   Sleep in")
    elif day == "Saturday":
        print("   Do cycling")
    else:
        print("   Go to work")
Today is Sunday
   Sleep in
Today is Monday
   Go to work
Today is Tuesday
   Go to work
Today is Wednesday
   Go to work
Today is Thursday
   Go to work
Today is Friday
   Go to work
Today is Saturday
   Do cycling

Code Example: The Fibonacci Sequence

  • The Fibonacci sequence is a sequence in math that starts with 0 and 1, and then each successive entry is the sum of the previous two.

  • Thus, the sequence goes 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89,…

  • A very common exercise in programming books is to compute the Fibonacci sequence up to some number n.

  • First I’ll show the code, then I’ll discuss what it is doing.

n = 10
sequence = [0, 1]
for i in range(2, n): # This is going to be a problem if we ever set n <= 2!
    sequence.append(sequence[i - 1] + sequence[i - 2])
print(sequence)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Let’s go through this line by line. First, we define the variable n, and set it to the integer 10. n is the length of the sequence we’re going to form, and should probably have a better variable name. We then create a variable called sequence, and initialize it to the list with the integers 0 and 1 in it, the first two elements of the Fibonacci sequence. We have to create these elements “by hand”, since the iterative part of the sequence requires two previous elements. We then have a for loop over the list of integers from 2 (the next element of the list) to n (the length of the sequence). After the colon, we see a hash sign “#”, and then a comment that if we had set n to some number less than 2 we would have a problem. Comments in Python start with #, and are good ways to make notes to yourself or to a user of your code explaining why you did what you did. Better than the comment here would be to test to make sure the value of n is valid, and to complain if it isn’t; we’ll try this later. In the body of the loop, we append to the list an integer equal to the sum of the two previous elements of the list. After exiting the loop (ending the indentation) we then print out the whole list. That’s it!

Functions

  • We might want to use the Fibonacci snippet with different sequence lengths.
  • We could cut and paste the code into another cell, changing the value of n, but it’s easier and more useful to make a function out of the code.
  • We do this with the def statement in Python:
def fibonacci(n):
    sequence = [0, 1]
    for i in range(2, n): # This is going to be a problem if we ever set n <= 2!
        sequence.append(sequence[i - 1] + sequence[i - 2])
    return sequence

A more careful version adds a docstring and checks its argument:

def fibonacci(sequence_length):
    "Return the Fibonacci sequence of length *sequence_length*"
    sequence = [0, 1]
    if sequence_length < 1:
        print("Fibonacci sequence only defined for length 1 or greater")
        return
    if 0 < sequence_length < 3:
        return sequence[:sequence_length]
    for i in range(2, sequence_length):
        sequence.append(sequence[i - 1] + sequence[i - 2])
    return sequence

We can now call fibonacci() for different sequence_lengths:

fibonacci(2)
[0, 1]
fibonacci(12)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

We’ve introduced several new features here. First, note that the function itself is defined as a code block (a colon followed by an indented block). This is the standard way that Python delimits things. Next, note that the first line of the function is a single string. This is called a docstring, and is a special kind of comment that is available to people using the function through the Python command line:

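help(fibonacci)
Help on function fibonacci in module __main__:

fibonacci(sequence_length)
    Return the Fibonacci sequence of length *sequence_length*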

If you define a docstring for all of your functions, it makes it easier for other people to use them, since they can get help on the arguments and return values of the function. Next, note that rather than putting a comment in about what input values lead to errors, we have some testing of these values, followed by a warning if the value is invalid, and some conditional code to handle special cases.
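
Printing a warning and returning nothing is easy for a caller to miss. A common alternative is to raise an exception for invalid input. The sketch below is one possible variant, not the version used in the rest of these notes; the name fibonacci_checked is hypothetical.

```python
def fibonacci_checked(sequence_length):
    """Return the first sequence_length Fibonacci numbers as a list."""
    if sequence_length < 1:
        # Raising stops the caller from silently continuing with None.
        raise ValueError("sequence_length must be 1 or greater")
    # Slicing handles lengths 1 and 2 without a special case.
    sequence = [0, 1][:sequence_length]
    while len(sequence) < sequence_length:
        sequence.append(sequence[-1] + sequence[-2])
    return sequence


print(fibonacci_checked(10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

try:
    fibonacci_checked(0)
except ValueError as error:
    print("Invalid input:", error)
```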

Recursion and Factorials

Functions can also call themselves, something that is often called recursion. We’re going to experiment with recursion by computing the factorial function. The factorial is defined for a positive integer n as

\[ n! = n(n-1)(n-2)\cdots 1 \]

First, note that we don’t need to write a function at all, since this is a function built into the standard math library. Let’s use the help function to find out about it:

from math import factorial
help(factorial)
Help on built-in function factorial in module math:

factorial(x, /)
    Find x!.
    
    Raise a ValueError if x is negative or non-integral.

This is clearly what we want.

factorial(20)
2432902008176640000

However, if we did want to write a function ourselves, we could do so recursively by noting that

\[ n! = n(n-1)!\]

The program then looks something like:

def fact(n):
    if n <= 0:
        return 1
    return n * fact(n - 1)
fact(20)
2432902008176640000

Recursion can be very elegant, and can lead to very simple programs.
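
Elegance has a cost, though: every recursive call uses a stack frame, and Python limits the recursion depth (sys.getrecursionlimit() reports the limit). For comparison, here is a loop-based sketch of the same computation; the name fact_iterative is illustrative.

```python
import math
import sys


def fact_iterative(n):
    """Compute n! with a loop instead of recursion."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


print(fact_iterative(20) == math.factorial(20))   # True
print(sys.getrecursionlimit())                    # the depth limit on this interpreter

# A loop has no depth limit, so large arguments are not a problem.
big = fact_iterative(5000)
print(big == math.factorial(5000))                # True
```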

Two More Data Structures: Tuples and Dictionaries

Before we end the Python overview, I wanted to touch on two more data structures that are very useful (and thus very common) in Python programs.

A tuple is a sequence object like a list or a string. It’s constructed by grouping a sequence of objects together with commas, either without brackets, or with parentheses:

t = (1,2,'hi',9.0)
t
(1, 2, 'hi', 9.0)
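
Tuples are like lists, in that you can access the elements using indices:

t[1]
2

However, tuples are immutable: you can’t append to them or change their elements:

t.append(7)
AttributeError: 'tuple' object has no attribute 'append'
t[1] = 77
TypeError: 'tuple' object does not support item assignment

If you need to modify the contents, convert the tuple to a list first:

t_list = list(t)
t_list[0] = 0
t_list
[0, 2, 'hi', 9.0]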

Dictionaries are objects called “mappings” or “associative arrays” in other languages. A list associates an integer index with each of its elements:

mylist = [1, 2, 9, 21]
mylist[0]
1
  • The index in a dictionary is called the key, and the corresponding dictionary entry is the value.
  • A dictionary can use (almost) anything as the key.

Whereas lists are formed with square brackets [], dictionaries use curly brackets {}:

ages = {"Rick": 46, "Bob": 86, "Fred": 21}
print("Rick's age is", ages["Rick"])
Rick's age is 46
ages['Bob'] = 10
ages['Alice'] = 23
ages
{'Rick': 46, 'Bob': 10, 'Fred': 21, 'Alice': 23}
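
A few more everyday dictionary operations are sketched below: looping over key/value pairs, testing for a key, and looking up a key that may be missing. The name "Carol" is a made-up key that is deliberately absent.

```python
ages = {"Rick": 46, "Bob": 10, "Fred": 21, "Alice": 23}

# items() yields (key, value) pairs.
for name, age in ages.items():
    print(name, "is", age)

# "in" tests for a key, not a value.
print("Rick" in ages)          # True

# get() returns None (or a default you supply) instead of raising KeyError.
print(ages.get("Carol"))       # None
print(ages.get("Carol", 0))    # 0

# max() over the keys, ranked by their values.
oldest = max(ages, key=ages.get)
print(oldest)                  # Rick
```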

II. Numpy and Scipy

  • Numpy contains core routines for doing fast vector, matrix, and linear algebra-type operations in Python.
  • Scipy contains additional routines for optimization, special functions, and so on. Both contain modules written in C and Fortran so that they’re as fast as possible.
  • Together, they give Python roughly the same capability that the Matlab program offers.
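
As a first taste of what “vector operations” means, the sketch below squares every element of an array twice: once with an explicit Python loop and once with a single NumPy expression. Both give the same numbers; the NumPy form runs its loop in compiled code. The variable names are illustrative.

```python
import numpy as np

values = np.arange(1, 6, dtype=np.float64)   # 1.0, 2.0, 3.0, 4.0, 5.0

# Element by element, in Python.
squares_loop = [v * v for v in values]

# The whole array at once.
squares_vectorized = values * values

print(np.array_equal(squares_vectorized, squares_loop))   # True
print(squares_vectorized.sum() == 55.0)                   # True
```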

Making vectors and matrices

Fundamental to both Numpy and Scipy is the ability to work with vectors and matrices. You can create vectors from lists using the array command:

import numpy as np
np.ndarray
numpy.ndarray
py_list = [1, 2, 3, 4, 5, 6]
py_list
[1, 2, 3, 4, 5, 6]
arr = np.array(py_list, dtype=np.float32)
arr.dtype
dtype('float32')
arr
array([1., 2., 3., 4., 5., 6.], dtype=float32)
scaler = 42
arr[:2]
array([1., 2.], dtype=float32)
np.sum(arr)
21.0
arr.sum()
21.0

To build matrices, you can use the array command with lists of lists:

a = [[0, 1], [1, 0]]
a
[[0, 1], [1, 0]]
b = np.array(a)
b.ndim
2
b.shape
(2, 2)

You can also form empty (zero) matrices of arbitrary shape (including vectors, which NumPy represents as one-dimensional arrays) using the zeros command:

np.zeros((10, 4), dtype=np.complex64)
array([[0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]], dtype=complex64)

The first argument is a tuple containing the shape of the matrix, and the second is the data type argument, which follows the same conventions as in the array command. Thus, you can make row vectors:

np.zeros(3)
array([0., 0., 0.])
np.zeros((1, 3))
array([[0., 0., 0.]])

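or column vectors:

np.zeros((3, 1))
array([[0.],
       [0.],
       [0.]])

The ones_like command creates an array of ones with the same shape and data type as an existing array:

a = np.zeros((3, 4), dtype=np.int32)
np.ones_like(a)
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int32)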

There’s also an identity-matrix command, eye, that behaves as you’d expect:

np.eye(5)
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

as well as a ones command

np.ones(4)
array([1., 1., 1., 1.])
a = np.ones(4)
b = np.ones_like(a) * 2
a
array([1., 1., 1., 1.])
b
array([2., 2., 2., 2.])
a @ b
8.0
a = np.ones((3, 4))
b = np.ones((4, 2))
a.shape
(3, 4)
b.shape
(4, 2)
c = a @ b
c.shape
(3, 2)
b
array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])
b.T
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])
a @ b.T
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 4)
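
The error above comes from the shape rule for matrix multiplication: an (n, k) array can only be multiplied by a (k, m) array, and the result has shape (n, m). The sketch below replays the shapes used above; the flag name mismatch_detected is illustrative.

```python
import numpy as np

a = np.ones((3, 4))
b = np.ones((4, 2))

# Inner dimensions agree (4 and 4): the result is (3, 2).
print((a @ b).shape)        # (3, 2)

# Transposing both operands and swapping their order also works: (2, 4) @ (4, 3).
print((b.T @ a.T).shape)    # (2, 3)

# (3, 4) @ (2, 4) fails because the inner dimensions 4 and 2 differ.
mismatch_detected = False
try:
    a @ b.T
except ValueError:
    mismatch_detected = True
print(mismatch_detected)    # True
```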
import numpy.linalg
xs = np.arange(10, dtype=np.float32)
xs
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float32)
xs.dtype
dtype('float32')
xs.shape
(10,)
np.nan 
nan
np.inf
inf
np.linalg.norm(xs, ord=np.inf)
9.0
np.linalg.eigvals(np.ones((3, 3)))
array([ 3.00000000e+00,  6.16297582e-33, -7.50963641e-17])
arr = np.arange(12)
arr = arr.astype(np.float32)
arr
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.],
      dtype=float32)