11  NumPy

A Python library for scientific computing. It features multidimensional array objects along with an assortment of functions optimized for fast computation. To use this library we first need to import it as shown below. After that we’ll have access to all the functions within the NumPy library. Let’s create a numpy array.

import numpy as np
a1 = np.array([1,2,3,4,5,6])
print(a1)
print(type(a1))
[1 2 3 4 5 6]
<class 'numpy.ndarray'>

Although the array looks like a list but it has very different properties and functions associated with it as compared to the python list. An important difference between a list and an ndarray is that the former can have elements of muliple data types while the later can only contain elements of same data type. There are multiple built-in function to create and manipulate arrays for numerical analysis. Let’s have a better understanding of the differences between lists and ndarrays by looking at some the functions associated with arrays. For instance, there are no in-built functions to change the shape of a list or to transpose a list whereas these operations can be done seemlessly with ndarrays. The shape function returns the dimensions of an ndarray as a tuple and the reshape function changes the shape of an ndarray given a tuple.

print("The shape of a1 array is", a1.shape)
m1 = a1.reshape(2,3)
print(m1)
print("The shape of m1 array is", m1.shape)
The shape of a1 array is (6,)
[[1 2 3]
 [4 5 6]]
The shape of m1 array is (2, 3)

The transpose and ravel functions can be used to transpose and flatten multidimensional arrays. For tranposition, we can also use .T attribute of a numpy array.

print(m1)
m1_transposed = m1.T
print(m1_transposed)
[[1 2 3]
 [4 5 6]]
[[1 4]
 [2 5]
 [3 6]]
print(m1)
m1_flattened = m1.ravel()
print(m1_flattened)
[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]

There are various functions available in NumPy to create arrays e.g. zeros, ones, empty, random can be used to create an ndarray having all the values as zero, one, no value, or random values, respectively. These three functions take shape of the array as an argument and data type can be optionally specified. arange is another function that can be used to initialize an array with specific values. This function is similar to the range function with a difference that it return an array while the range function returns a list.

array_of_zeros = np.zeros((3,3))
print(array_of_zeros)
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
array1 = np.arange(0,10)
print(array1)
array2 = np.arange(0,10,2)
print(array2)
[0 1 2 3 4 5 6 7 8 9]
[0 2 4 6 8]

linspace is another useful function to generate specified number of equally spaced values within a given range. It takes three numbers as argument, the first two numbers specify the range (both number included) and the third number specifies the number of values to return.

array3 = np.linspace(0,3,5) 
print(array3)
array4 = np.linspace(1,10,4)
print(array4)
[0.   0.75 1.5  2.25 3.  ]
[ 1.  4.  7. 10.]

11.1 Reading data from file

Numpy has genfromtxt function that comes handy for reading data from text files. The example below show read a csv file using this function.

# %load test1.csv
1, 1, 1
2, 4, 8
3, 9, 27
4, 16, 64
5, 25, 125
inp_data = np.genfromtxt("test1.csv", delimiter=",")
print(inp_data)
print(inp_data.shape)
[[  1.   1.   1.]
 [  2.   4.   8.]
 [  3.   9.  27.]
 [  4.  16.  64.]
 [  5.  25. 125.]]
(5, 3)
sub_matrix = inp_data[1:4]
print(sub_matrix)
[[ 2.  4.  8.]
 [ 3.  9. 27.]
 [ 4. 16. 64.]]

11.2 Writing data to file

Numpy has savetxt function that can be used to save numpy arrays to a text file in e.g. csv format. There is an option to change the delimiter. The example below shows writing a csv file using this function with space as a delimiter. The fmt argument is used to specify the format in which to write the data.

np.savetxt("test2.csv",sub_matrix, delimiter=" ", fmt='%d')
# %load test2.csv
2 4 8
3 9 27
4 16 64

11.3 Slicing multidimensional arrays

Just like we can splice a string or list, slicing of numpy arrays can also be performed. While in case of a sting or a list the slice operator takes the start and end positions, in the case of ndarrays, the slice operator would take as many start and end combinations as the dimensions of the ndarray. Which means that slicing can be performed for each dimension of a multidimensional ndarray. In the example below, we first create a three dimensional array having values from 0 to 27. Note that the zeroth element of this 3D array is a 2D array (with values 0 to 8). Similarly, the first element of this 3D array is another 2D array (with values 9 to 17). Next, using slicing, we’ll print one of the number of from this 3D array.

array_3d = np.arange(0,27).reshape(3,3,3)
print(array_3d)
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
print(array_3d.shape)
print(array_3d[0,2,1])
(3, 3, 3)
7

The figure below shows the parsing of the slice operator for a 3D array.

print(array_3d[:,:,0])
[[ 0  3  6]
 [ 9 12 15]
 [18 21 24]]

Quiz: What would be the slice operator for array_3d to get the following output
[[[10 11]
  [13 14]]
 [[19 20]
  [22 23]]]

Show answer
print(array_3d[1:3,:2,1:3])

11.4 Combining arrays

The Numpy arrays can be combined using concatenate function. All the arrays that are to be combined must have same number of dimensions. The arrays to be joined as passed as a tuple and an axis can be specified to indicate the direction along which to join the arrays. The arrays can be joined along row-wise or column-wise by specifing axis as 0 or 1, respectively. Note that the dimensions for all the arrays along the concatenation axis must be identical. That is to say that, e.g., when combining two or more arrays along rows - the number of rows in all the arrays should be same.

print(np.arange(1,5))
[1 2 3 4]
arr1 = np.arange(1,5).reshape(2,2)
arr2 = np.array([[5,6]]) #note the square brackets

print(arr1.shape)
print(arr2.shape)
(2, 2)
(1, 2)
arr3 = np.concatenate((arr1,arr2), axis=0) 
print(arr3)
[[1 2]
 [3 4]
 [5 6]]

The above code won’t work with axis=1 because the number of rows in the two arrays is not same. We can transpose the arr2 and then concatenate it with arr1 along axis=1.

arr4 = np.concatenate((arr1, arr2.T), axis=1) #note the .T
print(arr4)
[[1 2 5]
 [3 4 6]]

Similar, output can be generated using the vstack and hstack functions as shown below.

print(np.vstack((arr1,arr2)))
print(np.hstack((arr1,arr2.T)))
[[1 2]
 [3 4]
 [5 6]]
[[1 2 5]
 [3 4 6]]

11.5 Broadcasting

When performing arithmetic operations with arrays of different sizes, numpy has a an efficient way of broadcasting the array with lower dimensions to match the dimensions of larger array. Broadcasting can be thought of as creating replicas of the original array. This vectorized array operation is a lot faster than conventional looping in Python.

arr1 = np.arange(1,5).reshape(2,2)
arr2 = np.array([[5,6]]) #note the square brackets
print(arr1)
print(arr2)
[[1 2]
 [3 4]]
[[5 6]]

When we multiply the two arrays (arr1 * arr2) the arr2 would be broadcasted such as its shape would change from (1,2) to (2,2). Now, with this broadcasted array the multiplication would be performed element-wise. Similarly, when with multiple a scaler with a ndarray, the scaler is broadcasted to match the dimensions of the ndarray followed by element-wise multiplication.

print(arr1*arr2) #arr2 would be broadcasted along rows
[[ 5 12]
 [15 24]]
print(arr1*arr2.T) #arr2 would be broadcasted along columns
[[ 5 10]
 [18 24]]
print(arr1*2)
[[2 4]
 [6 8]]

Quiz: Write a program to calculate cube of first 10 natural numbers.

Show answer
print(np.arange(1,11)**3)

This ability to broadcast open up lot of possibilities when working with martices. A simple example is shown be to print table for first ten natural numbers.

num1 = np.arange(1,11).reshape(1,10)
all_ones = np.ones((10,10), dtype=int)
table_10 = all_ones*num1*num1.T
print(table_10)
[[  1   2   3   4   5   6   7   8   9  10]
 [  2   4   6   8  10  12  14  16  18  20]
 [  3   6   9  12  15  18  21  24  27  30]
 [  4   8  12  16  20  24  28  32  36  40]
 [  5  10  15  20  25  30  35  40  45  50]
 [  6  12  18  24  30  36  42  48  54  60]
 [  7  14  21  28  35  42  49  56  63  70]
 [  8  16  24  32  40  48  56  64  72  80]
 [  9  18  27  36  45  54  63  72  81  90]
 [ 10  20  30  40  50  60  70  80  90 100]]
#Print tables as seen in math books
import re
table_arr = np.empty((10,1), dtype="<U200")
for x in num1[0]:
    for y in num1[0]:
        s = f"{x:>2} X {y:>2} = {table_10[x-1,y-1]:<4}"
        table_arr[y-1][0] = table_arr[y-1][0]+s
print(re.sub('[\[\'\]]', ' ', np.array_str(table_arr)))
    1 X  1 = 1    2 X  1 = 2    3 X  1 = 3    4 X  1 = 4    5 X  1 = 5    6 X  1 = 6    7 X  1 = 7    8 X  1 = 8    9 X  1 = 9   10 X  1 = 10    
    1 X  2 = 2    2 X  2 = 4    3 X  2 = 6    4 X  2 = 8    5 X  2 = 10   6 X  2 = 12   7 X  2 = 14   8 X  2 = 16   9 X  2 = 18  10 X  2 = 20    
    1 X  3 = 3    2 X  3 = 6    3 X  3 = 9    4 X  3 = 12   5 X  3 = 15   6 X  3 = 18   7 X  3 = 21   8 X  3 = 24   9 X  3 = 27  10 X  3 = 30    
    1 X  4 = 4    2 X  4 = 8    3 X  4 = 12   4 X  4 = 16   5 X  4 = 20   6 X  4 = 24   7 X  4 = 28   8 X  4 = 32   9 X  4 = 36  10 X  4 = 40    
    1 X  5 = 5    2 X  5 = 10   3 X  5 = 15   4 X  5 = 20   5 X  5 = 25   6 X  5 = 30   7 X  5 = 35   8 X  5 = 40   9 X  5 = 45  10 X  5 = 50    
    1 X  6 = 6    2 X  6 = 12   3 X  6 = 18   4 X  6 = 24   5 X  6 = 30   6 X  6 = 36   7 X  6 = 42   8 X  6 = 48   9 X  6 = 54  10 X  6 = 60    
    1 X  7 = 7    2 X  7 = 14   3 X  7 = 21   4 X  7 = 28   5 X  7 = 35   6 X  7 = 42   7 X  7 = 49   8 X  7 = 56   9 X  7 = 63  10 X  7 = 70    
    1 X  8 = 8    2 X  8 = 16   3 X  8 = 24   4 X  8 = 32   5 X  8 = 40   6 X  8 = 48   7 X  8 = 56   8 X  8 = 64   9 X  8 = 72  10 X  8 = 80    
    1 X  9 = 9    2 X  9 = 18   3 X  9 = 27   4 X  9 = 36   5 X  9 = 45   6 X  9 = 54   7 X  9 = 63   8 X  9 = 72   9 X  9 = 81  10 X  9 = 90    
    1 X 10 = 10   2 X 10 = 20   3 X 10 = 30   4 X 10 = 40   5 X 10 = 50   6 X 10 = 60   7 X 10 = 70   8 X 10 = 80   9 X 10 = 90  10 X 10 = 100    

11.6 Functions for ndarrays

Numpy comes with a lot of useful functions that facilitates working with ndarrays. Readers are encourages to refer to the NumPy documentation for details.

where()

The where function is used to get the indcies of values based on certain condition(s). The get the values corresponding to those indcies, we can slice the original array with the output indcies.

arr1 = np.linspace(0,100,10, dtype='int')
print("arr1 indcies where the value is greater than 50:")
print(np.where(arr1 > 50))
print("arr1 elements where the value is greater than 50:")
print(arr1[np.where(arr1 > 50)])
arr1 indcies where the value is greater than 50:
(array([5, 6, 7, 8, 9], dtype=int64),)
arr1 elements where the value is greater than 50:
[ 55  66  77  88 100]

sort()

The values in a numpy array can be sorted along different axes using the sort function. Also, all the values can be sorted after flattening the array by passing the None to the axis argument. This function returns the sorted array and the original array remains unchanged.

arr2 = np.vstack((arr1,np.linspace(100,1,10, dtype='int')))
print(arr2)
[[  0  11  22  33  44  55  66  77  88 100]
 [100  89  78  67  56  45  34  23  12   1]]
print("Sorting row-wise")
print(np.sort(arr2, axis=1))
print("Sorting column-wise")
print(np.sort(arr2, axis=0))
Sorting row-wise
[[  0  11  22  33  44  55  66  77  88 100]
 [  1  12  23  34  45  56  67  78  89 100]]
Sorting column-wise
[[  0  11  22  33  44  45  34  23  12   1]
 [100  89  78  67  56  55  66  77  88 100]]
print("Sorting the flattened array")
print(np.sort(arr2,axis=None))
Sorting the flattened array
[  0   1  11  12  22  23  33  34  44  45  55  56  66  67  77  78  88  89
 100 100]

Statistics

Below are examples of some of the functions available in numpy for order and descriptive statistics. The axis argument can be used to calculate statistics along a specific dimension.

#calculate range (max - min)
print(np.ptp(arr2,axis=0))
[100  78  56  34  12  10  32  54  76  99]
#calculate 50-th percentile
print(np.percentile(arr2,50, axis=0))
[50.  50.  50.  50.  50.  50.  50.  50.  50.  50.5]
#calculate mean and standard deviation
print(np.mean(arr2, axis=0))
print(np.std(arr2, axis=0))
[50.  50.  50.  50.  50.  50.  50.  50.  50.  50.5]
[50.  39.  28.  17.   6.   5.  16.  27.  38.  49.5]