Exploring NumPy: An Essential Tool for Data Analysis
What is NumPy?
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
How to Install the NumPy Library
The standard Python distribution does not come bundled with NumPy. The most common way to install it is with pip, the standard Python package installer:
pip install numpy
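A quick way to confirm that the installation worked is to import the package and print its version. This is only a minimal check; the exact version string depends on what pip installed.

```python
# Confirm that NumPy is installed and importable
import numpy as np

# Prints the installed version string; the value depends on your environment
print(np.__version__)
```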
NumPy ndarray Object
The most important object defined in NumPy is an N-dimensional array type called ndarray. It describes a collection of items of the same type, and items in the collection can be accessed using a zero-based index. Every item in an ndarray takes up the same amount of memory, because the type of every element is described by a single data-type object called a dtype.
Example:
import numpy as np
data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
print(data)
print(type(data))  # built-in Python function that returns the type of an object
print(data.shape)  # tuple indicating the size of each dimension
print(data.dtype)  # an object describing the data type of the array
Output:
[[ 1.5 -0.1  3. ]
 [ 0.  -3.   6.5]]
<class 'numpy.ndarray'>
(2, 3)
float64
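The uniform element size described above can be seen directly in a few ndarray attributes. This is a short sketch reusing the same data array; the second array, mixed, is an illustrative addition showing that mixed input is converted to one common dtype.

```python
import numpy as np

data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])

# Every element shares one dtype, so every element occupies the same number of bytes
print(data.ndim)      # number of dimensions: 2
print(data.size)      # total number of elements: 6
print(data.itemsize)  # bytes per element: 8 for float64
print(data.nbytes)    # total bytes: size * itemsize = 48

# Mixing ints and floats in the input upcasts everything to one common dtype
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)    # float64
```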
Creating ndarrays
There are several ways to create an array using NumPy. The easiest is the array function, which accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. Other creation functions include asarray, ones, zeros, arange, empty, full, and identity.
import numpy as np
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)  # convert a Python list to an ndarray; the ints are upcast to float64
print(arr1)  # output: [6.  7.5 8.  0.  1. ]
np.zeros(10)  # Output: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
np.zeros((3, 6))  # Output: array([[0., 0., 0., 0., 0., 0.],
                  #                [0., 0., 0., 0., 0., 0.],
                  #                [0., 0., 0., 0., 0., 0.]])
np.empty((2, 3, 2))  # Output: a 2x3x2 array of uninitialized memory; the values are arbitrary
np.arange(5)  # Output: array([0, 1, 2, 3, 4])
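Several of the creation functions listed above (asarray, ones, full, identity, and arange with a step) are not demonstrated in the example, so here is a minimal sketch of each. The variable names are illustrative.

```python
import numpy as np

lst = [1, 2, 3]
a = np.asarray(lst)            # converts the list to an ndarray
same = np.asarray(a)           # input is already an ndarray, so no copy is made
print(same is a)               # True

ones = np.ones((2, 3))         # 2x3 array filled with 1.0
sevens = np.full((2, 2), 7)    # 2x2 array filled with 7
eye = np.identity(3)           # 3x3 identity matrix (float64)
evens = np.arange(0, 10, 2)    # start, stop (exclusive), step

print(ones)
print(sevens)
print(eye)
print(evens)                   # [0 2 4 6 8]
```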
Data Types for ndarrays
NumPy supports a much greater variety of numerical types than Python does. NumPy numerical types are instances of dtype (data-type) objects, each having unique characteristics. The dtypes are available as np.bool_, np.float32, etc.
arr = np.array([1, 2, 3], dtype=np.float64)
arr.dtype # output: dtype('float64')
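To see how the dtype controls storage and conversion, here is a small sketch using the astype method. The specific arrays are illustrative; the points are that astype returns a new array of the requested type and that converting floats to integers truncates the fractional part.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)
print(ints.dtype, ints.itemsize)   # int32 4

# astype returns a new array converted to the requested dtype
floats = ints.astype(np.float64)
print(floats)                      # [1. 2. 3.]

# Converting floats to integers truncates the fractional part toward zero
vals = np.array([3.7, -1.2, 0.5])
print(vals.astype(np.int32))       # [ 3 -1  0]

# Strings that contain numbers can be parsed into a numeric dtype
nums = np.array(["1.25", "-9.6"]).astype(float)
print(nums.dtype)                  # float64
```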
Arithmetic with NumPy Arrays
Arrays are important because they let you express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operation between equal-size arrays applies the operation element-wise.
import numpy as np
# Defining both arrays
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])
# addition
add = a + b
print(add)  # output: [  7  77  23 130]
# subtraction
sub = a - b
print(sub)  # output: [ 3 67  3 70]
# division (true division always returns floats)
div = a / b
print(div)
# multiplication
mult = a * b
print(mult)  # output: [  10  360  130 3000]
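To make vectorization concrete, the sketch below compares an explicit Python loop with the equivalent array expression, then shows that operations with a scalar and comparisons are also applied element-wise. It reuses the values of a and b from the example; the threshold 50 is arbitrary.

```python
import numpy as np

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Loop version: add the arrays one element at a time in pure Python
looped = [x + y for x, y in zip(a, b)]

# Vectorized version: one expression, and the loop runs inside NumPy
vectorized = a + b
print(vectorized)          # [  7  77  23 130]

# Operations with a scalar are applied to every element
print(a * 2)               # [ 10 144  26 200]

# Comparisons are element-wise too and produce a boolean array
print(a > 50)              # [False  True False  True]
```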
NumPy Array Indexing and Slicing
Array indexing means accessing an array element by referring to its index number. Indexes in NumPy arrays start at 0, so the first element has index 0, the second has index 1, and so on.
Slicing in Python means taking elements from one given index up to, but not including, another given index.
import numpy as np
# 1-D array
arr1 = np.array([1, 2, 3, 4])
print(arr1[0])  # output: 1
# slice elements from index 1 up to (not including) index 3
print(arr1[1:3])  # output: [2 3]
# 2-D array
arr2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr2[0, 1])  # output: 2
# From the second row, slice elements from index 1 up to (not including) index 4:
print(arr2[1, 1:4])  # output: [7 8 9]
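A few more indexing patterns are worth knowing: negative indices, slices with a step, and selecting a whole column. One important detail is that a basic slice is a view of the original data rather than a copy. A minimal sketch using the same arrays:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
print(arr1[-1])      # negative indices count from the end: 4
print(arr1[::2])     # every second element: [1 3]

arr2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr2[:, 0])    # first column of every row: [1 6]

# A basic slice is a view: writing to it changes the original array
view = arr1[1:3]
view[0] = 99
print(arr1)          # [ 1 99  3  4]

# Use .copy() when an independent array is needed
independent = arr1[1:3].copy()
independent[0] = 0
print(arr1)          # unchanged: [ 1 99  3  4]
```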
NumPy Array Reshaping
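Reshaping means changing the shape of an array without changing its data. The new shape must contain the same total number of elements as the original. For example: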
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
# Reshape From 1-D to 2-D
new_2d_arr = arr.reshape(4, 3)
print(new_2d_arr)
output:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
# Reshape From 1-D to 3-D
new_3d_arr = arr.reshape(2, 3, 2)
print(new_3d_arr)
output:
[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]
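Two conveniences when reshaping: passing -1 for one dimension lets NumPy infer that dimension from the total size, and a shape whose total size does not match raises a ValueError. A small sketch with the same twelve values:

```python
import numpy as np

arr = np.arange(1, 13)            # the same 12 values, 1 through 12

# -1 asks NumPy to infer that dimension from the total size
inferred = arr.reshape(3, -1)
print(inferred.shape)             # (3, 4)

# reshape(-1) flattens back to 1-D
flat = inferred.reshape(-1)
print(flat.shape)                 # (12,)

# The new shape must hold exactly the same number of elements
try:
    arr.reshape(5, 3)             # 15 slots for 12 values
except ValueError as err:
    print("ValueError:", err)
```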
Generating Random Numbers
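NumPy offers the random module to work with random numbers. Because the values are random, running the examples below will print different numbers from the sample output shown.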
from numpy import random
# Generate a 1-D array of 5 random integers from 0 to 99
x = random.randint(100, size=5)
print(x)
output:
[72 78 49  4 23]
# Generate a 2-D array (3 rows, 5 columns) of random integers from 0 to 99
y = random.randint(100, size=(3, 5))
print(y)
output:
[[90 99 11 30 34]
 [66 40 63 36 37]
 [63 35 89 51 58]]
# Generate a 2x3 array of samples from the standard normal distribution
normal_x = random.normal(size=(2, 3))
print(normal_x)
output:
[[-1.02873892  0.75722637 -0.77702879]
 [-0.17220486  0.62777868  0.06830624]]
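The functions above come from NumPy's legacy random API. Newer NumPy code commonly uses a Generator object created with np.random.default_rng instead. The sketch below reproduces the three examples with that API; the seed value 42 is an arbitrary choice, and the printed values will not match the sample output above.

```python
import numpy as np

# default_rng creates a Generator; a fixed seed makes the results reproducible
rng = np.random.default_rng(seed=42)

x = rng.integers(100, size=5)          # 5 integers in [0, 100)
y = rng.integers(100, size=(3, 5))     # 3x5 array of integers in [0, 100)
normal_x = rng.normal(size=(2, 3))     # standard normal samples (mean 0, std 1)

print(x)
print(y)
print(normal_x)

# Re-creating the generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(rng_again.integers(100, size=5), x))  # True
```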
Mathematical and Statistical Methods using NumPy
Mathematical functions that compute statistics about an entire array, or about the data along an axis, are accessible as methods of the array class. You can use aggregations (sometimes called reductions) like sum, mean, and std (standard deviation) either by calling the array instance method or by using the top-level NumPy function. When you use the NumPy function, like numpy.sum, you have to pass the array you want to aggregate as the first argument.
import numpy as np
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])
# Getting mean of all numbers in 'a'
mean_a = np.mean(a)
print(mean_a)
# Getting average of all numbers in 'b'
mean_b = np.average(b)
print(mean_b)
# Getting sum of all numbers in 'a'
sum_a = np.sum(a)
print(sum_a)
# Getting variance of all numbers in 'b'
var_b = np.var(b)
print(var_b)
output:
47.5
11.75
190
119.1875
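The example above only uses the top-level functions. The sketch below shows the equivalent instance methods and the axis argument, which controls whether an aggregation reduces over the whole array or along one dimension. The 2-D array m is illustrative.

```python
import numpy as np

a = np.array([5, 72, 13, 100])

# Instance methods give the same results as the top-level functions
print(a.mean(), np.mean(a))   # 47.5 47.5
print(a.sum(), np.sum(a))     # 190 190

m = np.array([[1, 2, 3], [4, 5, 6]])
# axis=0 aggregates down the rows (one result per column)
print(m.sum(axis=0))          # [5 7 9]
# axis=1 aggregates across the columns (one result per row)
print(m.mean(axis=1))         # [2. 5.]
# Without an axis, std reduces over every element
print(m.std())
```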
NumPy Linear Algebra
NumPy provides the following functions for linear algebra calculations on the input data: dot(), vdot(), matmul(), and inner() in the top-level numpy namespace, and det(), solve(), and inv() in the numpy.linalg submodule.
import numpy as np
a = np.array([[100,200],[23,12]])
b = np.array([[10,20],[12,21]])
c = np.array([[1,2],[3,4]])
dot = np.dot(a,b)
mult = np.matmul(a,b)
print(dot)
print(np.linalg.det(c))
print(mult)
output:
[[3400 6200]
 [ 374  712]]
-2.0000000000000004
[[3400 6200]
 [ 374  712]]
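The example above covers dot, matmul, and det. The sketch below exercises solve, inv, inner, and vdot from the same list; the system of equations and the vectors u and v are illustrative.

```python
import numpy as np

# System of equations:  1x + 2y = 5
#                       3x + 4y = 6
c = np.array([[1, 2], [3, 4]])
rhs = np.array([5, 6])

solution = np.linalg.solve(c, rhs)   # solves c @ solution = rhs
print(solution)                      # approximately [-4.   4.5]

c_inv = np.linalg.inv(c)             # matrix inverse
print(np.allclose(c @ c_inv, np.eye(2)))   # True: c times its inverse is the identity

# inner and vdot on 1-D vectors both return the scalar dot product
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.inner(u, v), np.vdot(u, v))       # 32 32
```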
Conclusion
NumPy is one of the most fundamental libraries in machine learning and data science. Its core is implemented in C, and its vectorized operations perform calculations much faster than equivalent pure-Python loops. Its wide range of built-in functions comes in handy for many programmers.