Exploring NumPy: An Essential Tool for Data Analysis

 



What is NumPy?

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

How to Install NumPy Library

The standard Python distribution does not come bundled with the NumPy module. A lightweight way to add it is to install NumPy with pip, the popular Python package installer:

pip install numpy
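To confirm that the installation worked, one quick check is to import the package and print its version. This is a minimal sketch; the version string you see depends on your environment.

```python
import numpy as np

# If this import succeeds, NumPy is installed in the active environment.
# The printed version string varies from machine to machine.
print(np.__version__)
```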

NumPy − ndarray Object

The most important object defined in NumPy is an N-dimensional array type called ndarray. It describes a collection of items of the same type. Items in the collection can be accessed using a zero-based index. Every item in an ndarray occupies the same-sized block of memory. Each ndarray is associated with a data-type object (called dtype) that describes how its elements are stored and interpreted.

Example:

import numpy as np

data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
print(data)
print(type(data)) # built-in Python function to get type of data
print(data.shape) # tuple indicating the size of each dimension
print(data.dtype) # an object describing the data type of the array


Output:
[[ 1.5 -0.1  3. ]
 [ 0.  -3.   6.5]]
<class 'numpy.ndarray'>
(2, 3)
float64
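The claim that every item has the same type and size can be checked directly. The sketch below uses an illustrative array named mixed (not from the example above) to show how NumPy upcasts mixed integers and floats to one common dtype, and how itemsize and nbytes report the per-element and total memory.

```python
import numpy as np

# Integers and floats in one list are upcast to a single common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)     # float64

# Every element occupies the same number of bytes
print(mixed.itemsize)  # 8 (bytes per float64 element)
print(mixed.nbytes)    # 24 (3 elements * 8 bytes)
```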

Creating ndarrays

There are several ways to create an array using NumPy. The easiest is the array function, which accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. Other creation functions include:

asarray, ones, zeros, arange, empty, full, identity

import numpy as np

data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
print(arr1) # output: [6.  7.5 8.  0.  1. ]
np.zeros(10) # output: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
np.zeros((3, 6)) # output: array([[0., 0., 0., 0., 0., 0.],
                 #                [0., 0., 0., 0., 0., 0.],
                 #                [0., 0., 0., 0., 0., 0.]])
np.empty((2, 3, 2)) # returns uninitialized memory, so the values are arbitrary
np.arange(5) # output: array([0, 1, 2, 3, 4])
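The remaining creation functions from the list can be sketched in the same way. The variable names here are illustrative only.

```python
import numpy as np

ones = np.ones((2, 3))       # 2x3 array filled with 1.0
sevens = np.full((2, 2), 7)  # 2x2 array filled with the value 7
eye = np.identity(3)         # 3x3 identity matrix
evens = np.arange(0, 10, 2)  # start, stop (exclusive), step
print(evens)                 # [0 2 4 6 8]

# asarray returns the input itself when it is already an ndarray
# and no dtype conversion is requested, so no copy is made
same = np.asarray(ones)
print(same is ones)          # True
```

Unlike array, which copies its input by default, asarray avoids the copy when it can, which matters for large inputs.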

Data Types for ndarrays

NumPy supports a much greater variety of numerical types than Python does. NumPy numerical types are instances of dtype (data-type) objects, each having unique characteristics. The dtypes are available as np.bool_, np.float32, etc.

arr = np.array([1, 2, 3], dtype=np.float64)
arr.dtype # output: dtype('float64')
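An existing array can also be converted to another dtype with the astype method, which returns a new array. The following is a small sketch with made-up values:

```python
import numpy as np

floats = np.array([1.7, -2.3, 3.9])

# Converting float to integer discards the fractional part
ints = floats.astype(np.int64)
print(ints)           # [ 1 -2  3]
print(ints.dtype)     # int64

# Strings that represent numbers can be converted as well
numeric = np.array(["1.5", "2", "3.25"]).astype(np.float64)
print(numeric.dtype)  # float64
```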

Arithmetic with NumPy Arrays

Arrays are important because they let you express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operation between equal-size arrays is applied element-wise.

import numpy as np

# Define two 1-D arrays of equal size
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# addition
add = a + b
print(add)

# subtraction
sub = a - b
print(sub)

# division
div = a / b
print(div)

# multiplication
mult = a * b
print(mult)

# Getting mean of all numbers in 'a'
mean_a = np.mean(a)
print(mean_a)

# Getting average of all numbers in 'b'
mean_b = np.average(b)
print(mean_b)

# Getting sum of all numbers in 'a'
sum_a = np.sum(a)
print(sum_a)

# Getting variance of all number in 'b'
var_b = np.var(b)
print(var_b)
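To make the idea of vectorization concrete, the sketch below compares an explicit Python loop with the equivalent array expression, reusing the same a and b. It also shows that arithmetic and comparisons with a scalar are applied to every element.

```python
import numpy as np

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Explicit Python loop: one addition per pair of elements
loop_sum = [x + y for x, y in zip(a, b)]

# Vectorized form: the same additions, expressed as a single operation
vec_sum = a + b
print(vec_sum)  # [  7  77  23 130]

# Operations with a scalar are applied to every element
print(a * 2)    # [ 10 144  26 200]
print(a > 50)   # [False  True False  True]
```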



NumPy Array Indexing and Slicing

Array indexing is the same as accessing an array element. You can access an array element by referring to its index number. The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the second has index 1 etc.

Slicing in Python means taking elements from one given index up to, but not including, another given index.

import numpy as np

# 1-D array
arr1 = np.array([1, 2, 3, 4])

print(arr1[0]) # output: 1
# slice elements from index 1 up to (but not including) index 3
print(arr1[1:3]) # output: [2 3]

# 2-D array
arr2 = np.array([[1,2,3,4,5],[6,7,8,9,10]])

print(arr2[0, 1]) # output: 2

# From the second row, slice elements from index 1 up to (but not including) index 4:
print(arr2[1, 1:4]) # output: [7 8 9]
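A few more slicing forms are worth knowing. The sketch below, using an illustrative array, shows negative indices and step values, and also that a basic slice is a view onto the original data rather than a copy.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

print(arr[-1])   # 60 (negative indices count from the end)
print(arr[::2])  # [10 30 50] (every second element)
print(arr[:3])   # [10 20 30] (the first three elements)

# A basic slice is a view: writing to it changes the original array
view = arr[1:4]
view[0] = 99
print(arr)       # [10 99 30 40 50 60]

# Call copy() when an independent array is needed
independent = arr[1:4].copy()
independent[0] = -1
print(arr[1])    # 99 (the original is unchanged this time)
```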

NumPy Array Reshaping

Reshaping means changing the shape of an array, that is, the number of elements along each dimension, without changing its data. For example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# Reshape From 1-D to 2-D
new_2d_arr = arr.reshape(4, 3)

print(new_2d_arr)

output:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


# Reshape From 1-D to 3-D
new_3d_arr = arr.reshape(2, 3, 2)

print(new_3d_arr)

output:
[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]
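A reshape only succeeds when the new shape holds exactly the same number of elements. As a short sketch, one dimension can be given as -1 so that NumPy infers it, and an incompatible shape raises a ValueError:

```python
import numpy as np

arr = np.arange(1, 13)  # the 12 elements 1..12

# -1 asks NumPy to infer that dimension: 12 / 3 = 4 columns
auto = arr.reshape(3, -1)
print(auto.shape)  # (3, 4)

# reshape(-1) flattens back to one dimension
flat = auto.reshape(-1)
print(flat.shape)  # (12,)

# 5 * 3 = 15 does not match 12 elements, so this fails
try:
    arr.reshape(5, 3)
except ValueError as exc:
    print("reshape failed:", exc)
```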

Generating Random Numbers

NumPy offers the random module to work with random numbers.

from numpy import random

# Generate a 1-D array of 5 random integers from 0 up to (but not including) 100
x = random.randint(100, size=5)

print(x)

output:
[72 78 49  4 23]

# Generate a 2-D array
y = random.randint(100, size=(3, 5))

print(y)

output:
[[90 99 11 30 34]
 [66 40 63 36 37]
 [63 35 89 51 58]]


# Generate a random normal distribution of size 2x3
normal_x = random.normal(size=(2, 3))

print(normal_x)

output:
[[-1.02873892  0.75722637 -0.77702879]
 [-0.17220486  0.62777868  0.06830624]]
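The values above change on every run. When reproducible results are needed, one option is NumPy's Generator interface, created with np.random.default_rng and a fixed seed. This is a sketch of that alternative, not the approach used in the examples above; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)

ints = rng.integers(0, 100, size=5)                   # 5 integers in [0, 100)
normal = rng.normal(loc=0.0, scale=1.0, size=(2, 3))  # 2x3 normal samples

# A second generator with the same seed repeats the same draws
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(ints, rng_again.integers(0, 100, size=5)))  # True
```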

Mathematical and Statistical Methods using NumPy

A set of mathematical functions that compute statistics about an entire array or about the data along an axis are accessible as methods of the array class. You can use aggregations (sometimes called reductions) like sum, mean, and std (standard deviation) either by calling the array instance method or using the top-level NumPy function. When you use the NumPy function, like numpy.sum, you have to pass the array you want to aggregate as the first argument.

import numpy as np

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Getting mean of all numbers in 'a'
mean_a = np.mean(a)
print(mean_a)

# Getting average of all numbers in 'b'
mean_b = np.average(b)
print(mean_b)

# Getting sum of all numbers in 'a'
sum_a = np.sum(a)
print(sum_a)

# Getting variance of all number in 'b'
var_b = np.var(b)
print(var_b)

output:
47.5
11.75
190
119.1875
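The aggregations above reduce the whole array to a single number. To compute statistics along an axis instead, pass the axis argument, as in this small sketch with an illustrative 2-D array m:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())            # 21 (all elements)
print(m.sum(axis=0))      # [5 7 9] (down the rows: one value per column)
print(m.mean(axis=1))     # [2. 5.] (across the columns: one value per row)

# The top-level function takes the array as its first argument
print(np.sum(m, axis=1))  # [ 6 15]
```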

NumPy Linear Algebra

NumPy provides the following functions for performing linear algebra calculations on input data. The first four are top-level NumPy functions, while det(), solve(), and inv() live in the numpy.linalg submodule:

dot(), vdot(), matmul(), inner(), det(), solve(), inv()

import numpy as np  

a = np.array([[100,200],[23,12]])
b = np.array([[10,20],[12,21]])
c = np.array([[1,2],[3,4]])

dot = np.dot(a,b)
mult = np.matmul(a,b)
print(dot)

print(np.linalg.det(c))
print(mult)

output:

[[3400 6200]
 [ 374  712]]
-2.0000000000000004
[[3400 6200]
 [ 374  712]]
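The solve() and inv() functions from the list are not used in the example above. A minimal sketch, using a small made-up system of two equations, might look like this:

```python
import numpy as np

# Solve the system:  2x +  y = 5
#                     x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
rhs = np.array([5.0, 10.0])

x = np.linalg.solve(A, rhs)
print(np.allclose(x, [1.0, 3.0]))         # True: x = 1, y = 3

# The inverse satisfies A @ A_inv == identity (up to rounding)
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))  # True
```

Results are compared with np.allclose rather than == because floating-point arithmetic introduces small rounding errors, as the determinant output above already shows.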


Conclusion

NumPy is one of the most fundamental libraries in machine learning and data science. Its core is implemented largely in C behind a Python interface, and its vectorized operations on whole arrays run much faster than equivalent pure-Python loops. It also provides many built-in functions for creating, reshaping, and analyzing arrays.
