SciPy tutorial for beginners

What is SciPy?#

SciPy is an open-source Python library for scientific computing. (Open source means that the source code is available for use or modification as users see fit.) It depends on NumPy, since SciPy uses NumPy arrays to handle numerical computations efficiently. While NumPy provides many mathematical functions, SciPy builds on them with optimized and more specialized routines.

Feel free to jump to any section you’re interested in.

Physical and mathematical constants#

Constants and units are the building blocks of scientific measurement. Constants define the fundamental behavior of the universe, e.g., the speed of light. Knowing their values helps scientists make predictions. Similarly, units help scientists standardize their measurements, e.g., meters and kilometers. This standardization allows researchers to replicate experiments and verify findings.

Here’s an analogy: Constants are the ingredients, while units are the measuring cups. To bake a good cake, you need to follow the recipe precisely.

The scipy.constants subpackage contains multiple constants. You can access the value of any supported constant as below.
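As a minimal sketch (the choice of constants here is only illustrative), the snippet below reads a mathematical constant, two physical constants in SI units, and two unit-conversion factors from scipy.constants.

```python
from scipy import constants

# Mathematical constant
pi_value = constants.pi

# Physical constants, expressed in SI units
speed_of_light = constants.c   # metres per second
planck = constants.h           # joule seconds

# Unit prefixes and conversion factors
kilo = constants.kilo                 # SI prefix: one thousand
minute_in_seconds = constants.minute  # one minute expressed in seconds

print("pi:", pi_value)
print("Speed of light (m/s):", speed_of_light)
print("Planck constant (J s):", planck)
print("kilo:", kilo)
print("One minute in seconds:", minute_in_seconds)
```

Each name is a plain Python float, so it can be used directly in arithmetic, e.g., `5 * constants.kilo` to express 5 km in meters.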

Linear algebra#

Computing the inverse of a matrix#

Lines 2–4: We import the required libraries.
Line 7: The linalg.inv() function takes the matrix (created on line 6) and returns its inverse.
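The listing this walkthrough refers to is not shown here. A minimal sketch of the same idea follows; the 2x2 matrix is an assumed example, and the line numbers above may not match this sketch.

```python
import numpy as np
from scipy import linalg

# An assumed example matrix; its determinant is 4*6 - 7*2 = 10, so it is invertible
A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

# Compute the inverse
A_inv = linalg.inv(A)
print("Inverse:\n", A_inv)

# Sanity check: a matrix times its inverse should give the identity matrix
print("A @ A_inv is identity:", np.allclose(A @ A_inv, np.eye(2)))
```

If the matrix is singular (determinant of zero), linalg.inv() raises a LinAlgError instead of returning a result.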

Optimization#

Optimizing means finding the best solution to a problem according to certain criteria. In real-world scenarios, we want to minimize costs and maximize profits. For instance, an engineer might optimize a design to minimize material usage while maintaining strength.

Imagine a function as a landscape with hills and valleys. Optimizing the function is like finding the highest peak (maximum) or the lowest valley (minimum) in that landscape. This point (whether maximum or minimum) represents the optimal solution to your problem.

With the scipy.optimize subpackage, you can minimize an objective function, or maximize it by minimizing its negative.

Line 2: We import the required libraries.
Lines 5–6: We define an objective quadratic function.
Line 8: The optimize.minimize() function takes:
- The function whose minimum value we’re seeking
- An initial guess, which is 0 in this case

The output shows that the function reaches its minimum value of 1.75 at x = -0.5.
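The listing described above is not shown here. The sketch below is consistent with the walkthrough; the specific quadratic, x² + x + 2, is an assumption chosen because its minimum (1.75 at x = -0.5) matches the stated result.

```python
from scipy import optimize

def objective(x):
    # minimize() passes x as a 1-D array, so take its single element
    value = x[0]
    # Assumed quadratic: x^2 + x + 2
    return value**2 + value + 2

# Start the search from an initial guess of 0
result = optimize.minimize(objective, x0=0)

print("x at minimum:", result.x[0])
print("Minimum value:", result.fun)
```

The returned object also carries diagnostics such as `result.success` and `result.message`, which are worth checking before trusting the solution.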

Integration#

In calculus, integration means finding the area under a curve. In scientific computing, integration can be used to compute a function’s total accumulated value over an interval. Imagine velocity as a function of time. Integration of that function gives you the total distance traveled over that time.

Here’s an analogy: Imagine rain falling at a certain rate throughout the day. Integration helps you calculate the total amount of rainwater collected (accumulation) over that entire day.

With scipy.integrate, you can perform single integration as follows.

Line 2: We import the required libraries.
Lines 4–5: We declare the integrand, i.e., the mathematical function of one variable that we want to integrate.
Line 7: The integrate.quad() function performs definite integration, which computes the area under a curve between two fixed limits. Here, 0 and 1 are the lower and upper limits of the integration interval.
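The listing described above is not shown here. A minimal sketch under stated assumptions: the integrand x² is an assumed example (the original integrand is not given), while the limits 0 and 1 come from the walkthrough.

```python
from scipy import integrate

def integrand(x):
    # Assumed example function of one variable
    return x**2

# quad() returns the integral estimate and an estimate of the absolute error
value, abs_error = integrate.quad(integrand, 0, 1)

print("Integral of x^2 from 0 to 1:", value)
print("Estimated absolute error:", abs_error)
```

Analytically, the integral of x² from 0 to 1 is 1/3, which gives a quick way to check the numerical result.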

To integrate functions of two or three variables, use dblquad() or tplquad(), respectively.
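As an illustrative sketch of double integration, the snippet below integrates the assumed function f(x, y) = x·y over 0 ≤ x ≤ 1 and 0 ≤ y ≤ 2. Note that dblquad() expects the integrand to take the inner variable (y) first.

```python
from scipy import integrate

def integrand(y, x):
    # dblquad() calls the integrand as f(y, x): inner variable first
    return x * y

# Outer integral: x from 0 to 1
# Inner integral: y from 0 to 2 (limits given as functions of x)
value, abs_error = integrate.dblquad(
    integrand,
    0, 1,
    lambda x: 0,
    lambda x: 2,
)

print("Double integral of x*y:", value)
```

By hand, the result is (1/2)·(2²/2) = 1, so the printed value should be very close to 1.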

Special functions#

The scipy.special subpackage provides utility functions that complement the core NumPy operations, such as computing factorials, combinations, and permutations. Look at the code below.

Lines 2–3: We import the required libraries.
Line 6: The special.factorial() function takes a non-negative integer as an argument and returns its factorial.
Line 12: The special.comb() function takes two arguments (say n and x) and calculates the number of combinations by choosing x elements from a set of n.
Line 15: The special.perm() function takes two arguments (say n and x) and calculates the total number of arrangements that can be performed with n elements taken x at a time.
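The listing described above is not shown here. A minimal sketch of the three calls follows; the input values (5 and 2) are assumed examples.

```python
from scipy import special

# Factorial: 5! = 5 * 4 * 3 * 2 * 1
fact = special.factorial(5)

# Combinations: ways to choose 2 items from 5 when order does not matter
combinations = special.comb(5, 2)

# Permutations: ways to arrange 2 items out of 5 when order matters
permutations = special.perm(5, 2)

print("5! =", fact)
print("C(5, 2) =", combinations)
print("P(5, 2) =", permutations)
```

By default these functions return floating-point values; passing `exact=True` makes them return exact integers instead.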

The subpackage also offers basic mathematical and trigonometric functions. Look at the code below.

Lines 2–3: We import the required libraries.
Line 5: The special.exp10() function raises 10 to the power of its argument, i.e., it computes 10**x.
Line 7: The special.exp2() function raises 2 to the power of its argument, i.e., it computes 2**x.
Line 9: The special.cbrt() function returns the cube root of its argument.
Line 11: The special.sindg() function returns the sine of an angle given in degrees.
Line 13: The special.cosdg() function returns the cosine of an angle given in degrees.
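The listing described above is not shown here. A minimal sketch of the five calls follows; the input values are assumed examples chosen so that the results are easy to verify by hand.

```python
from scipy import special

power_of_ten = special.exp10(2)   # 10**2
power_of_two = special.exp2(3)    # 2**3
cube_root = special.cbrt(27)      # cube root of 27
sine = special.sindg(90)          # sine of 90 degrees
cosine = special.cosdg(60)        # cosine of 60 degrees

print("10**2 =", power_of_ten)
print("2**3 =", power_of_two)
print("cbrt(27) =", cube_root)
print("sin(90 deg) =", sine)
print("cos(60 deg) =", cosine)
```

All five functions also accept NumPy arrays and operate element-wise.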

For comprehensive statistical functionalities, visit the dedicated scipy.stats subpackage in the official documentation.

Interpolation#

Interpolation means bridging the gap between known data points by providing estimates for unknown values in between. Imagine tracking stock prices over time. By interpolating, we can estimate potential price movements between recorded points.

With the scipy.interpolate subpackage, you can do 1D (linear) interpolation as follows.

Python 3.10.4

# Import the required package(s)
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

# Define the known data points
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0, 1, 4, 9, 16, 25])

# Create the interpolation function
f_linear = interpolate.interp1d(x, y) # Linear interpolation


# Estimate the values
x_new = np.linspace(0, 5, 50)         # Defining x values
y_linear = f_linear(x_new)            # Estimate y values using the interpolation function

# Plot the results
plt.plot(x, y, 'o', label='Data points')
plt.plot(x_new, y_linear, '-', label='Linear interpolation')
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.title('Interpolation')
plt.savefig('output/graph.png')

Lines 2–4: We import the required libraries.
Lines 7–8: We define two arrays for known data points. For example, for $x = 0$, $y = 0$, and for $x = 3$, $y = 9$. In other words, the relation between these two variables is $y = x^2$.
Line 11: The interpolate.interp1d() function takes both arrays and returns a function that estimates y for any x within the range of the known data points. By default, it interpolates linearly.
Line 15: We generate 50 evenly spaced x values (x_new) between 0 and 5 (inclusive). These are the points at which we want to estimate y.
Line 16: We apply the linear interpolation function, f_linear, to the new values in x_new. This estimates the corresponding y values using linear interpolation between the original data points.
Lines 19–25: We plot the interpolation through the pyplot package.
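To make the estimate concrete, the sketch below evaluates the interpolant at a single point, x = 2.5, and compares it with the true value of $x^2$. The cubic variant (kind='cubic') is an addition for comparison and is not part of the listing above.

```python
import numpy as np
from scipy import interpolate

# Same known data points as above: y = x^2 sampled at integer x
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0, 1, 4, 9, 16, 25])

f_linear = interpolate.interp1d(x, y)                # straight lines between points
f_cubic = interpolate.interp1d(x, y, kind='cubic')   # smooth cubic spline through points

x_query = 2.5
linear_estimate = float(f_linear(x_query))
cubic_estimate = float(f_cubic(x_query))
true_value = x_query**2

print("Linear estimate:", linear_estimate)
print("Cubic estimate:", cubic_estimate)
print("True value:", true_value)
```

The linear estimate is the midpoint of the neighboring values 4 and 9, so it overshoots the curve slightly. A smoother interpolant tracks this kind of curved data more closely, at the cost of being more sensitive to noisy points.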

Clustering#

Clustering means dividing the population (or data points) into groups such that the data points in one group are more similar to each other than to those in other groups. A group is also known as a cluster.

Imagine a retail company trying to understand its customers better to tailor marketing strategies and improve sales. One way is to segment customers based on purchasing behavior and demographics. In simple words, clustering helps businesses make data-driven decisions.

One of the most commonly used techniques in SciPy is hierarchical clustering, an unsupervised learning method that builds clusters by measuring the dissimilarities between data points. Go through its example below.

Python 3.10.4

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

np.random.seed(42)
data = np.random.rand(10, 2)

distance_matrix = pdist(data, 'euclidean')
Z = linkage(distance_matrix, 'ward')

threshold = 0.4
clusters = fcluster(Z, threshold, criterion='distance')

# Print cluster labels
print("Cluster labels:", clusters)

# Plot the clustered data
plt.figure(figsize=(10, 7))
plt.scatter(data[:, 0], data[:, 1], c=clusters, cmap='prism')
plt.title('Data points and their cluster assignments')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.savefig('output/graph.png')

Lines 1–4: We import the required modules.
Lines 6–7: On line 6, we seed the random number generator. 42 is an arbitrary number; any integer can be used. The key point is that a fixed seed produces the same sequence of random numbers on every run, which is useful for debugging and reproducibility. Line 7 generates a 2D array of random numbers. The rand() function draws uniformly distributed random numbers from the interval [0, 1). The arguments specify the array shape, i.e., 10 rows and 2 columns.
Line 9: We calculate the distances between each pair of data points in the dataset using the pdist function. The Euclidean distance is one of the most common distance metrics, representing the straight-line distance between two points.
Line 10: The linkage function performs hierarchical clustering on the distance matrix using Ward’s method. The output Z is a linkage matrix that contains information about which clusters were merged and at what distance. This information can be used to decide on the final number of clusters.
Lines 12–13: The threshold sets the distance at which the hierarchy is cut: merges that occur above this distance are not applied, so the clusters they would have joined remain separate. The fcluster() function assigns a cluster label to each observation based on the linkage matrix Z and the specified criterion, i.e., the distance threshold.
Line 16: We print the cluster labels for each data point, showing which cluster each point belongs to.
Lines 19–24: We plot the clustering through the pyplot package.
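As a complementary sketch, the snippet below inspects the linkage matrix and cuts the hierarchy by a target number of clusters instead of a distance threshold. The choice of three clusters is an assumption for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Same synthetic data as in the example above
np.random.seed(42)
data = np.random.rand(10, 2)

Z = linkage(pdist(data, 'euclidean'), 'ward')

# Each row of Z records one merge: the two clusters joined,
# the distance between them, and the size of the new cluster
print("Linkage matrix shape:", Z.shape)

# Request at most 3 flat clusters rather than specifying a distance
labels = fcluster(Z, t=3, criterion='maxclust')
print("Cluster labels:", labels)
print("Number of clusters:", len(set(labels)))
```

With n observations, the linkage matrix always has n - 1 rows, because each merge reduces the number of clusters by one until a single cluster remains.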

If you want to learn more about SciPy, check the official documentation.
