SciPy tutorial for beginners
What is SciPy?#
SciPy is an open-source Python library for scientific and technical computing, built on top of NumPy.
It is a good alternative to MATLAB and the GNU Scientific Library in C/C++.
Interesting fact: SciPy stands for Scientific Python. It was created by Travis Oliphant, Pearu Peterson, and Eric Jones; Oliphant also created NumPy. It is pronounced as “Sigh-Pie”.
How to import SciPy#
To use SciPy, you also need NumPy. To install both, use the following commands.
- For macOS (MacPorts; replace 35 with your Python version):
  sudo port install py35-scipy py35-numpy
- For Windows:
  python3 -m pip install --user numpy scipy
- For Linux (Debian/Ubuntu):
  sudo apt-get install python3-scipy python3-numpy
SciPy in scientific computing#
SciPy has multiple packages that cover different scientific domains, which are listed below:
Core scientific computing
Mathematical functions and computation
Constants and utilities
Data handling
Signal and image processing
Statistics and data analysis
Spatial data structures and algorithms
These subpackages must be imported explicitly before they are used in the code.
Note that this blog covers the basic functionality that is easiest for beginners to follow. The relevant subpackages are listed in the figure below.
Feel free to jump to any section you’re interested in.
Physical and mathematical constants#
Constants and units are the building blocks of scientific measurement. Constants define the fundamental behavior of the universe, e.g., the speed of light. Knowing their values helps scientists make predictions. Similarly, units help scientists standardize their measurements, e.g., meters and kilometers. This allows researchers to replicate experiments and verify findings.
Here’s an analogy: Constants are the ingredients, while units are the measuring cups. To bake a good cake, you need to follow the recipe precisely.
The scipy.constants subpackage contains multiple constants. You can access the value of any supported constant as below.
Lines 2–3: We import the required libraries.
Lines 5–6: We print the values of two constants, one of which is π.
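A minimal sketch of such a lookup is shown below. The choice of the speed of light as the second constant is an assumption made for illustration, since it is the example constant mentioned earlier.

```python
# Look up two constants from scipy.constants.
from scipy import constants

# pi is a mathematical constant; c is the speed of light in vacuum (m/s).
# Using c as the second constant is an assumption for this sketch.
print("pi =", constants.pi)                    # pi = 3.141592653589793
print("speed of light (m/s) =", constants.c)   # speed of light (m/s) = 299792458.0
```

Each constant is a plain Python float, so it can be used directly in arithmetic.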
You can also use the units of physical quantities under different unit systems.
Lines 2–3: We import the required libraries.
Lines 6–7: We print the values of metric prefixes specified in powers of 10.
Lines 10–11: We print the values of time units specified in seconds.
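The steps above can be sketched as follows. The specific prefixes and time units chosen here (kilo, mega, minute, hour) are assumptions picked for illustration.

```python
# Access metric prefixes and time units from scipy.constants.
from scipy import constants

# Metric prefixes are plain numbers expressed as powers of 10.
print("kilo =", constants.kilo)      # kilo = 1000.0
print("mega =", constants.mega)      # mega = 1000000.0

# Time units are expressed in seconds.
print("minute =", constants.minute)  # minute = 60.0
print("hour =", constants.hour)      # hour = 3600.0
```

Multiplying a quantity by one of these values converts it to the base unit, e.g., 2 * constants.hour gives the number of seconds in two hours.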
To view the complete list of supported constants and units, try the dir() function as follows:
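A minimal sketch, assuming you only want the public names (the filtering step and the variable name are illustrative choices, not part of SciPy):

```python
# List the names exposed by scipy.constants.
from scipy import constants

# dir() returns every name defined in the module; skip private names.
names = [name for name in dir(constants) if not name.startswith("_")]

print(len(names), "public names")
print(names[:10])  # show only the first few, since the full list is long
```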
Linear algebra#
Linear algebra is an adapter that connects mathematics and science to solve real-world problems. Many problems boil down to manipulating systems of linear equations, and you can present almost any data in the form of matrices or vectors.
The scipy.linalg subpackage provides advanced linear algebra routines, including matrix decompositions.
Solving a system of linear equations#
Lines 2–4: We import the required libraries.
Lines 7–8: We store a system of linear equations (equations that have a degree of 1) using the numpy.array functionality.
Line 9: The linalg.solve() function accepts the matrices and returns the list of unknown variables, which are 0.2 and 0.4 in this case.
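The steps above can be sketched as follows. The system of equations used here is hypothetical: it was chosen only because its solution matches the values 0.2 and 0.4 quoted above.

```python
# Solve a 2x2 system of linear equations with scipy.linalg.
import numpy as np
from scipy import linalg

# Hypothetical system, chosen so the solution is x = 0.2, y = 0.4:
#    x + 2y = 1
#   3x +  y = 1
A = np.array([[1, 2], [3, 1]])  # coefficient matrix
b = np.array([1, 1])            # right-hand side

solution = linalg.solve(A, b)
print(solution)  # approximately [0.2 0.4]
```

You can verify the result by checking that A @ solution reproduces b.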
Computing the determinant of a matrix#
Calculating the determinant is one of the prime operations done on a matrix. We can compute the determinant as follows.
Lines 2–4: We import the required libraries.
Line 7: The linalg.det() function takes the square matrix (created on line 6) and returns its determinant.
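A minimal sketch of this computation, assuming a small 2×2 example matrix (the matrix values are illustrative):

```python
# Compute the determinant of a square matrix with scipy.linalg.
import numpy as np
from scipy import linalg

# Illustrative matrix: det = 1*4 - 2*3 = -2
A = np.array([[1, 2], [3, 4]])

det = linalg.det(A)
print(det)  # close to -2, up to floating-point rounding
```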
Computing the inverse of a matrix#
Computing the inverse of a matrix on a piece of paper is a lengthy process with multiple steps. But with the scipy.linalg subpackage, we can get the result in one step as follows.
Lines 2–4: We import the required libraries.
Line 7: The linalg.inv() function takes the matrix (created on line 6) and returns its inverse.
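A minimal sketch of this computation, assuming an illustrative 2×2 matrix that is invertible (its determinant is non-zero):

```python
# Compute the inverse of a square matrix with scipy.linalg.
import numpy as np
from scipy import linalg

# Illustrative matrix; it is invertible because its determinant is -2.
A = np.array([[1, 2], [3, 4]])

A_inv = linalg.inv(A)
print(A_inv)

# Sanity check: a matrix times its inverse gives the identity matrix.
print(np.allclose(A @ A_inv, np.eye(2)))  # True
```

If the matrix is singular, linalg.inv() raises a LinAlgError instead of returning a result.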
Optimization#
Optimizing means finding the best solution to a problem according to certain criteria. In real-world scenarios, we want to minimize costs and maximize profits. For instance, an engineer might optimize a design to minimize material usage while maintaining strength.
Imagine a function as a landscape with hills and valleys. Optimizing the function is like finding the highest peak (maximum) or the lowest valley (minimum) in that landscape. This point (whether maximum or minimum) represents the optimal solution to your problem.
With the scipy.optimize subpackage, you can minimize an objective function. To maximize a function, minimize its negative.
Line 2: We import the required libraries.
Lines 5–6: We define an objective quadratic function.
Line 8: The optimize.minimize() function takes two arguments: the function whose minimum value we’re seeking and an initial guess, which is 0 in this case.
The output shows that for x equals -0.5, the minimum value of the function is 1.75.
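The steps above can be sketched as follows. The quadratic x² + x + 2 is an assumption: it is one function whose minimum is 1.75 at x = -0.5, which matches the result quoted above.

```python
# Minimize a quadratic objective with scipy.optimize.
from scipy import optimize


def objective(x):
    # minimize() passes x as a 1-element array, so take its first entry.
    # Assumed objective: f(x) = x^2 + x + 2
    return x[0] ** 2 + x[0] + 2


result = optimize.minimize(objective, x0=0)

print("x at minimum:", result.x[0])   # close to -0.5
print("minimum value:", result.fun)   # close to 1.75
```

The returned object also carries diagnostic fields such as result.success and result.message, which are worth checking before trusting the answer.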
Integration#
In calculus, integration means finding the area under a curve. In scientific computing, integration can be used to compute a function’s total accumulated value over an interval. Imagine velocity as a function of time. Integration of that function gives you the total distance traveled over that time.
Here’s an analogy: Imagine rain falling at a certain rate throughout the day. Integration helps you calculate the total amount of rainwater collected (accumulation) over that entire day.
With scipy.integrate, you can perform single integration as follows.
Line 2: We import the required libraries.
Lines 4–5: We declare the integrand, which represents the mathematical function of one variable we want to integrate.
Line 7: The integrate.quad() function is used to perform the definite integration, i.e., it computes the area under a curve between two fixed limits. Here, 0 and 1 are the lower and upper limits of the integration interval.
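The steps above can be sketched as follows. The integrand x² is an assumption chosen for illustration; its exact integral from 0 to 1 is 1/3.

```python
# Integrate a function of one variable with scipy.integrate.quad.
from scipy import integrate


def integrand(x):
    # Assumed example function: f(x) = x^2
    return x ** 2


# quad() returns the integral estimate and an estimate of the absolute error.
result, error_estimate = integrate.quad(integrand, 0, 1)

print("integral:", result)              # close to 1/3
print("error estimate:", error_estimate)
```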
For integrating functions of two or three variables, use dblquad() or tplquad(). For more variables, use nquad().
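As a minimal sketch of double integration, the example below integrates the assumed function f(x, y) = x·y over the unit square, where the exact answer is 0.25. Note that dblquad() expects the integrand to take y as its first argument and x as its second.

```python
# Integrate a function of two variables with scipy.integrate.dblquad.
from scipy import integrate


def integrand(y, x):
    # Assumed example function: f(x, y) = x * y (y comes first by convention).
    return x * y


# x runs from 0 to 1; for each x, y runs from 0 to 1.
result, error_estimate = integrate.dblquad(
    integrand, 0, 1, lambda x: 0, lambda x: 1
)

print("integral:", result)  # close to 0.25
```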
Special functions#
SciPy’s special package provides several utility functions that complement the core NumPy operations, such as computing factorial, combinations, and permutations. Look at the code below.
Lines 2–3: We import the required libraries.
Line 6: The special.factorial() function takes a non-negative integer as an argument and returns its factorial.
Line 12: The special.comb() function takes two arguments (say n and x) and calculates the number of combinations of x elements chosen from a set of n.
Line 15: The special.perm() function takes two arguments (say n and x) and calculates the total number of arrangements of n elements taken x at a time.
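The steps above can be sketched as follows. The values n = 5 and x = 2 are illustrative, and exact=True is used here so that the functions return exact integers rather than floating-point approximations.

```python
# Factorials, combinations, and permutations with scipy.special.
from scipy import special

n, x = 5, 2  # illustrative values

# 5! = 5 * 4 * 3 * 2 * 1
print(special.factorial(n, exact=True))  # 120

# Ways to choose 2 items from 5 when order does not matter.
print(special.comb(n, x, exact=True))    # 10

# Ways to arrange 2 items from 5 when order matters.
print(special.perm(n, x, exact=True))    # 20
```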
We can also apply trigonometric operations and basic mathematical functionalities. Look at the code below.
Lines 2–3: We import the required libraries.
Line 5: The special.exp10() function raises 10 to the power of the given number and returns the result.
Line 7: The special.exp2() function raises 2 to the power of the given number and returns the result.
Line 9: The special.cbrt() function calculates the cube root of the number passed as an argument and returns the result.
Line 11: The special.sindg() function calculates the sine of an angle given in degrees and returns a scalar value.
Line 13: The special.cosdg() function calculates the cosine of an angle given in degrees and returns a scalar value.
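The steps above can be sketched as follows. The input values are assumptions chosen so that the results are easy to check by hand.

```python
# Exponential, root, and degree-based trigonometric helpers in scipy.special.
from scipy import special

ten_cubed = special.exp10(3)    # 10 ** 3
two_cubed = special.exp2(3)     # 2 ** 3
cube_root = special.cbrt(27)    # cube root of 27
sine_90 = special.sindg(90)     # sine of 90 degrees
cosine_60 = special.cosdg(60)   # cosine of 60 degrees

print(ten_cubed, two_cubed, cube_root, sine_90, cosine_60)
```

These values should come out close to 1000, 8, 3, 1, and 0.5, respectively.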
For comprehensive statistical functionalities, visit the dedicated scipy.stats subpackage in the official documentation.
Interpolation#
Interpolation means bridging the gap between known data points by providing estimates for unknown values in between. Imagine tracking stock prices over time. By interpolating, we can estimate potential price movements between recorded points.
With the scipy.interpolate subpackage, you can do 1D (linear) interpolation as follows.
Lines 2–4: We import the required libraries.
Lines 7–8: We define two arrays, x and y, for the known data points; each y value is a fixed function of the corresponding x value.
Line 11: The interpolate.interp1d() function takes both arrays and returns a function that estimates y values for new x values based on the provided data points in x and y.
Line 15: We generate 50 new values along the x-axis (x_new) between 0 and 5 (inclusive). These new values represent where we want to estimate the y-axis values.
Line 16: We apply the linear interpolation function, f_linear, to the new values in x_new. This estimates the corresponding y values using linear interpolation between the original data points.
Lines 19–25: We plot the interpolation through the pyplot package.
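The steps above can be sketched as follows. The known data points are hypothetical: this sketch assumes six points on the curve y = x². The plotting step is omitted to keep the example short.

```python
# 1D linear interpolation with scipy.interpolate.interp1d.
import numpy as np
from scipy import interpolate

# Hypothetical known data points: x = 0..5 and y = x^2.
x = np.arange(0, 6)
y = x ** 2

# interp1d returns a callable that interpolates between the known points.
f_linear = interpolate.interp1d(x, y, kind="linear")

# 50 evenly spaced x values between 0 and 5 (inclusive).
x_new = np.linspace(0, 5, 50)
y_new = f_linear(x_new)

# Halfway between x = 2 (y = 4) and x = 3 (y = 9), the estimate is 6.5.
print(f_linear(2.5))
```

By default, interp1d raises an error for x values outside the range of the known data, so keep x_new within the original interval.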
Clustering#
Clustering means dividing the population (or data points) into groups such that the data points in one group are more similar to each other than to those in other groups. A group is also known as a cluster.
Imagine a retail company trying to understand its customers better to tailor marketing strategies and improve sales. One way is to segment customers based on purchasing behavior and demographics. In simple words, clustering helps businesses make data-driven decisions.
One of the most commonly used techniques in SciPy is hierarchical clustering.
Lines 1–4: We import the required modules.
Lines 6–7: On line 6, we set a seed for the random number generator. 42 is an arbitrary number; any integer can be used. The key point is that the same seed always produces the same sequence of random numbers, which is useful for debugging. Line 7 generates a 2D array of random numbers. The rand() function generates random numbers over the interval [0, 1). The arguments specify the array shape, i.e., 10 rows and 2 columns.
Line 9: We calculate the distances between each pair of data points in the dataset using the pdist function. The Euclidean distance is one of the most common distance metrics, representing the straight-line distance between two points.
Lines 10–11: The linkage function performs hierarchical clustering on the distance matrix using Ward’s method. The output Z is a linkage matrix that contains information about which clusters were merged and at what distance. This information can be used to decide on the final number of clusters.
Lines 12–13: The threshold defines the maximum distance at which clusters are merged. Clusters that would only merge at distances greater than this threshold are treated as separate clusters. The fcluster() function assigns cluster labels to each observation based on the linkage matrix Z and the specified criterion, i.e., distance threshold.
Line 16: We print the cluster labels for each data point, showing which cluster each point belongs to.
Lines 19–24: We plot the clustering through the pyplot package.
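The steps above can be sketched as follows. The threshold value of 0.5 is a hypothetical choice for illustration, and the plotting step is omitted to keep the example short.

```python
# Hierarchical clustering of random 2D points with SciPy.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Fix the seed so the same random points are generated on every run.
np.random.seed(42)
data = np.random.rand(10, 2)  # 10 points with 2 coordinates each, in [0, 1)

# Pairwise Euclidean distances in condensed (1D) form.
distances = pdist(data, metric="euclidean")

# Build the cluster hierarchy using Ward's method.
Z = linkage(distances, method="ward")

# Cut the hierarchy at a distance threshold (0.5 is a hypothetical value).
threshold = 0.5
labels = fcluster(Z, t=threshold, criterion="distance")

print(labels)  # one cluster label (starting at 1) per data point
```

Raising the threshold merges more points into fewer clusters, while lowering it produces more, smaller clusters.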
If you want to learn more about SciPy, check the official documentation.