What are numeric features in data science?
Overview
Numerical data are values that can be measured and ordered. Numeric features are the numbers that describe an object's properties.
Data in the real world can exist in different forms like:
- Numeric data
- Image data
- Text data
- Time-series data
However, most machine learning algorithms operate on numbers, so in data science we usually convert the other data types into numerical data before working with them.
Numerical features in images
In image data, we have pixel values. Images are stored in machines as a matrix of numbers. The size of this matrix is determined by the number of pixels in each image.
Images can be of two types:
- Grayscale images
- Color images
A grayscale image is a single matrix of pixels whose values range from black through shades of gray to white. It typically uses an 8-bit format, so each pixel takes one of 256 values.
A color image has three matrices, one for each of the RGB (red, green, and blue) channels. With 8 bits per channel, every color is expressed as a combination of the three channels in a 24-bit color format.
Each pixel value lies between 0 and 255 and represents the intensity, or brightness, of that pixel. Smaller numbers (closer to 0) are darker, with 0 being black, while larger numbers (closer to 255) are brighter, with 255 being white.
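To make the matrix description concrete, here is a minimal sketch that builds tiny images by hand with NumPy instead of loading a file. The 2x3 size, the pixel values, and the variable names are assumptions chosen for illustration only.

```python
import numpy as np

# A hypothetical 2x3 grayscale image: one matrix, one 8-bit value per pixel.
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# A hypothetical 2x3 color image: three stacked matrices (R, G, B).
color = np.zeros((2, 3, 3), dtype=np.uint8)
color[0, 0] = [255, 0, 0]      # top-left pixel is pure red
color[1, 2] = [255, 255, 255]  # bottom-right pixel is white

print(gray.shape)   # (rows, columns)
print(color.shape)  # (rows, columns, channels)

# 8 bits per channel times 3 channels gives the 24-bit color format.
bits_per_pixel = color.dtype.itemsize * 8 * color.shape[-1]
print(bits_per_pixel)

# Rescaling intensities to the range 0 to 1 is a common preprocessing step.
gray_scaled = gray.astype(np.float32) / 255.0
```

The extra last axis in the color array is what holds the three channel matrices. The grayscale array has no such axis because it stores a single intensity per pixel.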
Image representation in Python
Example
The Python code below reads an image uploaded by the user and prints its pixel values.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
image = plt.imread('__ed_input.png', format='jpeg')
print(image)
Explanation
- Lines 1–3: We import the libraries. Only matplotlib is used in this snippet.
- Line 4: We read the image that the user uploaded.
- Line 5: We print the numerical values of the pixels.
Numerical features in text data
One of the most common applications of machine learning techniques is text analysis. In machine learning, vectorization converts textual data into numerical data. It's a crucial step because most machine learning techniques can't be applied directly to raw text and expect numerical input.
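As a rough illustration of what count-based vectorization does, the sketch below builds a vocabulary from a few sentences and counts how often each word appears in each one. It uses only the standard library. The function name count_vectorize and the sample sentences are hypothetical, and the tokenization rule (lowercase words of two or more characters) is a simplifying assumption rather than a reproduction of any library's behavior.

```python
import re
from collections import Counter

def count_vectorize(documents):
    """Return (vocabulary, rows); each row holds per-word counts for one document."""
    # Lowercase each document and keep words of two or more characters.
    tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in documents]
    # The vocabulary is the sorted set of all words seen in any document.
    vocabulary = sorted({tok for tokens in tokenized for tok in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows

docs = ["the cat sat", "the cat saw the dog"]
vocabulary, rows = count_vectorize(docs)
print(vocabulary)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(rows)        # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each row has one column per vocabulary word, so every sentence becomes a numeric vector of the same length regardless of how many words it contains.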
Example
Let’s take a look at the code below.
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()
filedata = open('text.txt', 'r')
text = []
for x in filedata.readlines():
    text.append(x)
vect.fit(text)
train = vect.transform(text)
print(train.toarray())
Explanation
- Line 1: We import CountVectorizer from scikit-learn.
- Line 2: We create a CountVectorizer object.
- Line 3: We open a text file.
- Line 4: We make an empty list.
- Lines 5–6: We append each line of the file to the list.
- Lines 7–8: We fit the vectorizer on the text to learn its vocabulary, then transform the text into a matrix of word counts.
- Line 9: We print the numeric values of the text.