Trusted answers to developer questions

Related Tags

numerical
data
datascience
feature

What are numeric features in data science?

Isra Javaid

Overview

Numerical data consists of values that can be measured and ordered. Numeric features are numbers that describe an object's properties, such as a house's area or a person's age.
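A common way to hold numeric features is a matrix where each row is one object and each column is one feature. The sketch below uses NumPy with made-up, purely illustrative house measurements (the feature names and values are hypothetical). Because the features are numbers, they can be compared and summarized directly.

```python
import numpy as np

# Hypothetical, made-up measurements for three houses.
# Each row is one object; each column is one numeric feature.
feature_names = ["area_m2", "bedrooms", "age_years"]
houses = np.array([
    [120.0, 3, 15],
    [85.5, 2, 40],
    [200.0, 4, 5],
])

print(houses.shape)  # (3, 3): 3 objects, 3 numeric features

# Numeric features can be measured, ordered, and summarized directly
for name, column in zip(feature_names, houses.T):
    print(f"{name}: min={column.min()}, max={column.max()}, mean={column.mean():.2f}")
```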

Real-world data comes in different forms, such as:

  • Numeric data
  • Image data
  • Text data
  • Time-series data

However, most data science and machine learning techniques operate on numbers, so we typically convert every other type of data into numerical data before working with it.

Different types of data

Numerical features in images

In image data, the numbers are pixel values. A machine stores an image as a matrix of numbers with one entry per pixel, so the size of the matrix is determined by the image's height and width in pixels.
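To make the "image as a matrix" idea concrete, here is a minimal sketch using a tiny hypothetical 3 × 4 grayscale image built by hand with NumPy rather than loaded from a file. The matrix has one number per pixel, so its size is simply rows × columns.

```python
import numpy as np

# A tiny, hypothetical 3 x 4 grayscale image: one number per pixel
image = np.array([
    [0, 50, 100, 150],
    [30, 80, 130, 180],
    [60, 110, 160, 255],
], dtype=np.uint8)

height, width = image.shape
print(height, width)   # 3 4
print(image.size)      # 12 pixels = 3 rows x 4 columns
print(image[1, 2])     # pixel in row 1, column 2 (0-based): 130
```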

Images can be of two types:

  1. Grayscale images
  2. Color images

A grayscale image has a single matrix of pixels containing only black, white, and shades of gray. Each pixel is stored in an 8-bit format.

A color image has three matrices, one for each of the red, green, and blue (RGB) channels. With 8 bits per channel, every color is represented in a 24-bit format.
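The difference between the two image types shows up in the array shape. In this sketch (the 2 × 3 image size is a hypothetical choice), a grayscale image is one 8-bit matrix, while a color image stacks three 8-bit channel matrices, giving 3 × 8 = 24 bits per pixel.

```python
import numpy as np

height, width = 2, 3  # hypothetical image size

# Grayscale: one 8-bit matrix of shape (height, width)
gray = np.zeros((height, width), dtype=np.uint8)

# Color: three 8-bit matrices (R, G, B) stacked as shape (height, width, 3)
color = np.zeros((height, width, 3), dtype=np.uint8)
color[0, 0] = [255, 0, 0]  # make the top-left pixel pure red

bits_per_value = np.dtype(np.uint8).itemsize * 8
print(gray.shape, bits_per_value)                     # (2, 3) 8
print(color.shape, bits_per_value * color.shape[2])   # (2, 3, 3) 24
print(color[:, :, 0])  # the red channel matrix on its own
```

Storing channels as the last axis matches how Matplotlib's `imread` returns color images, with shape (height, width, 3) or (height, width, 4) when an alpha channel is present.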

The value of each pixel lies between 0 and 255, but what does this number mean? It represents the intensity, or brightness, of the pixel: smaller values (closer to 0) are darker, with 0 being black, while larger values (closer to 255) are brighter, with 255 being white. In a color image, this applies to each channel separately.
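A quick way to see the intensity scale is to build a row of evenly spaced 8-bit values from black to white. The `describe` helper below is hypothetical, and its 128 cut-off between "darker" and "lighter" gray is an arbitrary choice for illustration only.

```python
import numpy as np

# Five evenly spaced intensities from black (0) to white (255)
row = np.linspace(0, 255, 5).astype(np.uint8)  # values 0, 63, 127, 191, 255

def describe(value):
    """Rough, illustrative label for an 8-bit intensity (thresholds are arbitrary)."""
    if value == 0:
        return "black"
    if value == 255:
        return "white"
    return "darker gray" if value < 128 else "lighter gray"

for v in row:
    print(int(v), describe(int(v)))
```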

The world vs data scientists

Image representation in Python

Example

The Python code below reads an image uploaded by the user and prints its pixel values.

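import matplotlib.pyplot as plt

# Read the uploaded image into a NumPy array of pixel values
image = plt.imread('__ed_input.png')
print(image.shape)  # (height, width) for grayscale, (height, width, channels) for color
print(image)

Explanation

  • Line 1: We import Matplotlib's pyplot module.
  • Line 4: We read the image that the user uploaded. For PNG files, plt.imread returns floating-point pixel values between 0 and 1 rather than integers between 0 and 255.
  • Line 5: We print the dimensions of the pixel matrix.
  • Line 6: We print the numerical values of the pixels.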
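Because `plt.imread` returns PNG pixels as floats between 0 and 1, you may want to rescale them to the 0–255 range described earlier. The sketch below uses a hypothetical helper, `to_uint8`, and a small hand-made float array in place of a real file so that it runs on its own; treating integer images as already being 0–255 is an assumption.

```python
import numpy as np

def to_uint8(image):
    """Rescale a float image with values in [0, 1] to 8-bit integers in [0, 255]."""
    image = np.asarray(image)
    if np.issubdtype(image.dtype, np.floating):
        # Clip first so rounding errors can't push values outside [0, 1]
        return np.round(np.clip(image, 0.0, 1.0) * 255).astype(np.uint8)
    return image.astype(np.uint8)  # assumption: integer images are already 0-255

# Hypothetical stand-in for plt.imread('__ed_input.png') on a 2 x 2 grayscale PNG
float_image = np.array([[0.0, 0.5], [0.25, 1.0]], dtype=np.float32)
print(to_uint8(float_image))
```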

Numerical features in text data

Text analysis is one of the most common applications of machine learning. In machine learning, vectorization converts textual data into numerical data. This step is crucial because machine learning algorithms accept only numerical input, so they can't be applied directly to raw text.
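One simple form of vectorization is a bag-of-words count: build a vocabulary of every distinct word, then represent each document as a row of word counts. The pure-Python sketch below is a simplified illustration, not scikit-learn's exact implementation; its tokenizer (lowercase, keep runs of two or more word characters) is only loosely modeled on `CountVectorizer`'s default behavior, and the example sentences are hypothetical.

```python
import re

def bag_of_words(documents):
    """Simplified bag-of-words: count each vocabulary word in each document."""
    # Tokenize: lowercase, keep runs of 2+ word characters
    tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in documents]

    # Vocabulary: every distinct word, sorted alphabetically
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    index = {word: i for i, word in enumerate(vocabulary)}

    # One row per document, one column per vocabulary word
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for word in tokens:
            row[index[word]] += 1
        matrix.append(row)
    return vocabulary, matrix

vocab, counts = bag_of_words(["The cat sat", "The cat sat on the mat"])
print(vocab)   # ['cat', 'mat', 'on', 'sat', 'the']
print(counts)  # [[1, 0, 0, 1, 1], [1, 1, 1, 1, 2]]
```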

Example

Let’s take a look at the code below.

The code reads its input from a file named text.txt and treats each line of the file as a separate document.
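from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()
with open('text.txt', 'r') as filedata:  # the file is closed automatically
    text = []
    for x in filedata.readlines():
        text.append(x)  # each line is treated as one document
vect.fit(text)  # learn the vocabulary from the documents
train = vect.transform(text)  # sparse matrix of word counts
print(train.toarray())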

Explanation

  • Line 1: We import the CountVectorizer class from scikit-learn.
  • Line 2: We create a CountVectorizer object.
  • Line 3: We open the text file.
  • Line 4: We create an empty list.
  • Line 5–6: We append each line of the file to the list.
  • Line 7: We learn the vocabulary of the text with fit().
  • Line 8: We convert the text into a matrix of word counts with transform().
  • Line 9: We print the numeric values of the text as a dense array.


Copyright ©2022 Educative, Inc. All rights reserved