Feature Engineering and Categorical Variables Encoding

Explore essential feature engineering methods including missing value imputation and various categorical encoding techniques. Understand one-hot, ordinal, count, and target mean encoding to prepare your data effectively for machine learning.

Feature engineering

Feature engineering transforms and combines the preprocessed features at hand so that a model can capture more complex patterns. Feature selection chooses a subset of the preprocessed features to build the model. Both steps are standard parts of a model-building pipeline.

Missing values

Features of the input dataset can contain missing values for a variety of reasons. Handling them is an important part of the pipeline: we either fill in the missing values or throw out the features or instances that have a large number of them. Data imputation is the process of estimating the missing values and filling them in.
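
As a rough illustration of the "throw out" option, the sketch below measures how much of each feature is missing and drops the features above a cutoff. It assumes pandas is available; the toy data, the column names, and the 0.5 threshold are all hypothetical choices for this example, not values from the lesson.

```python
import pandas as pd

# Hypothetical toy dataset; None marks a missing entry.
df = pd.DataFrame({
    "age": [25, None, 40, 31],
    "income": [None, None, None, 52000],
    "city": ["Paris", "Lima", None, "Oslo"],
})

# Fraction of missing values per feature (column).
missing_fraction = df.isna().mean()

# Assumed cutoff: drop any feature with more than half of its values missing.
threshold = 0.5
kept = df.loc[:, missing_fraction <= threshold]

print(missing_fraction)
print(list(kept.columns))
```

The right cutoff depends on the dataset and on how informative the feature is, so the threshold here is only a placeholder.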

Dealing with missing values

  • Drop the instances: The simplest technique is to drop the instances or features that have at least one missing value. A variation is to drop only the instances with a missing value in any feature from a defined subset.

  • Mean or median imputation: This replaces a missing value with the mean or median of the respective feature. The mean or median is calculated on the training dataset only, and the same value is reused to fill in missing values in the test dataset.

  • Mode or frequent category imputation: This imputation is used mostly for categorical variables and involves replacing the missing values with the most common value in the feature.

  • Arbitrary number imputation: This involves replacing the missing value with an arbitrary value. The most commonly used values for numerical features are 999, 9999, or -1. In case of a missing value for ...
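
The techniques listed above can be sketched with pandas as follows. This is a minimal example under stated assumptions: the `train` and `test` frames, the column names, and the choice of -1 as the arbitrary value are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical training and test sets; None marks a missing entry.
train = pd.DataFrame({
    "age": [20.0, 30.0, None, 100.0],
    "city": ["Lima", "Lima", None, "Oslo"],
})
test = pd.DataFrame({"age": [None, 50.0], "city": [None, "Oslo"]})

# Drop the instances: remove rows with a missing value
# in any feature from a defined subset.
dropped = train.dropna(subset=["age", "city"])

# Mean or median imputation: compute the statistic on the training
# data only, then reuse that same value for the test data.
age_mean = train["age"].mean()
age_median = train["age"].median()
train_age_filled = train["age"].fillna(age_median)
test_age_filled = test["age"].fillna(age_median)

# Mode (most frequent category) imputation for a categorical feature.
city_mode = train["city"].mode()[0]
test_city_filled = test["city"].fillna(city_mode)

# Arbitrary number imputation: -1 is one of the commonly used values.
test_age_arbitrary = test["age"].fillna(-1)
```

Computing the statistics on `train` and reusing them on `test` keeps information from the test set out of the preprocessing step.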