
ML.NET Project Structure

Explore the detailed structure of an ML.NET project by examining autogenerated files involved in building, training, and consuming a binary classification model for sentiment analysis. Understand how pipeline building and model input/output classes work to help you customize and deploy ML.NET solutions effectively.

The interactive playground below contains a project that is autogenerated when we build an ML model by using ML.NET. This is a binary classification model that aims to determine whether a specific sentence has a positive or negative sentiment. It was trained by using the yelp_labelled.txt file found in the TrainingData folder.

// This file was auto-generated by ML.NET Model Builder. 
using Microsoft.ML;
using Microsoft.ML.Data;
using System;
using System.Linq;
using System.IO;
using System.Collections.Generic;
namespace MLApp
{
    public partial class DemoMLModel
    {
        /// <summary>
        /// model input class for DemoMLModel.
        /// </summary>
        #region model input class
        public class ModelInput
        {
            [ColumnName(@"col0")]
            public string Col0 { get; set; }

            [ColumnName(@"col1")]
            public float Col1 { get; set; }

        }

        #endregion

        /// <summary>
        /// model output class for DemoMLModel.
        /// </summary>
        #region model output class
        public class ModelOutput
        {
            [ColumnName(@"col0")]
            public float[] Col0 { get; set; }

            [ColumnName(@"col1")]
            public uint Col1 { get; set; }

            [ColumnName(@"Features")]
            public float[] Features { get; set; }

            [ColumnName(@"PredictedLabel")]
            public float PredictedLabel { get; set; }

            [ColumnName(@"Score")]
            public float[] Score { get; set; }

        }

        #endregion

        private static string MLNetModelPath = Path.GetFullPath("/models/DemoSentimentMLModel.zip");

        public static readonly Lazy<PredictionEngine<ModelInput, ModelOutput>> PredictEngine = new Lazy<PredictionEngine<ModelInput, ModelOutput>>(() => CreatePredictEngine(), true);

        /// <summary>
        /// Use this method to predict on <see cref="ModelInput"/>.
        /// </summary>
        /// <param name="input">model input.</param>
        /// <returns><seealso cref=" ModelOutput"/></returns>
        public static ModelOutput Predict(ModelInput input)
        {
            var predEngine = PredictEngine.Value;
            return predEngine.Predict(input);
        }

        private static PredictionEngine<ModelInput, ModelOutput> CreatePredictEngine()
        {
            var mlContext = new MLContext();
            ITransformer mlModel = mlContext.Model.Load(MLNetModelPath, out var _);
            return mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(mlModel);
        }
    }
}
Basic ML.NET project

Note: The original model generated by the CLI command had nullable warnings when the application was executed. Although these warnings did not affect the execution of the application, they were disabled to keep the console output clean. This was done by changing the value of the Nullable element in the MLApp.csproj file from enable to disable.

The Program.cs file has been added so we can consume the trained model. It uses an interactive console interface where we can type sentences. For each sentence we type, the model will tell us whether the sentiment is positive or negative.

If we build and run the application by clicking the “Run” button, we'll be able to enter any arbitrary sentence into the console. The model will then attempt to predict whether the sentence has a positive or negative sentiment.
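The consumption loop in Program.cs can be sketched roughly as follows. This is a minimal, hypothetical sketch: the prompt text and exit condition in the actual playground file may differ, and the label interpretation assumes the 0/1 labels used by yelp_labelled.txt.

```csharp
using System;
using MLApp;

// Read sentences from the console and classify each one
// until the user enters an empty line.
while (true)
{
    Console.Write("Enter a sentence: ");
    var sentence = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(sentence))
        break;

    // Only the feature column is needed at prediction time;
    // the label (Col1) is left unset.
    var input = new DemoMLModel.ModelInput { Col0 = sentence };
    var output = DemoMLModel.Predict(input);

    // In the training data, label 1 means positive and 0 means negative.
    Console.WriteLine(output.PredictedLabel == 1
        ? "Positive sentiment"
        : "Negative sentiment");
}
```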

To make this work, our project has a reference to the Microsoft.ML NuGet package. This can be found in line 9 of the MLApp.csproj file.

In the example above, the autogenerated files are as follows:

  • DemoMLModel.mbconfig

  • DemoMLModel.training.cs

  • DemoMLModel.zip

  • DemoMLModel.consumption.cs

The DemoMLModel part in each of the file names is the name we gave to the model during its training. Otherwise, these files are common to any autogenerated ML.NET model. Let’s examine the role of each.

Settings file

The DemoMLModel.mbconfig file contains the settings that were applied while training the model, represented in JSON format. They include the task selection, training time, the algorithms used, and so on. These settings can be changed if we want to retrain the model.
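As a rough illustration, an abbreviated excerpt might look like the following. This is a hypothetical sketch only: the exact field names and structure depend on the Model Builder version that generated the file, and the real file contains many more settings.

```json
{
  "Scenario": "Classification",
  "DataSource": {
    "Type": "TabularFile",
    "FilePath": "TrainingData/yelp_labelled.txt"
  },
  "TrainingOption": {
    "TrainingTime": 60,
    "LabelColumn": "col1"
  }
}
```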

Trained model

The DemoMLModel.zip file represents the actual trained model. Even a simple model has a relatively complex structure and consists of multiple files and folders. This is why it's presented as a ZIP archive. ML.NET doesn’t need this archive to be unpacked before the model can be consumed. It's capable of working with raw ZIP files.
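Loading such an archive is a one-liner with MLContext.Model.Load(); a minimal sketch (the file path here is illustrative):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// Load the trained model straight from the ZIP archive; no unpacking is needed.
// The out parameter receives the input schema the model expects.
ITransformer model = mlContext.Model.Load("DemoSentimentMLModel.zip", out DataViewSchema inputSchema);
```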

Training logic

The DemoMLModel.training.cs file contains the code for training the model. The code can also be used in the retraining of existing models. This file contains the BuildPipeline() method, which is used for building a training pipeline. In our example, it can be found in line 34 of the DemoMLModel.training.cs file. The method definition is as follows:

C#
public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
{
    // Data process configuration with pipeline data transformations
    var pipeline = mlContext.Transforms.Text.FeaturizeText(inputColumnName: @"col0", outputColumnName: @"col0")
        .Append(mlContext.Transforms.Concatenate(@"Features", new[] { @"col0" }))
        .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: @"col1", inputColumnName: @"col1"))
        .Append(mlContext.Transforms.NormalizeMinMax(@"Features", @"Features"))
        .Append(mlContext.MulticlassClassification.Trainers.OneVersusAll(
            binaryEstimator: mlContext.BinaryClassification.Trainers.LbfgsLogisticRegression(
                new LbfgsLogisticRegressionBinaryTrainer.Options()
                {
                    L1Regularization = 0.0336147F,
                    L2Regularization = 0.8686289F,
                    LabelColumnName = @"col1",
                    FeatureColumnName = @"Features"
                }),
            labelColumnName: @"col1"))
        .Append(mlContext.Transforms.Conversion.MapKeyToValue(outputColumnName: @"PredictedLabel", inputColumnName: @"PredictedLabel"));
    return pipeline;
}

Here’s what’s happening in the training pipeline, step by step:

  • FeaturizeText: We featurize the input column. ML algorithms operate on numeric data, so every string feature column must be converted into a numeric representation.

  • Concatenate: We add a step that lists all the feature columns. Features are all the columns used in training other than the label column. In our case, there is only one, col0.

  • MapValueToKey: We map the label column to a key type that the ML algorithm can understand.

  • NormalizeMinMax: We scale the feature values into a common range based on their minimum and maximum values, which makes training more stable.

  • OneVersusAll with LbfgsLogisticRegression: We specify the training algorithm and pass the appropriate parameters into it.

  • MapKeyToValue: We convert the predicted label back to its original format. For example, if the original label values were textual, the ML pipeline would have converted them to numbers; this step performs the reverse conversion.
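To see the featurization step in isolation, here is a small, self-contained sketch. The TextRow class and the sample sentences are made up for this example; only the FeaturizeText call mirrors the generated pipeline.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// A tiny in-memory dataset with a single text column.
var data = mlContext.Data.LoadFromEnumerable(new[]
{
    new TextRow { Text = "great food" },
    new TextRow { Text = "terrible service" },
});

// FeaturizeText turns each string into a numeric feature vector.
var pipeline = mlContext.Transforms.Text.FeaturizeText(
    outputColumnName: "Features", inputColumnName: "Text");
var transformed = pipeline.Fit(data).Transform(data);

// Each row now carries a float vector instead of the raw string.
var firstVector = transformed.GetColumn<float[]>("Features").First();
Console.WriteLine($"Feature vector length: {firstVector.Length}");

public class TextRow
{
    public string Text { get; set; }
}
```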

We also have the RetrainPipeline method in line 21 in our example. The content of this method is as follows:

C#
public static ITransformer RetrainPipeline(MLContext mlContext, IDataView trainData)
{
    var pipeline = BuildPipeline(mlContext);
    var model = pipeline.Fit(trainData);
    return model;
}

These two methods are related. The RetrainPipeline() method invokes the BuildPipeline() method to produce a training pipeline. It then passes the input data into the Fit() method on the pipeline object. This is the method that trains the model by using the pipeline. Once the process is completed, a trained model is returned.
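Putting this together, retraining from the original text file and saving the result could look roughly like the sketch below. The output file name is made up for the example, and the column definitions assume the tab-separated layout of yelp_labelled.txt.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using MLApp;

var mlContext = new MLContext();

// Describe the two tab-separated columns of the training file:
// col0 = sentence text, col1 = numeric label (0 = negative, 1 = positive).
IDataView trainData = mlContext.Data.LoadFromTextFile(
    "TrainingData/yelp_labelled.txt",
    new[]
    {
        new TextLoader.Column(@"col0", DataKind.String, 0),
        new TextLoader.Column(@"col1", DataKind.Single, 1),
    },
    separatorChar: '\t');

// Rebuild the training pipeline and fit it to the data.
ITransformer model = DemoMLModel.RetrainPipeline(mlContext, trainData);

// Save the retrained model as a new ZIP archive.
mlContext.Model.Save(model, trainData.Schema, "RetrainedModel.zip");
```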

Model consumption interface

The DemoMLModel.consumption.cs file provides a public interface to consume the model. It contains the classes that represent the input and output. In the DemoMLModel.consumption.cs file in the playground above, we have the input class, ModelInput, defined in line 16. The output class is called ModelOutput, and it’s defined in line 32.

Model input provides the original features and the label. Here's what it looks like in our example:

public class ModelInput
{
    [ColumnName(@"col0")]
    public string Col0 { get; set; }

    [ColumnName(@"col1")]
    public float Col1 { get; set; }
}
Model input definition

The model output contains all the original columns, although any non-numeric data types have been converted into numeric ones. It also contains additional columns that provide the predicted label in the same format as the original input, as well as metadata such as the confidence score. This is what this object looks like in our example:

public class ModelOutput
{
    [ColumnName(@"col0")]
    public float[] Col0 { get; set; }

    [ColumnName(@"col1")]
    public uint Col1 { get; set; }

    [ColumnName(@"Features")]
    public float[] Features { get; set; }

    [ColumnName(@"PredictedLabel")]
    public float PredictedLabel { get; set; }

    [ColumnName(@"Score")]
    public float[] Score { get; set; }
}
Model output

Both of these classes use the ColumnName attribute on their properties. An example of this attribute can be found in line 18. This attribute allows us to give arbitrary names to the data columns for better readability.

The consumption of the model is performed by the Predict() method that we can find in line 62 in our example. This is what this method consists of:

C#
public static ModelOutput Predict(ModelInput input)
{
    var predEngine = PredictEngine.Value;
    return predEngine.Predict(input);
}

This method accepts ModelInput as its input parameter and has ModelOutput as its return type. Internally, it calls the Predict() method on the PredictEngine object, which uses the ML model to make a prediction. The path to the ML model is held by the private static MLNetModelPath field, which we can see in line 58. We can use any arbitrary path, so we aren’t forced to place the model in the same location as our application.
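Because the Score column holds one value per class, its maximum can be read as a rough confidence measure for the predicted label. A small sketch, assuming the generated classes from the playground (the example sentence is made up):

```csharp
using System;
using System.Linq;
using MLApp;

var output = DemoMLModel.Predict(new DemoMLModel.ModelInput { Col0 = "The food was amazing" });

// The highest score belongs to the predicted class, so it can serve
// as a rough confidence value for the prediction.
var confidence = output.Score.Max();
Console.WriteLine($"Label: {output.PredictedLabel}, confidence: {confidence:F3}");
```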

This concludes the overview of an ML.NET project structure and the autogenerated files inside it. Of course, we don’t have to rely on autogenerated files and we can write our own implementation. But these files would still be useful as templates.