Built-In Algorithms and Custom Training

Explore how to choose the right SageMaker approach for machine learning tasks, from built-in algorithms to custom Script Mode training. Understand the trade-offs between interpretability and performance, and learn how to integrate pretrained and external models to optimize ML workflows on AWS.

Selecting the right algorithm or model architecture for a given ML problem is one of the most consequential decisions in the machine learning life cycle, and it is heavily tested on the AWS Certified Machine Learning Engineer – Associate exam. Within Amazon SageMaker, this decision follows a structured workflow. SageMaker provides multiple pathways: fully managed built-in algorithms for standard tasks; pretrained models for transfer learning from sources such as SageMaker JumpStart and Amazon Bedrock (a separate, fully managed foundation model service); and Script Mode for writing fully custom training logic in TensorFlow or PyTorch. The decision-making hierarchy follows a managed-first escalation principle: start with the simplest managed option that fits the problem, and escalate to custom training only when managed options fall short.

Attention: A common exam pitfall is selecting custom model development (Script Mode or bring your own model, BYOM) when a built-in algorithm or JumpStart model would solve the problem with far less operational overhead. Always evaluate managed options first.
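The managed-first escalation principle can be expressed as a small decision ladder. The sketch below is illustrative only: the `Pathway` enum and the `built_in_fits` / `pretrained_fits` flags are hypothetical stand-ins for a practitioner's judgment about the problem, not SageMaker settings or APIs. It simply checks the options from least to most operational overhead and returns the first one that fits.

```python
from enum import Enum


class Pathway(Enum):
    """Training pathways, listed in order of increasing operational overhead."""
    BUILT_IN = "SageMaker built-in algorithm"
    PRETRAINED = "Pretrained model (e.g., SageMaker JumpStart)"
    SCRIPT_MODE = "Script Mode (custom TensorFlow or PyTorch code)"


def choose_pathway(built_in_fits: bool, pretrained_fits: bool) -> Pathway:
    """Return the simplest pathway that fits, escalating only when needed.

    The boolean flags are hypothetical inputs representing the
    practitioner's assessment; they do not correspond to SageMaker options.
    """
    # Managed candidates, checked from least to most operational overhead.
    ladder = [
        (Pathway.BUILT_IN, built_in_fits),
        (Pathway.PRETRAINED, pretrained_fits),
    ]
    for pathway, fits in ladder:
        if fits:
            return pathway
    # No managed option fits: escalate to fully custom training.
    return Pathway.SCRIPT_MODE


# Hypothetical example: a tabular classification task that a built-in
# algorithm handles, so escalation stops at the first rung.
print(choose_pathway(built_in_fits=True, pretrained_fits=True).value)
```

Even when a pretrained model would also work, the ladder stops at the built-in algorithm, which mirrors the exam guidance of preferring the option with the least operational overhead.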

This lesson walks through each pathway in order of increasing complexity, covers the interpretability vs. performance trade-off that governs algorithm selection, and concludes with how to integrate externally trained models into SageMaker for deployment. Once these decisions are made, the next lesson, Training Jobs and Data Access Patterns, covers how training data flows into the resulting jobs and how compute is optimized.

SageMaker built-in algorithms

SageMaker built-in algorithms are prepackaged, optimized ML implementations that run inside managed containers. They require no custom training code. A data scientist specifies the algorithm, points to training data in Amazon S3, configures hyperparameters, and launches a training job. SageMaker handles container orchestration, distributed training, and hardware optimization automatically.
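To make this workflow concrete, here is a minimal sketch that assembles (but does not send) a request for the low-level boto3 `create_training_job` call for a Linear Learner job. It assumes the boto3 route rather than the SageMaker Python SDK `Estimator`, which wraps the same request. The role ARN, bucket, job name, instance type, and image URI are hypothetical placeholders; built-in algorithm image URIs are region-specific and can be looked up with `sagemaker.image_uris.retrieve` in the SageMaker Python SDK.

```python
import json

# Hypothetical placeholders: substitute values from your own AWS account.
ROLE_ARN = "arn:aws:iam::123456789012:role/MySageMakerRole"
TRAINING_IMAGE = "<region-specific built-in algorithm image URI>"
BUCKET = "s3://my-example-bucket"


def build_training_job_request(job_name: str, hyperparameters: dict) -> dict:
    """Assemble a CreateTrainingJob request for a built-in algorithm.

    No training script is included: the algorithm lives in TRAINING_IMAGE.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": TRAINING_IMAGE,  # selects the algorithm
            "TrainingInputMode": "File",
        },
        "RoleArn": ROLE_ARN,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": f"{BUCKET}/train/",  # training data in S3
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                "ContentType": "text/csv",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": f"{BUCKET}/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # The API expects every hyperparameter value as a string.
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
    }


request = build_training_job_request(
    "linear-learner-demo",
    {"predictor_type": "binary_classifier", "feature_dim": 10, "mini_batch_size": 200},
)
print(json.dumps(request["HyperParameters"]))

# To actually launch the job (requires AWS credentials and real values above):
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```

The request contains no training code: choosing the container image selects the algorithm, the hyperparameters configure it, and SageMaker provisions the instances described in `ResourceConfig`.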

Key algorithms mapped to problem types

Each built-in algorithm targets a specific category of ML problem, and the exam expects candidates to match algorithms to business scenarios quickly.

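  • Linear learner: Supports regression as well as binary and multiclass classification on tabular data. Its models are highly interpretable because each prediction is a weighted sum of the input features, so every learned coefficient shows how strongly a feature influences the output (for classification, this yields linear decision boundaries).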

  • XGBoost: A ...