ML Model Development
Learn to select appropriate SageMaker algorithms for diverse tasks such as recommendations and classification, optimize training costs and model performance, deploy models effectively, and implement monitoring techniques. Understand practical approaches to handling imbalanced data, tuning hyperparameters, compressing models, and evaluating them, preparing you for the MLA-C01 exam and real-world ML engineering demands.
Question 19
A retail company wants to build a product recommendation system. They have millions of user-item interaction records stored in Amazon S3, represented as a sparse interaction matrix. The ML engineer needs to choose the most appropriate Amazon SageMaker built-in algorithm to power the recommendation engine.
Which SageMaker built-in algorithm should the ML engineer select?
A. Amazon SageMaker BlazingText, because it can process product description text to generate recommendations
B. Amazon SageMaker Factorization Machines, because it is purpose-built for recommendation tasks involving sparse interaction data
C. Amazon SageMaker k-nearest neighbors (k-NN), because it can find similar users based on interaction history
D. A custom deep learning neural collaborative filtering model trained on SageMaker using a custom training script
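To make the "sparse interaction matrix" in the stem concrete, here is a pure-Python sketch (user and item IDs are made up) of how user-item interactions expand into wide, mostly-zero feature rows: one one-hot slot for the user plus one for the item. The `feature_dim` variable mirrors the hyperparameter of the same name in SageMaker's Factorization Machines; a real pipeline would build a scipy CSR matrix and serialize it to RecordIO-protobuf before training.

```python
# Hypothetical click log: (user_id, item_id, clicked) triples.
interactions = [(0, 2, 1), (1, 0, 0), (2, 1, 1), (0, 1, 1)]

n_users, n_items = 3, 3
feature_dim = n_users + n_items   # matches the FM `feature_dim` hyperparameter

rows, labels = [], []
for user, item, clicked in interactions:
    # Each row has only 2 of `feature_dim` non-zero columns:
    # column `user` (one-hot user) and column `n_users + item` (one-hot item).
    rows.append({user: 1.0, n_users + item: 1.0})
    labels.append(float(clicked))

print(feature_dim)                 # 6
print(sum(len(r) for r in rows))   # 8 non-zeros across 4 rows
```

With millions of users and items, `feature_dim` grows into the millions while each row keeps exactly two non-zeros, which is why the sparsity of the input matters when choosing an algorithm.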
Question 20
An ML engineer is training a 130-billion-parameter large language model and needs to minimize training costs on AWS. They are evaluating different instance types in Amazon SageMaker for this training workload. The model requires distributed training across multiple accelerators.
Which approach should the ML engineer choose to minimize training costs while maintaining comparable training performance?
A. Use ml.trn1.32xlarge instances powered by AWS Trainium accelerators for the distributed training job
B. Use ml.m5.xlarge instances with SageMaker distributed training to reduce per-instance costs
C. Use ml.p5.48xlarge GPU instances with SageMaker managed spot training to reduce costs
D. Reduce the model size to fit on a single ml.p4d.24xlarge GPU instance to avoid distributed training overhead
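Back-of-the-envelope memory arithmetic shows why the stem states the model "requires distributed training across multiple accelerators." The 16-bytes-per-parameter figure for Adam with mixed precision is a common rule of thumb (fp16 weights and gradients plus fp32 master weights and two optimizer moments), not an AWS-published number.

```python
params = 130e9  # 130-billion-parameter model from the question

# Weights alone in bf16 (2 bytes per parameter).
weights_gb = params * 2 / 1e9

# Full training state under the ~16 bytes/param rule of thumb for
# Adam with mixed precision.
training_state_tb = params * 16 / 1e12

print(weights_gb)         # 260.0 GB of weights alone
print(training_state_tb)  # 2.08 TB of training state
```

Even ignoring activations, roughly 2 TB of training state far exceeds the aggregate accelerator memory of any single instance, so the job must shard across many accelerators regardless of which instance family is chosen.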
Question 21
A financial services company has built a fraud detection model using XGBoost in a local Jupyter environment outside of Amazon SageMaker. The model has been validated and performs well. The ML engineer now needs to deploy this externally trained model to SageMaker for production inference and register it for model versioning.
What is the correct approach to bring this externally trained model into SageMaker?
A. Retrain the model from scratch using the SageMaker built-in XGBoost algorithm to ensure compatibility with SageMaker infrastructure
B. Package the model artifact as a model.tar.gz file, upload it to Amazon S3, create a SageMaker model object with a compatible XGBoost inference container, and register it in SageMaker Model Registry
C. Use SageMaker Autopilot to automatically recreate and optimize the fraud detection model from the original training data
D. Directly upload the serialized pickle file to a SageMaker real-time endpoint without any containerization or model packaging
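Option B describes a multi-step packaging flow. As a hedged illustration of the first step, this stdlib-only sketch builds the `model.tar.gz` layout the SageMaker XGBoost inference container conventionally expects, with the model artifact at the archive root; the file name and bytes below are placeholders standing in for the real saved booster.

```python
import pathlib
import tarfile
import tempfile

# Stand-in for the artifact produced by the external training run,
# e.g. booster.save_model("xgboost-model").
workdir = pathlib.Path(tempfile.mkdtemp())
model_file = workdir / "xgboost-model"
model_file.write_bytes(b"placeholder serialized booster bytes")

# Package the artifact at the ROOT of the archive (no leading
# directories), which is what the inference container unpacks.
archive = workdir / "model.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(model_file, arcname="xgboost-model")

with tarfile.open(archive) as tar:
    print(tar.getnames())  # ['xgboost-model']
```

The remaining steps in option B (uploading the archive to S3, creating a model object that pairs it with a compatible XGBoost inference image, and registering it in SageMaker Model Registry) all operate on AWS APIs and are omitted here since they require account credentials.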
Question 22
An ML engineer is performing hyperparameter tuning for an image classification model on Amazon SageMaker. The engineer has a limited compute budget ...