

MLA-C01 Amazon Web Services AWS Certified Machine Learning Engineer - Associate Free Practice Exam Questions (2026 Updated)

Prepare effectively for your Amazon Web Services MLA-C01 AWS Certified Machine Learning Engineer - Associate certification with our extensive collection of free, high-quality practice questions. Each question is designed to mirror the actual exam format and objectives, complete with comprehensive answers and detailed explanations. Our materials are regularly updated for 2026, ensuring you have the most current resources to build confidence and succeed on your first attempt.

Page: 3 / 4
Total 207 questions

An ML engineer wants to re-train an XGBoost model at the end of each month. A data team prepares the training data. The training dataset is a few hundred megabytes in size. When the data is ready, the data team stores the data as a new file in an Amazon S3 bucket.

The ML engineer needs a solution to automate this pipeline. The solution must register the new model version in Amazon SageMaker Model Registry within 24 hours.

Which solution will meet these requirements?

A.

Create an AWS Lambda function that runs one time each week to poll the S3 bucket for new files. Invoke the Lambda function asynchronously. Configure the Lambda function to start the pipeline if the function detects new data.

B.

Create an Amazon CloudWatch rule that runs on a schedule to start the pipeline every 30 days.

C.

Create an S3 Lifecycle rule to start the pipeline every time a new object is uploaded to the S3 bucket.

D.

Create an Amazon EventBridge rule to start an AWS Step Functions TrainingStep every time a new object is uploaded to the S3 bucket.
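Whichever target ends up starting the pipeline, an event-driven trigger on S3 uploads is usually written as an EventBridge event pattern. The sketch below shows such a pattern for S3 "Object Created" events, plus a deliberately simplified matcher that illustrates how the pattern selects events. The bucket name and object key are hypothetical. The bucket would also need EventBridge notifications enabled, and real EventBridge matching supports more operators than this sketch.

```python
def matches(pattern, event):
    """Simplified EventBridge matching: every pattern key must be present in the
    event, list leaves match if the event value equals any element, and nested
    dicts are matched recursively. (Real EventBridge also supports prefix,
    numeric, and other content filters, which are not modeled here.)"""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

# Hypothetical bucket name for the monthly training data.
PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["monthly-training-data"]}},
}

# Trimmed example of the event S3 sends to EventBridge for a new object.
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "monthly-training-data"},
        "object": {"key": "2026-01/train.csv"},
    },
}
```

Because the rule fires on each upload, retraining starts as soon as the data team writes the new file, well inside a 24-hour window.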

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Before the ML engineer trains the model, the ML engineer must resolve the issue of the imbalanced data.

Which solution will meet this requirement with the LEAST operational effort?

A.

Use Amazon Athena to identify patterns that contribute to the imbalance. Adjust the dataset accordingly.

B.

Use Amazon SageMaker Studio Classic built-in algorithms to process the imbalanced dataset.

C.

Use AWS Glue DataBrew built-in features to oversample the minority class.

D.

Use the Amazon SageMaker Data Wrangler balance data operation to oversample the minority class.

An ML engineer is setting up a continuous integration and continuous delivery (CI/CD) pipeline for an ML workflow in Amazon SageMaker AI. The pipeline needs to automate model re-training, testing, and deployment whenever new data is uploaded to an Amazon S3 bucket. New data files are approximately 10 GB in size. The ML engineer wants to track model versions for auditing.

Which solution will meet these requirements?

A.

Use AWS CodePipeline, Amazon S3, and AWS CodeBuild to retrain and deploy the model automatically and to track model versions.

B.

Use SageMaker Pipelines with the SageMaker Model Registry to orchestrate model training and version tracking.

C.

Create an AWS Lambda function to re-train and deploy the model. Use Amazon EventBridge to invoke the Lambda function. Reference the Lambda logs to track model versions.

D.

Use SageMaker AI notebook instances to manually re-train and deploy the model when needed. Reference AWS CloudTrail logs to track model versions.

An ML engineer is training an ML model to identify medical patients for disease screening. The tabular dataset for training contains 50,000 patient records: 1,000 with the disease and 49,000 without the disease.

The ML engineer splits the dataset into a training dataset, a validation dataset, and a test dataset.

What should the ML engineer do to transform the data and make the data suitable for training?

A.

Apply principal component analysis (PCA) to oversample the minority class in the training dataset.

B.

Apply Synthetic Minority Oversampling Technique (SMOTE) to generate new synthetic samples of the minority class in the training dataset.

C.

Randomly oversample the majority class in the validation dataset.

D.

Apply k-means clustering to undersample the minority class in the test dataset.
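The core idea behind SMOTE is to create synthetic minority rows by interpolating between a minority sample and one of its nearest minority-class neighbors, and to do this only on the training split so the validation and test sets keep the real class distribution. The sketch below is a minimal pure-Python illustration of that interpolation step under those assumptions, not the reference SMOTE implementation (libraries such as imbalanced-learn add further options). The feature values are hypothetical.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between a random minority
    sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than other points
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbors of `base` within the minority class, excluding itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Hypothetical 2-feature minority rows taken from the TRAINING split only.
train_minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_rows = smote(train_minority, n_synthetic=6, k=2)
```

Each synthetic row lies on a line segment between two real minority rows, so it stays inside the region the minority class already occupies rather than duplicating rows exactly.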

An ML engineer is building a model to predict house and apartment prices. The model uses three features: Square Meters, Price, and Age of Building. The dataset has 10,000 data rows. The data includes data points for one large mansion and one extremely small apartment.

The ML engineer must perform preprocessing on the dataset to ensure that the model produces accurate predictions for the typical house or apartment.

Which solution will meet these requirements?

A.

Remove the outliers and perform a log transformation on the Square Meters variable.

B.

Keep the outliers and perform normalization on the Square Meters variable.

C.

Remove the outliers and perform one-hot encoding on the Square Meters variable.

D.

Keep the outliers and perform one-hot encoding on the Square Meters variable.
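To make the preprocessing options concrete, the sketch below removes outliers with Tukey's IQR fences and then applies a log transform to a size column. The 1.5 x IQR rule is one common convention, assumed here rather than taken from the question, and the Square Meters values are hypothetical.

```python
import math
import statistics

def drop_outliers_iqr(values, factor=1.5):
    """Keep values inside [Q1 - factor*IQR, Q3 + factor*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]

def log_transform(values):
    """log1p compresses the long right tail typical of sizes and prices."""
    return [math.log1p(v) for v in values]

# Hypothetical Square Meters column with one tiny apartment and one mansion.
square_meters = [6, 55, 60, 70, 75, 80, 85, 90, 95, 110, 2500]
kept = drop_outliers_iqr(square_meters)
scaled = log_transform(kept)
```

Dropping the two extreme rows keeps them from dominating the fit, and the log transform further reduces skew among the remaining typical properties. One-hot encoding, by contrast, applies to categorical features, not a continuous measurement.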

A company needs to run a batch data-processing job on Amazon EC2 instances. The job will run during the weekend and will take 90 minutes to finish running. The processing can handle interruptions. The company will run the job every weekend for the next 6 months.

Which EC2 instance purchasing option will meet these requirements MOST cost-effectively?

A.

Spot Instances

B.

Reserved Instances

C.

On-Demand Instances

D.

Dedicated Instances
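The cost comparison comes down to simple arithmetic: about 26 weekends of 90-minute runs is roughly 39 instance-hours, while a Reserved Instance is a commitment billed for its full 1- or 3-year term. The sketch below makes that explicit. All hourly rates and the Spot discount are caller-supplied placeholders, not actual AWS prices, which vary by instance type, Region, and time.

```python
def weekly_job_hours(weeks=26, minutes_per_run=90):
    """Instance-hours the job actually needs over about 6 months of weekends."""
    return weeks * minutes_per_run / 60

def compare_costs(on_demand_hourly, spot_discount, ri_hourly, ri_term_hours=8760):
    """Compare purchasing options using placeholder rates supplied by the caller.

    spot_discount: assumed fraction saved versus On-Demand (Spot prices fluctuate).
    ri_term_hours: a Reserved Instance is billed for its whole term
    (1 year = 8,760 hours) whether or not the instance is running.
    """
    hours = weekly_job_hours()
    return {
        "on_demand": hours * on_demand_hourly,
        "spot": hours * on_demand_hourly * (1 - spot_discount),
        "reserved": ri_term_hours * ri_hourly,
    }

# Placeholder rates chosen only to make the arithmetic easy to follow.
costs = compare_costs(on_demand_hourly=1.00, spot_discount=0.70, ri_hourly=0.60)
```

Because the job tolerates interruptions and runs only a few dozen hours in total, it can take advantage of Spot pricing, whereas a Reserved Instance would charge for thousands of idle hours.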

A company has an ML model that is deployed to an Amazon SageMaker AI endpoint for real-time inference. The company needs to deploy a new model. The company must compare the new model’s performance to the currently deployed model's performance before shifting all traffic to the new model.

Which solution will meet these requirements with the LEAST operational effort?

A.

Deploy the new model to a separate endpoint. Manually split traffic between the two endpoints.

B.

Deploy the new model to a separate endpoint. Use Amazon CloudFront to distribute traffic between the two endpoints.

C.

Deploy the new model as a shadow variant on the same endpoint as the current model. Route a portion of live traffic to the shadow model for evaluation.

D.

Use AWS Lambda functions with custom logic to route traffic between the current model and the new model.
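For reference, a shadow deployment is expressed in the endpoint configuration: the current model stays in `ProductionVariants`, and the candidate model goes in `ShadowProductionVariants`. The sketch below only builds the keyword arguments for boto3's SageMaker `create_endpoint_config` call so that it runs without AWS credentials. The model names, configuration name, and instance type are hypothetical placeholders.

```python
def shadow_endpoint_config(config_name, current_model, new_model,
                           instance_type="ml.m5.xlarge"):
    """Build kwargs for boto3's SageMaker create_endpoint_config call.

    The current model in ProductionVariants answers callers. The new model in
    ShadowProductionVariants receives copies of inference requests, and its
    responses are not returned to callers, so it can be evaluated on live
    traffic without affecting clients.
    """
    def variant(name, model):
        return {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        }

    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [variant("production", current_model)],
        "ShadowProductionVariants": [variant("shadow", new_model)],
    }

# Hypothetical names. With credentials, these kwargs could be passed to
# boto3.client("sagemaker").create_endpoint_config(**kwargs).
kwargs = shadow_endpoint_config("shadow-test-config", "model-v1", "model-v2")
```

Keeping both variants on one endpoint avoids building and operating a separate traffic-splitting layer.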

An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets.

Which solution will meet these requirements?

A.

Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

B.

Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

C.

Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

D.

Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data.

Which file format will meet these requirements?

A.

CSV files compressed with Snappy

B.

JSON objects in JSONL format

C.

JSON files compressed with gzip

D.

Apache Parquet files

A company runs an Amazon SageMaker domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker domain.

Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.

Which update to the network configuration will meet this requirement?

A.

Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

B.

Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network ACL for the subnet where the domain is located.

C.

Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

D.

Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.
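Security groups support allow rules only, so an explicit deny has to come from a network ACL, whose numbered rules are evaluated in ascending order with the first match deciding. The sketch below is a simplified model of that evaluation (ports and protocols are ignored). The addresses come from the reserved documentation ranges and are hypothetical.

```python
import ipaddress

def evaluate_nacl(rules, source_ip):
    """Return 'allow' or 'deny' for inbound traffic from source_ip.

    Simplified network ACL model: rules are (number, cidr, action) tuples,
    checked in ascending rule-number order; the first rule whose CIDR contains
    the address decides. Traffic matching no numbered rule hits the final
    catch-all deny.
    """
    addr = ipaddress.ip_address(source_ip)
    for number, cidr, action in sorted(rules):
        if addr in ipaddress.ip_network(cidr):
            return action
    return "deny"  # the '*' rule at the end of every network ACL

# The deny rule for the suspicious address needs a LOWER number than the
# allow-all rule; otherwise the allow rule matches first and the deny is never reached.
rules = [
    (100, "0.0.0.0/0", "allow"),
    (90, "203.0.113.7/32", "deny"),
]
```

Rule ordering is the main design choice: placing the deny at, say, 110 behind an allow-all at 100 would have no effect.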

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

A.

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

B.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

C.

Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

D.

Deploy the models by using Amazon SageMaker batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

A.

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

B.

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

C.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

An ML engineer is training an XGBoost regression model in Amazon SageMaker AI. The ML engineer conducts several rounds of hyperparameter tuning with random grid search. After these rounds of tuning, the error rate on the test hold-out dataset is much larger than the error rate on the training dataset.

The ML engineer needs to make changes before running the hyperparameter grid search again.

Which changes will improve the model's performance? (Select TWO.)

A.

Increase the model complexity by increasing the number of features in the dataset.

B.

Decrease the model complexity by reducing the number of features in the dataset.

C.

Decrease the model complexity by reducing the number of samples in the dataset.

D.

Increase the value of the L2 regularization parameter.

E.

Decrease the value of the L2 regularization parameter.
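A large gap between training and test error points to overfitting. One way to see how the L2 parameter (`lambda` in the SageMaker XGBoost hyperparameters, `reg_lambda` in the Python package) counters it is XGBoost's optimal leaf weight, w* = -G / (H + lambda), where G and H are the sums of gradients and Hessians of the samples in a leaf. The sketch below evaluates that formula for squared-error loss on hypothetical residuals.

```python
def leaf_weight(residuals, reg_lambda):
    """Optimal XGBoost leaf weight w* = -G / (H + lambda) for squared-error loss.

    With loss 0.5 * (y - y_hat)**2, each sample contributes gradient
    g = y_hat - y = -residual and Hessian h = 1, so G = -sum(residuals), H = n.
    """
    G = -sum(residuals)
    H = float(len(residuals))
    return -G / (H + reg_lambda)

# Hypothetical residuals (y - y_hat) of the samples that fall into one leaf.
residuals = [4.0, 6.0, 5.0, 5.0]
for lam in (0.0, 1.0, 10.0):
    print(lam, leaf_weight(residuals, lam))  # weights shrink: 5.0, 4.0, ~1.43
```

Larger lambda pulls every leaf's correction toward zero, so the ensemble fits the training data less aggressively, which is the same direction as removing features to lower model complexity.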

A company that has hundreds of data scientists is using Amazon SageMaker to create ML models. The models are in model groups in the SageMaker Model Registry.

The data scientists are grouped into three categories: computer vision, natural language processing (NLP), and speech recognition. An ML engineer needs to implement a solution to organize the existing models into these groups to improve model discoverability at scale. The solution must not affect the integrity of the model artifacts and their existing groupings.

Which solution will meet these requirements?

A.

Create a custom tag for each of the three categories. Add the tags to the model packages in the SageMaker Model Registry.

B.

Create a model group for each category. Move the existing models into these category model groups.

C.

Use SageMaker ML Lineage Tracking to automatically identify and tag which model groups should contain the models.

D.

Create a Model Registry collection for each of the three categories. Move the existing model groups into the collections.

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor.

Which solution will meet this requirement?

A.

Configure the competitor's name as a blocked phrase in Amazon Q Business.

B.

Configure an Amazon Q Business retriever to exclude the competitor's name.

C.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

D.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.

A travel company wants to create an ML model to recommend the next airport destination for its users. The company has collected millions of data records about user location, recent search history on the company's website, and 2,000 available airports. The data has several categorical features, and the target column is expected to produce a high-dimensional sparse matrix.

The company needs to use Amazon SageMaker AI built-in algorithms for the model. An ML engineer converts the categorical features by using one-hot encoding.

Which algorithm should the ML engineer implement to meet these requirements?

A.

Use the CatBoost algorithm to recommend the next airport destination.

B.

Use the DeepAR forecasting algorithm to recommend the next airport destination.

C.

Use the Factorization Machines algorithm to recommend the next airport destination.

D.

Use the k-means algorithm to cluster users into groups and map each group to the next airport destination.
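Factorization machines suit one-hot encoded, high-dimensional sparse data because they model every pairwise feature interaction through low-rank factor vectors: y_hat = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. The sketch below scores a sparse row with that formula, using Rendle's O(k x nnz) rewrite of the pairwise term and checking it against the direct version. The weights, factors, and feature indices are hypothetical, and this is an illustration of the model equation, not the SageMaker built-in implementation.

```python
def fm_predict(x, w0, w, V):
    """Factorization machine score for a sparse input x = {feature_index: value}.

    Pairwise term computed via Rendle's identity:
    0.5 * sum_f [(sum_i V[i][f] x_i)**2 - sum_i (V[i][f] x_i)**2].
    Only non-zero entries of x are visited.
    """
    k = len(V[0])
    linear = sum(w[i] * xi for i, xi in x.items())
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Direct O(k * nnz^2) version of the same formula, used as a check."""
    items = sorted(x.items())
    linear = sum(w[i] * xi for i, xi in items)
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * xi * xj
        for n, (i, xi) in enumerate(items)
        for j, xj in items[n + 1:]
    )
    return w0 + linear + pairwise

# Hypothetical one-hot row: user-segment feature 1 and airport feature 3 are active.
x = {1: 1.0, 3: 1.0}
w0, w = 0.1, [0.0, 0.2, 0.0, 0.5, 0.0]
V = [[0.1, 0.0], [0.3, 0.4], [0.0, 0.2], [0.5, -0.1], [0.2, 0.2]]
score = fm_predict(x, w0, w, V)
```

Because each feature has its own factor vector, an interaction between a user feature and an airport feature can be estimated even if that exact pair rarely appears in the training data.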

An ML engineer needs to process thousands of existing CSV objects and new CSV objects that are uploaded. The CSV objects are stored in a central Amazon S3 bucket and have the same number of columns. One of the columns is a transaction date. The ML engineer must query the data based on the transaction date.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Use an Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to create a table based on the transaction date from data in the central S3 bucket. Query the objects from the table.

B.

Create a new S3 bucket for processed data. Set up S3 replication from the central S3 bucket to the new S3 bucket. Use S3 Object Lambda to query the objects based on transaction date.

C.

Create a new S3 bucket for processed data. Use AWS Glue for Apache Spark to create a job to query the CSV objects based on transaction date. Configure the job to store the results in the new S3 bucket. Query the objects from the new S3 bucket.

D.

Create a new S3 bucket for processed data. Use Amazon Data Firehose to transfer the data from the central S3 bucket to the new S3 bucket. Configure Firehose to run an AWS Lambda function to query the data based on transaction date.
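CREATE TABLE AS SELECT (CTAS) materializes the result of a query as a new table that can then be queried directly. The sketch below uses SQLite from the Python standard library as a local stand-in for Athena, which is an assumption made only so the example runs offline. In Athena, the same statement would typically add a WITH clause, for example `format = 'PARQUET'` and `partitioned_by = ARRAY['transaction_date']`, so that date filters scan less data. The CSV contents are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV objects standing in for files in the central S3 bucket.
CSV_OBJECTS = [
    "transaction_id,amount,transaction_date\n1,20.50,2025-01-03\n2,13.00,2025-01-04\n",
    "transaction_id,amount,transaction_date\n3,99.99,2025-01-04\n",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_transactions "
    "(transaction_id INTEGER, amount REAL, transaction_date TEXT)"
)
for body in CSV_OBJECTS:
    conn.executemany(
        "INSERT INTO raw_transactions "
        "VALUES (:transaction_id, :amount, :transaction_date)",
        csv.DictReader(io.StringIO(body)),
    )

# CTAS: create a new table directly from a SELECT over the raw data.
conn.execute(
    "CREATE TABLE transactions_by_date AS "
    "SELECT transaction_id, amount, transaction_date FROM raw_transactions"
)

# Query the new table by transaction date.
jan4 = conn.execute(
    "SELECT transaction_id FROM transactions_by_date "
    "WHERE transaction_date = ? ORDER BY transaction_id",
    ("2025-01-04",),
).fetchall()
```

A single SQL statement replaces the extra buckets, replication, or streaming jobs in the other options, which is where the operational savings come from.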

A company has a custom extract, transform, and load (ETL) process that runs on premises. The ETL process is written in the R language and runs for an average of 6 hours. The company wants to migrate the process to run on AWS.

Which solution will meet these requirements?

A.

Use an AWS Lambda function created from a container image to run the ETL jobs.

B.

Use Amazon SageMaker AI processing jobs with a custom Docker image stored in Amazon Elastic Container Registry (Amazon ECR).

C.

Use Amazon SageMaker AI script mode to build a Docker image. Run the ETL jobs by using SageMaker Notebook Jobs.

D.

Use AWS Glue to prepare and run the ETL jobs.

A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.

How should the company deploy the model on Amazon SageMaker to meet these requirements?

A.

Use a multi-model serverless endpoint. Enable caching.

B.

Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

C.

Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

D.

Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.

A company is using ML to predict the presence of a specific weed in a farmer's field. The company is using the Amazon SageMaker linear learner built-in algorithm with a value of multiclass_classifier for the predictor_type hyperparameter.

What should the company do to MINIMIZE false positives?

A.

Set the value of the weight decay hyperparameter to zero.

B.

Increase the number of training epochs.

C.

Increase the value of the target_precision hyperparameter.

D.

Change the value of the predictor_type hyperparameter to regressor.
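In SageMaker linear learner, `target_precision` is used when `binary_classifier_model_selection_criteria` is set to `recall_at_target_precision`: precision is held at the target while recall is maximized. The sketch below is a conceptual illustration of that selection on hypothetical validation scores, not the algorithm's internal code. It shows that raising the target precision pushes the decision threshold up, which cuts false positives.

```python
def recall_at_target_precision(scores, labels, target_precision):
    """Pick the threshold with the best recall whose precision >= target.

    scores: model scores for the positive class; labels: 1 = weed present.
    Returns (threshold, precision, recall), or None if no threshold qualifies.
    """
    positives = sum(labels)
    best = None
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(p and y == 1 for p, y in zip(predicted, labels))
        fp = sum(p and y == 0 for p, y in zip(predicted, labels))
        precision = tp / (tp + fp)  # t is one of the scores, so tp + fp >= 1
        recall = tp / positives
        if precision >= target_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best

# Hypothetical validation scores and ground-truth labels.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 1, 0]
```

With these values, a 0.75 target selects threshold 0.7 (one false positive), while a 0.9 target selects threshold 0.9 (no false positives) at the cost of lower recall.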

Copyright © 2014-2026 Solution2Pass. All Rights Reserved