Professional-Machine-Learning-Engineer Google Professional Machine Learning Engineer Free Practice Exam Questions (2025 Updated)

Prepare effectively for your Google Professional-Machine-Learning-Engineer Google Professional Machine Learning Engineer certification with our extensive collection of free, high-quality practice questions. Each question is designed to mirror the actual exam format and objectives, complete with comprehensive answers and detailed explanations. Our materials are regularly updated for 2025, ensuring you have the most current resources to build confidence and succeed on your first attempt.

Google Professional-Machine-Learning-Engineer Premium Access Download Demo

Page: 3 / 5
Total 285 questions

Question # 46

Your organization’s marketing team is building a customer recommendation chatbot that uses a generative AI large language model (LLM) to provide personalized product suggestions in real time. The chatbot needs to access data from millions of customers, including purchase history, browsing behavior, and preferences. The data is stored in a Cloud SQL for PostgreSQL database. You need the chatbot response time to be less than 100ms. How should you design the system?

Use BigQuery ML to fine-tune the LLM with the data in the Cloud SQL for PostgreSQL database, and access the model from BigQuery.

Replicate the Cloud SQL for PostgreSQL database to AlloyDB. Configure the chatbot server to query AlloyDB.

Transform relevant customer data into vector embeddings and store them in Vertex AI Search for retrieval by the LLM.

Create a caching layer between the chatbot and the Cloud SQL for PostgreSQL database to store frequently accessed customer data. Configure the chatbot server to query the cache.

Question # 47

You have recently developed a custom model for image classification by using a neural network. You need to automatically identify the values for learning rate, number of layers, and kernel size. To do this, you plan to run multiple jobs in parallel to identify the parameters that optimize performance. You want to minimize custom code development and infrastructure management. What should you do?

Create a Vertex Al pipeline that runs different model training jobs in parallel.

Train an AutoML image classification model.

Create a custom training job that uses the Vertex Al Vizier SDK for parameter optimization.

Create a Vertex Al hyperparameter tuning job.

Question # 48

You need to build an ML model for a social media application to predict whether a user’s submitted profile photo meets the requirements. The application will inform the user if the picture meets the requirements. How should you build a model to ensure that the application does not falsely accept a non-compliant picture?

Use AutoML to optimize the model’s recall in order to minimize false negatives.

Use AutoML to optimize the model’s F1 score in order to balance the accuracy of false positives and false negatives.

Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that meet the profile photo requirements.

Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that do not meet the profile photo requirements.

Question # 49

Your company manages an application that aggregates news articles from many different online sources and sends them to users. You need to build a recommendation model that will suggest articles to readers that are similar to the articles they are currently reading. Which approach should you use?

Create a collaborative filtering system that recommends articles to a user based on the user’s past behavior.

Encode all articles into vectors using word2vec, and build a model that returns articles based on vector similarity.

Build a logistic regression model for each user that predicts whether an article should be recommended to a user.

Manually label a few hundred articles, and then train an SVM classifier based on the manually classified articles that categorizes additional articles into their respective categories.

Explanation:

Option A is incorrect because creating a collaborative filtering system that recommends articles to a user based on the user’s past behavior is not the best approach to suggest articles that are similar to the articles they are currently reading. Collaborative filtering is a method of recommendation that uses the ratings or preferences of other users to predict the preferences of a target user1. However, this method does not consider the content or features of the articles, and may not be able to find articles that are similar in terms of topic, style, or sentiment.

Option B is correct because encoding all articles into vectors using word2vec, and building a model that returns articles based on vector similarity is a suitable approach to suggest articles that are similar to the articles they are currently reading. Word2vec is a technique that learns low-dimensional and dense representations of words from a large corpus of text, such that words that are semantically similar have similar vectors2. By applying word2vec to the articles, we can obtain vector representations of the articles that capture their meaning and usage. Then, we can use a similarity measure, such as cosine similarity, to find articles that have similar vectors to the current article3.

Option C is incorrect because building a logistic regression model for each user that predicts whether an article should be recommended to a user is not a feasible approach to suggest articles that are similar to the articles they are currently reading. Logistic regression is a supervised learning method that models the probability of a binary outcome (such as recommend or not) based on some input features (such as user profile or article content)4. However, this method requires a large amount of labeled data for each user, which may not be available or scalable. Moreover, this method does not directly measure the similarity between articles, but rather the likelihood of a user’s preference.

Option D is incorrect because manually labeling a few hundred articles, and then training an SVM classifier based on the manually classified articles that categorizes additional articles into their respective categories is not an effective approach to suggest articles that are similar to the articles they are currently reading. SVM (support vector machine) is a supervised learning method that finds a hyperplane that separates the data into different classes (such as news categories) with the maximum margin5. However, this method also requires a large amount of labeled data, which may be costly and time-consuming to obtain. Moreover, this method does not account for the fine-grained similarity between articles within the same category, or the cross-category similarity between articles from different categories.

References:

Collaborative filtering

Word2vec

Cosine similarity

Logistic regression

SVM

Question # 50

You are training a TensorFlow model on a structured data set with 100 billion records stored in several CSV files. You need to improve the input/output execution performance. What should you do?

Load the data into BigQuery and read the data from BigQuery.

Load the data into Cloud Bigtable, and read the data from Bigtable

Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage

Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS)

Explanation:

The input/output execution performance of a TensorFlow model depends on how efficiently the model can read and process the data from the data source. Reading and processing data from CSV files can be slow and inefficient, especially if the data is large and distributed. Therefore, to improve the input/output execution performance, one should use a more suitable data format and storage system.

One of the best options for improving the input/output execution performance is to convert the CSV files into shards of TFRecords, and store the data in Cloud Storage. TFRecord is a binary data format that can store a sequence of serialized TensorFlow examples. TFRecord has several advantages over CSV, such as:

Faster data loading: TFRecord can be read and processed faster than CSV, as it avoids the overhead of parsing and decoding the text data. TFRecord also supports compression and checksums, which can reduce the data size and ensure data integrity1

Better performance: TFRecord can improve the performance of the model, as it allows the model to access the data in a sequential and streaming manner, and leverage the tf.data API to build efficient data pipelines. TFRecord also supports sharding and interleaving, which can increase the parallelism and throughput of the data processing2

Easier integration: TFRecord can integrate seamlessly with TensorFlow, as it is the native data format for TensorFlow. TFRecord also supports various types of data, such as images, text, audio, and video, and can store the data schema and metadata along with the data3

Cloud Storage is a scalable and reliable object storage service that can store any amount of data. Cloud Storage has several advantages over other storage systems, such as:

High availability: Cloud Storage can provide high availability and durability for the data, as it replicates the data across multiple regions and zones, and supports versioning and lifecycle management. Cloud Storage also offers various storage classes, such as Standard, Nearline, Coldline, and Archive, to meet different performance and cost requirements4

Low latency: Cloud Storage can provide low latency and high bandwidth for the data, as it supports HTTP and HTTPS protocols, and integrates with other Google Cloud services, such as AI Platform, Dataflow, and BigQuery. Cloud Storage also supports resumable uploads and downloads, and parallel composite uploads, which can improve the data transfer speed and reliability5

Easy access: Cloud Storage can provide easy access and management for the data, as it supports various tools and libraries, such as gsutil, Cloud Console, and Cloud Storage Client Libraries. Cloud Storage also supports fine-grained access control and encryption, which can ensure the data security and privacy.

The other options are not as effective or feasible. Loading the data into BigQuery and reading the data from BigQuery is not recommended, as BigQuery is mainly designed for analytical queries on large-scale data, and does not support streaming or real-time data processing. Loading the data into Cloud Bigtable and reading the data from Bigtable is not ideal, as Cloud Bigtable is mainly designed for low-latency and high-throughput key-value operations on sparse and wide tables, and does not support complex data types or schemas. Converting the CSV files into shards of TFRecords and storing the data in the Hadoop Distributed File System (HDFS) is not optimal, as HDFS is not natively supported by TensorFlow, and requires additional configuration and dependencies, such as Hadoop, Spark, or Beam.

References: 1: TFRecord and tf.Example 2: Better performance with the tf.data API 3: TensorFlow Data Validation 4: Cloud Storage overview 5: Performance : [How-to guides]

Question # 51

You are building a MLOps platform to automate your company's ML experiments and model retraining. You need to organize the artifacts for dozens of pipelines How should you store the pipelines' artifacts'?

Store parameters in Cloud SQL and store the models' source code and binaries in GitHub

Store parameters in Cloud SQL store the models' source code in GitHub, and store the models' binaries in Cloud Storage.

Store parameters in Vertex ML Metadata store the models' source code in GitHub and store the models' binaries in Cloud Storage.

Store parameters in Vertex ML Metadata and store the models source code and binaries in GitHub.

Question # 52

You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial, because the production model is required to keep up with market changes. Since being deployed to production, the model hasn't changed; however the accuracy of the model has steadily deteriorated. What issue is most likely causing the steady decline in model accuracy?

Poor data quality

Lack of model retraining

Too few layers in the model for capturing information

Incorrect data split ratio during model training, evaluation, validation, and test

Explanation:

Model retraining is the process of updating an existing machine learning model with new data and parameters to improve its performance and accuracy. Model retraining is essential for maintaining the relevance and validity of the model, especially when the data or the environment changes over time. Model retraining can help to avoid or reduce the effects of model degradation, which is the phenomenon of the model’s predictive performance decreasing as it is tested on new datasets within rapidly evolving environments1.

For the use case of predicting sales numbers, model accuracy is crucial, because the production model is required to keep up with market changes. Market changes can affect the demand, supply, price, and preference of the products, and thus influence the sales numbers. If the model is not retrained with new data that reflects the market changes, it may become outdated and inaccurate, and fail to capture the patterns and trends of the sales numbers. Therefore, the most likely issue that is causing the steady decline in model accuracy is the lack of model retraining.

The other options are not as likely as option B, because they are not directly related to the model’s ability to adapt to market changes. Option A, poor data quality, may affect the model’s accuracy, but it is not a specific cause of model degradation over time. Option C, too few layers in the model for capturing information, may affect the model’s complexity and expressiveness, but it is not a specific cause of model degradation over time. Option D, incorrect data split ratio during model training, evaluation, validation, and test, may affect the model’s generalization and validation, but it is not a specific cause of model degradation over time. Therefore, option B, lack of model retraining, is the best answer for this question.

References:

Beware Steep Decline: Understanding Model Degradation In Machine Learning Models

Question # 53

You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects insurance applications from potential customers. What factors should you consider before building the model?

Redaction, reproducibility, and explainability

Traceability, reproducibility, and explainability

Federated learning, reproducibility, and explainability

Differential privacy federated learning, and explainability

Explanation:

Before building an insurance approval model, an ML engineer should consider the factors of traceability, reproducibility, and explainability, as these are important aspects of responsible AI and fairness in a regulated domain. Traceability is the ability to track the provenance and lineage of the data, models, and decisions throughout the ML lifecycle. It helps to ensure the quality, reliability, and accountability of the ML system, and to comply with the regulatory and ethical standards. Reproducibility is the ability to recreate the same results and outcomes using the same data, models, and parameters. It helps to verify the validity, consistency, and robustness of the ML system, and to debug and improve the performance. Explainability is the ability to understand and interpret the logic, behavior, and outcomes of the ML system. It helps to increase the transparency, trust, and confidence of the ML system, and to identify and mitigate any potential biases, errors, or risks. The other options are not as relevant or comprehensive as this option. Redaction is the process of removing sensitive or confidential information from the data or documents, but it is not a factor that the ML engineer should consider before building the model, as it is more related to the data preparation and protection. Federated learning is a technique that allows training ML models on decentralized data without transferring the data to a central server, but it is not a factor that the ML engineer should consider before building the model, as it is more related to the model architecture and privacy preservation. Differential privacy is a method that adds noise to the data or the model outputs to protect the individual privacy of the data subjects, but it is not a factor that the ML engineer should consider before building the model, as it is more related to the model evaluation and deployment. References:

Responsible AI documentation

Traceability documentation

Reproducibility documentation

Explainability documentation

Question # 54

You are investigating the root cause of a misclassification error made by one of your models. You used Vertex Al Pipelines to tram and deploy the model. The pipeline reads data from BigQuery. creates a copy of the data in Cloud Storage in TFRecord format trains the model in Vertex Al Training on that copy, and deploys the model to a Vertex Al endpoint. You have identified the specific version of that model that misclassified: and you need to recover the data this model was trained on. How should you find that copy of the data'?

Use Vertex Al Feature Store Modify the pipeline to use the feature store; and ensure that all training data is stored in it Search the feature store for the data used for the training.

Use the lineage feature of Vertex Al Metadata to find the model artifact Determine the version of the model and identify the step that creates the data copy, and search in the metadata for its location.

Use the logging features in the Vertex Al endpoint to determine the timestamp of the models deployment Find the pipeline run at that timestamp Identify the step that creates the data copy; and search in the logs for its location.

Find the job ID in Vertex Al Training corresponding to the training for the model Search in the logs of that job for the data used for the training.

Explanation:

Option A is not the best answer because it requires modifying the pipeline to use the Vertex AI Feature Store, which may not be feasible or necessary for recovering the data that the model was trained on. The Vertex AI Feature Store is a service that helps you manage, store, and serve feature values for your machine learning models1, but it is not designed for storing the raw data or the TFRecord files.

Option B is the best answer because it leverages the lineage feature of Vertex AI Metadata, which is a service that helps you track and manage the metadata of your machine learning workflows, such as datasets, models, metrics, and parameters2. The lineage feature allows you to view the relationships and dependencies among the artifacts and executions in your pipeline, and trace back the origin and history of any artifact3. By using the lineage feature, you can find the model artifact, determine the version of the model, identify the step that creates the data copy, and search in the metadata for its location.

Option C is not the best answer because it relies on the logging features in the Vertex AI endpoint, which may not be accurate or reliable for finding the data copy. The logging features in the Vertex AI endpoint help you monitor and troubleshoot the online predictions made by your deployed models, but they do not provide information about the training data or the pipeline steps4. Moreover, the timestamp of the model deployment may not match the timestamp of the pipeline run, as there may be delays or errors in the deployment process.

Option D is not the best answer because it requires finding the job ID in Vertex AI Training, which may not be easy or straightforward. Vertex AI Training is a service that helps you train your custom models on Google Cloud, but it does not provide a direct way to link the training job to the model version or the pipeline run. Moreover, searching in the logs of the job may not reveal the location of the data copy, as the logs may only contain information about the training process and the metrics.

References:

1: Introduction to Vertex AI Feature Store | Vertex AI | Google Cloud

2: Introduction to Vertex AI Metadata | Vertex AI | Google Cloud

3: View lineage for ML workflows | Vertex AI | Google Cloud

4: Monitor online predictions | Vertex AI | Google Cloud

[5]: Train custom models | Vertex AI | Google Cloud

Question # 55

You need to design a customized deep neural network in Keras that will predict customer purchases based on their purchase history. You want to explore model performance using multiple model architectures, store training data, and be able to compare the evaluation metrics in the same dashboard. What should you do?

Create multiple models using AutoML Tables

Automate multiple training runs using Cloud Composer

Run multiple training jobs on Al Platform with similar job names

Create an experiment in Kubeflow Pipelines to organize multiple runs

Question # 56

You work for an international manufacturing organization that ships scientific products all over the world Instruction manuals for these products need to be translated to 15 different languages Your organization's leadership team wants to start using machine learning to reduce the cost of manual human translations and increase translation speed. You need to implement a scalable solution that maximizes accuracy and minimizes operational overhead. You also want to include a process to evaluate and fix incorrect translations. What should you do?

Create a workflow using Cloud Function Triggers Configure a Cloud Function that is triggered when documents are uploaded to an input Cloud Storage bucket Configure another Cloud Function that translates the documents using the Cloud Translation API and saves the translations to an output Cloud Storage bucket Use human reviewers to evaluate the incorrect translations.

Create a Vertex Al pipeline that processes the documents1 launches an AutoML Translation training job evaluates the translations, and deploys the model to a Vertex Al endpoint with autoscaling and model monitoring When there is a predetermined skew between training and live data re-trigger the pipeline with the latest data.

Use AutoML Translation to tram a model Configure a Translation Hub project and use the trained model to translate the documents Use human reviewers to evaluate the incorrect translations

Use Vertex Al custom training jobs to fine-tune a state-of-the-art open source pretrained model with your data Deploy the model to a Vertex Al endpoint with autoscaling and model monitoring When there is a predetermined skew between the training and live data, configure a trigger to run another training job with the latest data.

Question # 57

You work for a retail company. You have been asked to develop a model to predict whether a customer will purchase a product on a given day. Your team has processed the company's sales data, and created a table with the following rows:

• Customer_id

• Product_id

• Date

• Days_since_last_purchase (measured in days)

• Average_purchase_frequency (measured in 1/days)

• Purchase (binary class, if customer purchased product on the Date)

You need to interpret your models results for each individual prediction. What should you do?

Create a BigQuery table Use BigQuery ML to build a boosted tree classifier Inspect the partition rules of the trees to understand how each prediction flows through the trees.

Create a Vertex Al tabular dataset Train an AutoML model to predict customer purchases Deploy the model

to a Vertex Al endpoint and enable feature attributions Use the "explain" method to get feature attribution values for each individual prediction.

Create a BigQuery table Use BigQuery ML to build a logistic regression classification model Use the values of the coefficients of the model to interpret the feature importance with higher values corresponding to more importance.

Create a Vertex Al tabular dataset Train an AutoML model to predict customer purchases Deploy the model to a Vertex Al endpoint. At each prediction enable L1 regularization to detect non-informative features.

Question # 58

You are an ML engineer in the contact center of a large enterprise. You need to build a sentiment analysis tool that predicts customer sentiment from recorded phone conversations. You need to identify the best approach to building a model while ensuring that the gender, age, and cultural differences of the customers who called the contact center do not impact any stage of the model development pipeline and results. What should you do?

Extract sentiment directly from the voice recordings

Convert the speech to text and build a model based on the words

Convert the speech to text and extract sentiments based on the sentences

Convert the speech to text and extract sentiment using syntactical analysis

Explanation:

Sentiment analysis is the process of identifying and extracting the emotions, opinions, and attitudes expressed in a text or speech. Sentiment analysis can help businesses understand their customers’ feedback, satisfaction, and preferences. There are different approaches to building a sentiment analysis tool, depending on the input data and the output format. Some of the common approaches are:

Extracting sentiment directly from the voice recordings: This approach involves using acoustic features, such as pitch, intensity, and prosody, to infer the sentiment of the speaker. This approach can capture the nuances and subtleties of the vocal expression, but it also requires a large and diverse dataset of labeled voice recordings, which may not be easily available or accessible. Moreover, this approach may not account for the semantic and contextual information of the speech, which can also affect the sentiment.

Converting the speech to text and building a model based on the words: This approach involves using automatic speech recognition (ASR) to transcribe the voice recordings into text, and then using lexical features, such as word frequency, polarity, and valence, to infer the sentiment of the text. This approach can leverage the existing text-based sentiment analysis models and tools, but it also introduces some challenges, such as the accuracy and reliability of the ASR system, the ambiguity and variability of the natural language, and the loss of the acoustic information of the speech.

Converting the speech to text and extracting sentiments based on the sentences: This approach involves using ASR to transcribe the voice recordings into text, and then using syntactic and semantic features, such as sentence structure, word order, and meaning, to infer the sentiment of the text. This approach can capture the higher-level and complex aspects of the natural language, such as negation, sarcasm, and irony, which can affect the sentiment. However, this approach also requires more sophisticated and advanced natural language processing techniques, such as parsing, dependency analysis, and semantic role labeling, which may not be readily available or easy to implement.

Converting the speech to text and extracting sentiment using syntactical analysis: This approach involves using ASR to transcribe the voice recordings into text, and then using syntactical analysis, such as part-of-speech tagging, phrase chunking, and constituency parsing, to infer the sentiment of the text. This approach can identify the grammatical and structural elements of the natural language, such as nouns, verbs, adjectives, and clauses, which can indicate the sentiment. However, this approach may not account for the pragmatic and contextual information of the speech, such as the speaker’s intention, tone, and situation, which can also influence the sentiment.

For the use case of building a sentiment analysis tool that predicts customer sentiment from recorded phone conversations, the best approach is to convert the speech to text and extract sentiments based on the sentences. This approach can balance the trade-offs between the accuracy, complexity, and feasibility of the sentiment analysis tool, while ensuring that the gender, age, and cultural differences of the customers who called the contact center do not impact any stage of the model development pipeline and results. This approach can also handle different types and levels of sentiment, such as polarity (positive, negative, or neutral), intensity (strong or weak), and emotion (anger, joy, sadness, etc.). Therefore, converting the speech to text and extracting sentiments based on the sentences is the best approach for this use case.

Question # 59

You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an “Out of Memory” error. What should you do?

Use batch prediction mode instead of online mode.

Send the request again with a smaller batch of instances.

Use base64 to encode your data before using it for prediction.

Apply for a quota increase for the number of prediction requests.

Question # 60

You work for a manufacturing company. You need to train a custom image classification model to detect product detects at the end of an assembly line. Although your model is performing well, some images in your holdout set are consistently mislabeled with high confidence. You want to use Vertex Al to understand your models results. What should you do?

Configure feature-based explanations by using sampled Shapley. Set number of feature permutations to the maximum value of 50.

Create an index by using Vertex Al Matching Engine. Query the index with your mislabeled images

Configure example-based explanations by using integrated gradients. Set visualization type to pixels, and set clip_percent_upperbound to 95.

Configure example-based explanations. Specify the embedding output layer to be used for the latent space representation.

Question # 61

You work for a bank and are building a random forest model for fraud detection. You have a dataset that

includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

Write your data in TFRecords.

Z-normalize all the numeric features.

Oversample the fraudulent transaction 10 times.

Use one-hot encoding on all categorical features.

Question # 62

Your company manages an ecommerce website. You developed an ML model that recommends additional products to users in near real time based on items currently in the user's cart. The workflow will include the following processes.

1 The website will send a Pub/Sub message with the relevant data and then receive a message with the prediction from Pub/Sub.

2 Predictions will be stored in BigQuery

3. The model will be stored in a Cloud Storage bucket and will be updated frequently

You want to minimize prediction latency and the effort required to update the model How should you reconfigure the architecture?

Write a Cloud Function that loads the model into memory for prediction Configure the function to be

triggered when messages are sent to Pub/Sub.

Create a pipeline in Vertex Al Pipelines that performs preprocessing, prediction and postprocessing

Configure the pipeline to be triggered by a Cloud Function when messages are sent to Pub/Sub.

Expose the model as a Vertex Al endpoint Write a custom DoFn in a Dataflow job that calls the endpoint for

prediction.

Use the Runlnference API with watchFilePatterr. in a Dataflow job that wraps around the model and serves predictions.

Question # 63

You work for a company that is developing an application to help users with meal planning You want to use machine learning to scan a corpus of recipes and extract each ingredient (e g carrot, rice pasta) and each kitchen cookware (e.g. bowl, pot spoon) mentioned Each recipe is saved in an unstructured text file What should you do?

Create a text dataset on Vertex Al for entity extraction Create two entities called ingredient" and cookware" and label at least 200 examples of each entity Train an AutoML entity extraction model to extract occurrences of these entity types Evaluate performance on a holdout dataset.

Create a multi-label text classification dataset on Vertex Al Create a test dataset and label each recipe that corresponds to its ingredients and cookware Train a multi-class classification model Evaluate the model’s performance on a holdout dataset.

Use the Entity Analysis method of the Natural Language API to extract the ingredients and cookware from each recipe Evaluate the model's performance on a prelabeled dataset.

Create a text dataset on Vertex Al for entity extraction Create as many entities as there are different ingredients and cookware Train an AutoML entity extraction model to extract those entities Evaluate the models performance on a holdout dataset.

Question # 64

You work for an online publisher that delivers news articles to over 50 million readers. You have built an AI model that recommends content for the company’s weekly newsletter. A recommendation is considered successful if the article is opened within two days of the newsletter’s published date and the user remains on the page for at least one minute.

All the information needed to compute the success metric is available in BigQuery and is updated hourly. The model is trained on eight weeks of data, on average its performance degrades below the acceptable baseline after five weeks, and training time is 12 hours. You want to ensure that the model’s performance is above the acceptable baseline while minimizing cost. How should you monitor the model to determine when retraining is necessary?

Use Vertex AI Model Monitoring to detect skew of the input features with a sample rate of 100% and a monitoring frequency of two days.

Schedule a cron job in Cloud Tasks to retrain the model every week before the newsletter is created.

Schedule a weekly query in BigQuery to compute the success metric.

Schedule a daily Dataflow job in Cloud Composer to compute the success metric.

Explanation:

The best option for monitoring the model to determine when retraining is necessary is to schedule a weekly query in BigQuery to compute the success metric. This option has the following advantages:

It allows the model performance to be evaluated regularly, based on the actual outcome of the recommendations. By computing the success metric, which is the percentage of articles that are opened within two days and read for at least one minute, you can measure how well the model is achieving its objective and compare it with the acceptable baseline.

It leverages the scalability and efficiency of BigQuery, which is a serverless, fully managed, and highly scalable data warehouse that can run complex queries over petabytes of data in seconds. By using BigQuery, you can access and analyze all the information needed to compute the success metric, such as the newsletter publication date, the article opening date, and the user reading time, without worrying about the infrastructure or the cost.

It simplifies the model monitoring and retraining workflow, as the weekly query can be scheduled and executed automatically using BigQuery’s built-in scheduling feature. You can also set up alerts or notifications to inform you when the success metric falls below the acceptable baseline, and trigger the model retraining process accordingly.

The other options are less optimal for the following reasons:

Option A: Using Vertex AI Model Monitoring to detect skew of the input features with a sample rate of 100% and a monitoring frequency of two days introduces additional complexity and overhead. This option requires setting up and managing a Vertex AI Model Monitoring service, which is a managed service that provides various tools and features for machine learning, such as training, tuning, serving, and monitoring. However, using Vertex AI Model Monitoring to detect skew of the input features may not reflect the actual performance of the model, as skew is the discrepancy between the distributions of the features in the training dataset and the serving data, which may not affect the outcome of the recommendations. Moreover, using a sample rate of 100% and a monitoring frequency of two days may incur unnecessary cost and latency, as it requires analyzing all the input features every two days, which may not be needed for the model monitoring.

Option B: Scheduling a cron job in Cloud Tasks to retrain the model every week before the newsletter is created introduces additional cost and risk. This option requires creating and running a cron job in Cloud Tasks, which is a fully managed service that allows you to schedule and execute tasks that are invoked by HTTP requests. However, using Cloud Tasks to retrain the model every week may not be optimal, as it may retrain the model more often than necessary, wasting compute resources and cost. Moreover, using Cloud Tasks to retrain the model before the newsletter is created may introduce risk, as it may deploy a new model version that has not been tested or validated, potentially affecting the quality of the recommendations.

Option D: Scheduling a daily Dataflow job in Cloud Composer to compute the success metric introduces additional complexity and cost. This option requires creating and running a Dataflow job in Cloud Composer, which is a fully managed service that runs Apache Airflow pipelines for workflow orchestration. Dataflow is a fully managed service that runs Apache Beam pipelines for data processing and transformation. However, using Dataflow and Cloud Composer to compute the success metric may not be necessary, as it may add more steps and overhead to the model monitoring process. Moreover, using Dataflow and Cloud Composer to compute the success metric daily may not be optimal, as it may compute the success metric more often than needed, consuming more compute resources and cost.

References:

[BigQuery documentation]

[Vertex AI Model Monitoring documentation]

[Cloud Tasks documentation]

[Cloud Composer documentation]

[Dataflow documentation]

Question # 65

You work on the data science team for a multinational beverage company. You need to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. You are provided with historical data that includes product types, product sales volumes, expenses, and profits for all regions. What should you use as the input and output for your model?

Use latitude, longitude, and product type as features. Use profit as model output.

Use latitude, longitude, and product type as features. Use revenue and expenses as model outputs.

Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use profit as model output.

Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use revenue and expenses as model outputs.

Explanation:

Option A is incorrect because using latitude, longitude, and product type as features, and using profit as model output is not the best way to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. This option does not capture the interaction between latitude and longitude, which may affect the profitability of the product. For example, the same product may have different profitability in different regions, depending on the climate, culture, or preferences of the customers. Moreover, this option does not account for the granularity of the location data, which may be too fine or too coarse for the model. For example, using the exact coordinates of a city may not be meaningful, as the profitability may vary within the city, or using the country name may not be informative, as the profitability may vary across the country.

Option B is incorrect because using latitude, longitude, and product type as features, and using revenue and expenses as model outputs is not a suitable way to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. This option has the same drawbacks as option A, as it does not capture the interaction between latitude and longitude, or account for the granularity of the location data. Moreover, this option does not directly predict the profitability of the product, which is the target variable of interest. Instead, it predicts the revenue and expenses of the product, which are intermediate variables that depend on other factors, such as the price, the cost, or the demand of the product. To obtain the profitability, we would need to subtract the expenses from the revenue, which may introduce errors or uncertainties in the prediction.

Option C is correct because using product type and the feature cross of latitude with longitude, followed by binning, as features, and using profit as model output is a good way to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. This option captures the interaction between latitude and longitude, which may affect the profitability of the product, by creating a feature cross of these two features. A feature cross is a synthetic feature that combines the values of two or more features into a single feature1. This option also accounts for the granularity of the location data, by binning the feature cross into discrete buckets. Binning is a technique that groups continuous values into intervals, which can reduce the noise and complexity of the data2. Moreover, this option directly predicts the profitability of the product, which is the target variable of interest, by using it as the model output.

Option D is incorrect because using product type and the feature cross of latitude with longitude, followed by binning, as features, and using revenue and expenses as model outputs is not a valid way to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. This option has the same advantages as option C, as it captures the interaction between latitude and longitude, and accounts for the granularity of the location data, by creating a feature cross and binning it. However, this option does not directly predict the profitability of the product, which is the target variable of interest, but rather predicts the revenue and expenses of the product, which are intermediate variables that depend on other factors, as explained in option B.

References:

Feature cross

Binning

[Profitability]

[Revenue and expenses]

[Latitude and longitude]

[Product type]

Google Professional-Machine-Learning-Engineer Premium Access Download Demo

Page: 3 / 5
Total 285 questions

Weekend Sale Special - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: xmaspas7

Professional-Machine-Learning-Engineer Google Professional Machine Learning Engineer Free Practice Exam Questions (2025 Updated)

The Answer Is:

Explanation:

The Answer Is:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation: