Pre-Summer Sale Special - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: xmaspas7

Easiest Solution 2 Pass Your Certification Exams

Professional-Cloud-DevOps-Engineer Google Cloud Certified - Professional Cloud DevOps Engineer Exam Free Practice Exam Questions (2026 Updated)

Prepare effectively for your Google Professional-Cloud-DevOps-Engineer Google Cloud Certified - Professional Cloud DevOps Engineer Exam certification with our extensive collection of free, high-quality practice questions. Each question is designed to mirror the actual exam format and objectives, complete with comprehensive answers and detailed explanations. Our materials are regularly updated for 2026, ensuring you have the most current resources to build confidence and succeed on your first attempt.

You are the on-call Site Reliability Engineer for a microservice that is deployed to a Google Kubernetes Engine (GKE) Autopilot cluster. Your company runs an online store that publishes order messages to Pub/Sub and a microservice receives these messages and updates stock information in the warehousing system. A sales event caused an increase in orders, and the stock information is not being updated quickly enough. This is causing a large number of orders to be accepted for products that are out of stock You check the metrics for the microservice and compare them to typical levels.

You need to ensure that the warehouse system accurately reflects product inventory at the time orders are placed and minimize the impact on customers What should you do?

A.

Decrease the acknowledgment deadline on the subscription

B.

Add a virtual queue to the online store that allows typical traffic levels

C.

Increase the number of Pod replicas

D.

Increase the Pod CPU and memory limits

You support a high-traffic web application with a microservice architecture. The home page of the application displays multiple widgets containing content such as the current weather, stock prices, and news headlines. The main serving thread makes a call to a dedicated microservice for each widget and then lays out the homepage for the user. The microservices occasionally fail; when that happens, theserving thread serves the homepage with some missing content. Users of the application are unhappy if this degraded mode occurs too frequently, but they would rather have some content served instead of no content at all. You want to set a Service Level Objective (SLO) to ensure that the user experience does not degrade too much. What Service Level Indicator {SLI) should you use to measure this?

A.

A quality SLI: the ratio of non-degraded responses to total responses

B.

An availability SLI: the ratio of healthy microservices to the total number of microservices

C.

A freshness SLI: the proportion of widgets that have been updated within the last 10 minutes

D.

A latency SLI: the ratio of microservice calls that complete in under 100 ms to the total number of microservice calls

You are performing a semiannual capacity planning exercise for your flagship service. You expect a service user growth rate of 10% month-over-month over the next six months. Your service is fully containerized and runs on Google Cloud Platform (GCP). using a Google Kubernetes Engine (GKE) Standard regional cluster on three zones with cluster autoscaler enabled. You currently consume about 30% of your total deployed CPU capacity, and you require resilience against the failure of a zone. You want to ensure that your users experience minimal negative impact as a result of this growth or as a result of zone failure, while avoiding unnecessary costs. How should you prepare to handle the predicted growth?

A.

Verity the maximum node pool size, enable a horizontal pod autoscaler, and then perform a load test to verity your expected resource needs.

B.

Because you are deployed on GKE and are using a cluster autoscaler. your GKE cluster will scale automatically, regardless of growth rate.

C.

Because you are at only 30% utilization, you have significant headroom and you won't need to add any additional capacity for this rate of growth.

D.

Proactively add 60% more node capacity to account for six months of 10% growth rate, and then perform a load test to make sure you have enough capacity.

Your team has recently deployed an NGINX-based application into Google Kubernetes Engine (GKE) and has exposed it to the public via an HTTP Google Cloud Load Balancer (GCLB) ingress. You want to scale the deployment of the application's frontend using an appropriate Service Level Indicator (SLI). What should you do?

A.

Configure the horizontal pod autoscaler to use the average response time from the Liveness and Readiness probes.

B.

Configure the vertical pod autoscaler in GKE and enable the cluster autoscaler to scale the cluster as pods expand.

C.

Install the Stackdriver custom metrics adapter and configure a horizontal pod autoscaler to use the number of requests provided by the GCLB.

D.

Expose the NGINX stats endpoint and configure the horizontal pod autoscaler to use the request metrics exposed by the NGINX deployment.

You are responsible for the reliability of a high-volume enterprise application. A large number of users report that an important subset of the application’s functionality – a data intensive reporting feature – is consistently failing with an HTTP 500 error. When you investigate your application’s dashboards, you notice a strong correlation between the failures and a metric that represents the size of an internal queue used for generating reports. You trace the failures to a reporting backend that is experiencing high I/O wait times. You quickly fix the issue by resizing the backend’s persistent disk (PD). How you need to create an availability Service Level Indicator (SLI) for the report generation feature. How would you define it?

A.

As the I/O wait times aggregated across all report generation backends

B.

As the proportion of report generation requests that result in a successful response

C.

As the application’s report generation queue size compared to a known-good threshold

D.

As the reporting backend PD throughout capacity compared to a known-good threshold

Your product is currently deployed in three Google Cloud Platform (GCP) zones with your users divided between the zones. You can fail over from one zone to another, but it causes a 10-minute service disruption for the affected users. You typically experience a database failure once per quarter and can detect it within five minutes. You are cataloging the reliability risks of a new real-time chat feature for your product. You catalog the following information for each risk:

• Mean Time to Detect (MUD} in minutes

• Mean Time to Repair (MTTR) in minutes

• Mean Time Between Failure (MTBF) in days

• User Impact Percentage

The chat feature requires a new database system that takes twice as long to successfully fail over between zones. You want to account for the risk of the new database failing in one zone. What would be the values for the risk of database failover with the new system?

A.

MTTD: 5MTTR: 10MTBF: 90Impact: 33%

B.

MTTD:5MTTR: 20MTBF: 90Impact: 33%

C.

MTTD:5MTTR: 10MTBF: 90Impact 50%

D.

MTTD:5MTTR: 20MTBF: 90Impact: 50%

You need to enforce several constraint templates across your Google Kubernetes Engine (GKE) clusters. The constraints include policy parameters, such as restricting the Kubernetes API. You must ensure that the policy parameters are stored in a GitHub repository and automatically applied when changes occur. What should you do?

A.

Set up a GitHub action to trigger Cloud Build when there is a parameter change. In Cloud Build, run a gcloud CLI command to apply the change.

B.

When there is a change in GitHub, use a web hook to send a request to Anthos Service Mesh, and apply the change.

C.

Configure Anthos Config Management with the GitHub repository. When there is a change in the repository, use Anthos Config Management to apply the change.

D.

Configure Config Connector with the GitHub repository. When there is a change in the repository, use Config Connector to apply the change.

Your company allows teams to self-manage Google Cloud projects, including project-level Identity and Access Management (IAM). You are concerned that the team responsible for the Shared VPC project might accidentally delete the project, so a lien has been placed on the project. You need to design a solution to restrict Shared VPC project deletion to those with the resourcemanager.projects.updateLiens permission at the organization level. What should you do?

A.

Enable VPC Service Controls for the container.googleapis.com API service.

B.

Revoke the resourcemanager.projects.updateLiens permission from all users associated with the project.

C.

Enable the compute.restrictXpnProjectLienRemoval organization policy constraint.

D.

Instruct teams to only perform IAM permission management as code with Terraform.

You are developing a strategy for monitoring your Google Cloud Platform (GCP) projects in production using Stackdriver Workspaces. One of the requirements is to be able to quickly identify and react to production environment issues without false alerts from development and staging projects. You want to ensure that you adhere to the principle of least privilege when providing relevant team members with access to Stackdriver Workspaces. What should you do?

A.

Grant relevant team members read access to all GCP production projects. Create Stackdriver workspaces inside each project.

B.

Grant relevant team members the Project Viewer IAM role on all GCP production projects. Create Slackdriver workspaces inside each project.

C.

Choose an existing GCP production project to host the monitoring workspace. Attach the production projects to this workspace. Grant relevant team members read access to the Stackdriver Workspace.

D.

Create a new GCP monitoring project, and create a Stackdriver Workspace inside it. Attach the production projects to this workspace. Grant relevant team members read access to the Stackdriver Workspace.

You manage a retail website for your company. The website consists of several microservices running in a GKE Standard node pool with node autoscaling enabled. Each microservice has resource limits and a Horizontal Pod Autoscaler configured. During a busy period, you receive alerts for one of the microservices. When you check the Pods, half of them have the status OOMKilled, and the number of Pods is at the minimum autoscaling limit. You need to resolve the issue. What should you do?

A.

Increase the memory resource limit of the microservice.

B.

Increase the maximum number of nodes in the node pool.

C.

Increase the maximum replica limit of the Horizontal Pod Autoscaler.

D.

Update the node pool to use a machine type with more memory.

You are on-call for an infrastructure service that has a large number of dependent systems. You receive an alert indicating that the service is failing to serve most of its requests and all of its dependent systems with hundreds of thousands of users are affected. As part of your Site Reliability Engineering (SRE) incident management protocol, you declare yourself Incident Commander (IC) and pull in two experienced people from your team as Operations Lead (OLJ and Communications Lead (CL). What should you do next?

A.

Look for ways to mitigate user impact and deploy the mitigations to production.

B.

Contact the affected service owners and update them on the status of the incident.

C.

Establish a communication channel where incident responders and leads can communicate with each other.

D.

Start a postmortem, add incident information, circulate the draft internally, and ask internal stakeholders for input.

You have deployed a fleet Of Compute Engine instances in Google Cloud. You need to ensure that monitoring metrics and logs for the instances are visible in Cloud Logging and Cloud Monitoring by your company's operations and cyber

security teams. You need to grant the required roles for the Compute Engine service account by using Identity and Access Management (IAM) while following the principle of least privilege. What should you do?

A.

Grant the logging.editor and monitoring.metricwriter roles to the Compute Engine service accounts.

B.

Grant the Logging. admin and monitoring . editor roles to the Compute Engine service accounts.

C.

Grant the logging. logwriter and monitoring. editor roles to the Compute Engine service accounts.

D.

Grant the logging. logWriter and monitoring. metricWriter roles to the Compute Engine service accounts.

You have an application deployed to Cloud Run. A new version of the application has recently been deployed using the canary deployment strategy. Your Site Reliability Engineering (SRE) teammate informs you that an SLO has been exceeded for this application. You need to make the application healthy as quickly as possible. What should you do first?

A.

Configure traffic splitting to send 100% of the traffic to the latest revision.

B.

Configure traffic splitting to send 100% of the traffic to the previous revision.

C.

Create a new revision using the last known good version of the application.

D.

Identify the cause of the latency by using Cloud Trace.

Your organization wants to implement Site Reliability Engineering (SRE) culture and principles. Recently, a service that you support had a limited outage. A manager on another team asks you to provide a formal explanation of what happened so they can action remediations. What should you do?

A.

Develop a postmortem that includes the root causes, resolution, lessons learned, and a prioritized list of action items. Share it with the manager only.

B.

Develop a postmortem that includes the root causes, resolution, lessons learned, and a prioritized list of action items. Share it on the engineering organization's document portal.

C.

Develop a postmortem that includes the root causes, resolution, lessons learned, the list of people responsible, and a list of action items for each person. Share it with the manager only.

D.

Develop a postmortem that includes the root causes, resolution, lessons learned, the list of people responsible, and a list of action items for each person. Share it on the engineering organization's document portal.

You are running a web application deployed to a Compute Engine managed instance group Ops Agent is installed on all instances You recently noticed suspicious activity from a specific IP address You need to configure Cloud Monitoring to view the number of requests from that specific IP address with minimal operational overhead. What should you do?

A.

Configure the Ops Agent with a logging receiver Create a logs-based metric

B.

Create a script to scrape the web server log Export the IP address request metrics to the Cloud Monitoring API

C.

Update the application to export the IP address request metrics to the Cloud Monitoring API

D.

Configure the Ops Agent with a metrics receiver

You support an application running on App Engine. The application is used globally and accessed from various device types. You want to know the number of connections. You are using Stackdriver Monitoring for App Engine. What metric should you use?

A.

flex/connections/current

B.

tcp_ssl_proxy/new_connections

C.

tcp_ssl_proxy/open_connections

D.

flex/instance/connections/current

Copyright © 2014-2026 Solution2Pass. All Rights Reserved