Spring Sale Special - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: xmaspas7

Easiest Solution 2 Pass Your Certification Exams

NCP-AII NVIDIA AI Infrastructure Free Practice Exam Questions (2026 Updated)

Prepare effectively for your NVIDIA NCP-AII NVIDIA AI Infrastructure certification with our extensive collection of free, high-quality practice questions. Each question is designed to mirror the actual exam format and objectives, complete with comprehensive answers and detailed explanations. Our materials are regularly updated for 2026, ensuring you have the most current resources to build confidence and succeed on your first attempt.

Page: 1 / 2
Total 71 questions

After configuring NGC CLI with ngc config set, a user receives ”Authentication failed” errors when pulling containers. What step was most likely omitted?

A.

Installing the CLI with apt-get instead of manual extraction.

B.

Entering the API key during ngc config set or storing it in ~/.ngc/config.

C.

Setting --format_type=json to enable API interactions.

D.

Running sudo systemctl restart docker after configuration.

A cluster administrator needs to validate transceiver firmware versions across 200 ports using UFM. Which GUI-based method provides a consolidated view?

A.

Navigate to ’Devices" > select a switch > "Cables' tab to see ASIC firmware and transceiver versions.

B.

Use "Topology’ view to visually inspect cable icons.

C.

Run mlxlink -d lid- -m on each port manually.

D.

Export all switch logs and grep for ’FW Version".

You are a network administrator responsible for configuring an East-West (E/W) Spectrum-X fabric using SuperNIC. The Bluefield-3 devices in your network should be set to NIC mode with RoCE enabled to optimize data flow between servers. You have access to the Spectrum-X management tools and the necessary documentation. You need to use specific configuration commands to achieve this setup. Which of the following steps and commands are necessary to configure the Bluefield-3 devices in NIC mode for the E/W Spectrum-X fabric using SuperNIC? (Pick the 2 correct responses below)

A.

Use the command sudo mlxconfig -d /dev/mst/ set LINK_TYPE_P1=2 to enable Ethernet on the Bluefield-3 devices.

B.

Use the command sudo mlxconfig -d /dev/mst/ set DISABLE_SPECTRUM_X=1 to reduce overhead.

C.

Use the command sudo mlxconfig -d /dev/mst/ set INTERNAL_CPU_OFFLOAD_ENGINE=1 to configure the SuperNIC to operate in NIC mode.

D.

Use the command sudo mlxconfig -d /dev/mst/ set DPU_MODE=1 to set up the Bluefield-3 devices in DPU mode.

ClusterKit's NCCL bandwidth test shows 350 GB/s on a 400G InfiniBand fabric. How should this result be interpreted?

A.

Optimal performance, indicating healthy fabric and GPUDirect RDMA.

B.

Suboptimal performance; requires FEC tuning to reach 380+ GB/s.

C.

Critical failure; expected is >390 GB/s for HDR InfiniBand.

D.

Inconclusive; rerun with --stress=cpu to validate.

An administrator needs to verify HA functionality after configuring BCM (Bright Cluster Manager). Which command confirms the active head node and failover readiness?

A.

cmsh status to check HA status and active/standby roles.

B.

nvsm show health to validate GPU status on both head nodes.

C.

systemctl restart cmdaemon to force a failover test.

D.

ping to test basic connectivity.

During East-West fabric validation on a 64-GPU cluster, an engineer runs all_reduce_perf and observes an algorithm bandwidth of 350 GB/s and bus bandwidth of 656 GB/s. What does this indicate about the fabric performance?

A.

Inconclusive; rerun with point-to-point tests.

B.

Optimal performance; bus bandwidth near theoretical peak for NDR InfiniBand.

C.

Critical failure; bus bandwidth exceeds hardware capabilities.

D.

Suboptimal performance; algorithm bandwidth should match bus bandwidth.

An engineer needs to validate 400G DAC cable signal integrity in a DGX cluster. Which CVT metric best identifies marginal cables needing replacement?

A.

Lane power variance < 3dB across all transceivers.

B.

Transceiver model matching QSFP-DD specifications.

C.

Temperature fluctuations > 5°C during validation.

D.

Effective BER > 1.5E-254 during a <6-hour monitoring window.

One of the nodes in a cluster is not running as fast as the others and the system administrator needs to check the status of the GPUs on that system. What command should be used?

A.

lspci | grep NVIDIA

B.

nvidia-smi

C.

nvidia-gpu-status

D.

iblinkinfo

An InfiniBand server stops working, and a system administrator runs the "ibstat" command that provides the following output:

CA 'mlx5_1'

CA type: MT4115

Number of ports: 2

Firmware version: 10.20.1010

Hardware version: 0

Node GUID: 0x0002c90300002f78

System image GUID: 0x0002c90300002f7b

Port 1:

State: Initializing

Physical state: Linkup

Rate: 100

Base lid: 0

LMC: 0

SM lid: 0

Capability mask: 0x0251086a

Port GUID: 0x0002c90300002f79

Link layer: InfiniBand

What is the cause of the issue?

A.

The HCA port is faulty.

B.

There is no running SM in the fabric.

C.

The neighboring switch port is faulty.

D.

The cable is disconnected.

During cluster deployment, the UFM Cable Validation Tool reports "Wrong-neighbor" errors on multiple InfiniBand links. What is the most efficient way to resolve this issue?

A.

Reboot all leaf switches to force LLDP rediscovery.

B.

Replace all affected cables with higher-grade OM5 fiber optics.

C.

Verify LLDP data against topology files and remediate.

D.

Disable FEC on all switches to bypass neighbor validation.

A leaf switch shows "FW Version Mismatch" alerts for transceivers after cluster expansion. Which tool validates transceiver firmware against expected versions?

A.

flint

B.

iblinkinfo

C.

mlxconfig

D.

ethtool

During HPL execution on a DGX cluster, the benchmark fails with "not enough memory" errors despite sufficient physical RAM. Which HPL.dat parameter adjustment is most effective?

A.

Reduce the problem size while maintaining the same block size.

B.

Set PMAP to 1 to enable process mapping.

C.

Increase block size to 6144 to maximize GPU utilization.

D.

Disable double-buffering via BCAST parameter.

After ClusterKit reports "GPU-Host latency exceeds threshold," which NVIDIA diagnostic tool should be used to isolate hardware faults?

A.

Re-run ClusterKit with --stress=gpu -Y 60 to extend test duration

B.

nvidia-smi topo -m to inspect GPU topology connections

C.

DCGM Diags dcgmi diag -r 2

D.

ib_write_bw to measure InfiniBand bandwidth between nodes

A system engineer needs to set the vGPU scheduling behavior for all GPUs to share the scheduling equally with the default time slice length. What command should be used?

A.

esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x01"

B.

esxcli graphics module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x01"

C.

esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=FRL=0x01"

D.

esxcli system module parameters set -m nvidia -p "NVreg_RegistryDwords=RmPVMRL=0x00"

A network engineer is tasked with configuring the management, storage, and compute networks for a new DGX BasePOD deployment. Which statement best describes the network segmentation required for optimal operation?

A.

A single VLAN for all types of network traffic.

B.

Two networks: one for management and one for compute.

C.

Four networks: compute, storage, out-of-band, and management.

A customer is designing an AI Factory for enterprise-scale deployments and wants to ensure redundancy and load balancing for the management and storage networks. Which feature should be implemented on the Ethernet switches?

A.

Implement redundant switches with spanning tree protocol.

B.

MLAG for bonded interfaces across redundant switches.

C.

Use only one switch for all management and storage traffic.

D.

Disable VLANs and use unmanaged switches.

A customer has just completed the first boot of their DGX system and is prompted to create an administrative user. What is the correct approach for setting up this user to ensure secure BMC and GRUB access?

A.

Create separate usernames for BMC and GRUB to maximize flexibility.

B.

Skip the creation of a new user and retain the default admin account for BMC and GRUB access.

C.

Create a unique, strong, lower-case username and password that will be used for both BMC and GRUB access, avoiding default or weak credentials.

D.

Use “sysadmin” as the username and a simple password for ease of management.

A team is installing the NVIDIA Run:ai control plane on a Kubernetes cluster. Which two (2) options are most critical to validate before proceeding? (Pick the 2 correct responses below)

A.

Helm is installed on the installer machine.

B.

Ensure Kubernetes is running on the cluster.

C.

All cluster nodes have NVIDIA GPUs installed.

D.

NTP is disabled to simplify time synchronization.

A team is validating a DGX BasePOD deployment. Using cmsh, they run a command to check GPU health across all nodes. What indicates that the system is ready for AI workloads?

A.

The command output is ignored if the system powers on without errors.

B.

At least half of the GPUs report Status_Health = OK.

C.

All GPUs report Status_Health = OK and Health = OK for each device.

D.

Only the head node's GPUs need to be healthy.

After configuring HA, the administrator runs cmsh status and notices the secondary head node reports mysql [FAIL]. What is the most likely cause?

A.

The BCM license expired after HA configuration.

B.

Network connectivity issues between the primary and secondary head nodes.

C.

The secondary head node lacks NVIDIA GPU drivers.

D.

The cluster nodes are powered on during the HA configuration.

Page: 1 / 2
Total 71 questions
Copyright © 2014-2026 Solution2Pass. All Rights Reserved