
CCA-500 Cloudera Certified Administrator for Apache Hadoop (CCAH) Free Practice Exam Questions (2025 Updated)

Prepare effectively for your Cloudera CCA-500 Cloudera Certified Administrator for Apache Hadoop (CCAH) certification with our extensive collection of free, high-quality practice questions. Each question is designed to mirror the actual exam format and objectives, complete with comprehensive answers and detailed explanations. Our materials are regularly updated for 2025, ensuring you have the most current resources to build confidence and succeed on your first attempt.


You are working on a project where you need to chain together MapReduce and Pig jobs. You also need the ability to use forks, decision points, and path joins. Which ecosystem project should you use to perform these actions?

A.

Oozie

B.

ZooKeeper

C.

HBase

D.

Sqoop

E.

HUE
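
Note: the workflow definition itself (including fork, join, and decision nodes) lives in an Oozie workflow.xml; the job is then submitted from the command line. A minimal sketch, assuming a hypothetical Oozie server URL and job.properties file:

    # Submit and run an Oozie workflow whose workflow.xml chains
    # MapReduce and Pig actions via fork/join control nodes
    # (server URL and properties file are hypothetical):
    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run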

Your cluster has the following characteristics:

    A rack-aware topology is configured and enabled

    Replication is set to 3

    Cluster block size is set to 64MB

Which best describes the file read process when a client application connects to the cluster and requests a 50 MB file?

A.

The client queries the NameNode for the locations of the block, and reads all three copies. The first copy to complete transfer to the client is the one the client reads, as part of Hadoop’s speculative execution framework.

B.

The client queries the NameNode for the locations of the block, and reads from the first location in the list it receives.

C.

The client queries the NameNode for the locations of the block, and reads from a random location in the list it receives, balancing network I/O load across the nodes from which it retrieves data at any given time.

D.

The client queries the NameNode, which retrieves the block from the DataNode nearest the client and then passes that block back to the client.
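
Note: you can inspect the block locations the NameNode reports for a file yourself. A quick check, assuming a hypothetical path:

    # List the blocks and DataNode locations for a file; a 50 MB file on a
    # cluster with a 64 MB block size occupies a single block with three replicas:
    hdfs fsck /user/alice/file50mb -files -blocks -locations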

You decide to create a cluster which runs HDFS in High Availability mode with automatic failover, using Quorum-based Storage. What is the purpose of ZooKeeper in such a configuration?

A.

It only keeps track of which NameNode is Active at any given time

B.

It monitors an NFS mount point and reports if the mount point disappears

C.

It both keeps track of which NameNode is Active at any given time and manages the edits file, which is a log of changes to the HDFS filesystem

D.

It only manages the edits file, which is a log of changes to the HDFS filesystem

E.

Clients connect to ZooKeeper to determine which NameNode is Active
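
Note: in an HA pair the ZooKeeper Failover Controllers use ZooKeeper for leader election; you can query the resulting state directly. A sketch, assuming the hypothetical NameNode IDs nn1 and nn2 from dfs.ha.namenodes.<nameservice>:

    # Report which NameNode is currently Active and which is Standby:
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2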

Your Hadoop cluster contains nodes in three racks. You have not configured the dfs.hosts property in the NameNode’s configuration file. What results?

A.

The NameNode will update the dfs.hosts property to include machines running the DataNode daemon on the next NameNode reboot or with the command dfsadmin -refreshNodes

B.

No new nodes can be added to the cluster until you specify them in the dfs.hosts file

C.

Any machine running the DataNode daemon can immediately join the cluster

D.

Presented with a blank dfs.hosts property, the NameNode will permit DataNodes specified in mapred.hosts to join the cluster
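
Note: when dfs.hosts is set, it names an include file of permitted DataNodes, and changes take effect after a refresh. A sketch, assuming a hypothetical include-file path:

    # Add a DataNode to the include file, then refresh without a restart:
    echo "datanode04.example.com" >> /etc/hadoop/conf/dfs.hosts.include
    hdfs dfsadmin -refreshNodes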

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25 KB. Because your Hadoop cluster isn’t optimized for storing and processing many small files, you decide to take the following actions:

1. Group the individual images into a set of larger files

2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming.

Which data serialization system gives you the flexibility to do this?

A.

CSV

B.

XML

C.

HTML

D.

Avro

E.

SequenceFiles

F.

JSON
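
Note: once the images are packed into container files, Hadoop Streaming can hand them to a Python mapper. A minimal sketch, assuming hypothetical HDFS paths and a hypothetical mapper script:

    # Streaming job reading packed SequenceFiles with a Python mapper:
    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -inputformat org.apache.hadoop.mapred.SequenceFileAsTextInputFormat \
        -input /data/images-packed \
        -output /data/images-analyzed \
        -mapper analyze_images.py \
        -file analyze_images.py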

You have a 20-node Hadoop cluster, with 18 slave nodes and 2 master nodes, running HDFS High Availability (HA). You want to minimize the chance of data loss in your cluster. What should you do?

A.

Add another master node to increase the number of nodes running the JournalNode, which increases the number of machines available to HA to create a quorum

B.

Set an HDFS replication factor that provides data redundancy, protecting against node failure

C.

Run a Secondary NameNode on a different master from the NameNode in order to provide automatic recovery from a NameNode failure.

D.

Run the ResourceManager on a different master from the NameNode in order to load-share HDFS metadata processing

E.

Configure the cluster’s disk drives with an appropriate fault tolerant RAID level

You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?

A.

Install the impalad daemon, the statestored daemon, and the catalogd daemon on each machine in the cluster, and the impala shell on your gateway machine

B.

Install the impalad daemon, the statestored daemon, the catalogd daemon, and the impala shell on your gateway machine

C.

Install the impalad daemon and the impala shell on your gateway machine, and the statestored daemon and catalogd daemon on one of the nodes in the cluster

D.

Install the impalad daemon on each machine in the cluster, the statestored daemon and catalogd daemon on one machine in the cluster, and the impala shell on your gateway machine

E.

Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster and on the gateway node
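
Note: however the daemons are laid out, the gateway only needs the shell, which connects to an impalad over the network. A sketch, assuming a hypothetical worker hostname:

    # Connect from the gateway to an impalad on a cluster node
    # (21000 is the default impala-shell port):
    impala-shell -i worker01.example.com:21000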

A slave node in your cluster has four 2 TB hard drives installed (4 x 2 TB). The DataNode is configured to store HDFS blocks on all disks. You set the value of the dfs.datanode.du.reserved parameter to 100 GB. How does this alter HDFS block storage?

A.

25GB on each hard drive may not be used to store HDFS blocks

B.

100GB on each hard drive may not be used to store HDFS blocks

C.

All hard drives may be used to store HDFS blocks as long as at least 100 GB in total is available on the node

D.

A maximum of 100 GB on each hard drive may be used to store HDFS blocks
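
Note: dfs.datanode.du.reserved reserves non-HDFS space per volume (per configured disk), not per node. To confirm the effective value:

    # Print the configured reservation, in bytes, as the DataNode sees it:
    hdfs getconf -confKey dfs.datanode.du.reserved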

You’re upgrading a Hadoop cluster from HDFS and MapReduce version 1 (MRv1) to one running HDFS and MapReduce version 2 (MRv2) on YARN. You want to set and enforce a block size of 128 MB for all new files written to the cluster after the upgrade. What should you do?

A.

You cannot enforce this, since client code can always override this value

B.

Set dfs.block.size to 128M on all the worker nodes, on all client machines, and on the NameNode, and set the parameter to final

C.

Set dfs.block.size to 128 M on all the worker nodes and client machines, and set the parameter to final. You do not need to set this value on the NameNode

D.

Set dfs.block.size to 134217728 on all the worker nodes, on all client machines, and on the NameNode, and set the parameter to final

E.

Set dfs.block.size to 134217728 on all the worker nodes and client machines, and set the parameter to final. You do not need to set this value on the NameNode
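
Note: the block size is supplied by the client at write time, which is why marking the parameter final on client configurations matters. A sketch, assuming a hypothetical local file and destination:

    # 134217728 bytes = 128 MB; check the configured default, then write a
    # file with an explicit block size (overridable unless marked final):
    hdfs getconf -confKey dfs.blocksize
    hdfs dfs -D dfs.blocksize=134217728 -put localfile /user/alice/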

You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from faster network fabric?

A.

When your workload generates a large amount of output data, significantly larger than the amount of intermediate data

B.

When your workload consumes a large amount of input data, relative to the entire capacity of HDFS

C.

When your workload consists of processor-intensive tasks

D.

When your workload generates a large amount of intermediate data, on the order of the input data itself

Identify two features/issues that YARN is designed to address: (Choose two)

A.

Standardize on a single MapReduce API

B.

Single point of failure in the NameNode

C.

Reduce complexity of the MapReduce APIs

D.

Resource pressure on the JobTracker

E.

Ability to run frameworks other than MapReduce, such as MPI

F.

HDFS latency

You are running a Hadoop cluster with MapReduce version 2 (MRv2) on YARN. You consistently see that MapReduce map tasks on your cluster are running slowly because of excessive JVM garbage collection. How do you increase the JVM heap size to 3 GB to optimize performance?

A.

yarn.application.child.java.opts=-Xsx3072m

B.

yarn.application.child.java.opts=-Xmx3072m

C.

mapreduce.map.java.opts=-Xms3072m

D.

mapreduce.map.java.opts=-Xmx3072m
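
Note: mapreduce.map.java.opts passes JVM flags to each map task, and -Xmx sets the maximum heap. A per-job sketch, assuming a hypothetical jar and a driver class that uses ToolRunner:

    # Run one job with a 3 GB (3072 MB) map-task heap:
    hadoop jar app.jar com.example.MyJob \
        -D mapreduce.map.java.opts=-Xmx3072m \
        /input /output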

You are running a Hadoop cluster with a NameNode on host mynamenode, a secondary NameNode on host mysecondarynamenode and several DataNodes.

Which best describes how you determine when the last checkpoint happened?

A.

Execute hdfs namenode -report on the command line and look at the Last Checkpoint information

B.

Execute hdfs dfsadmin -saveNamespace on the command line, which returns to you the last checkpoint value in the fstime file

C.

Connect to the web UI of the Secondary NameNode (http://mysecondarynamenode:50090/) and look at the “Last Checkpoint” information

D.

Connect to the web UI of the NameNode (http://mynamenode:50070) and look at the “Last Checkpoint” information
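
Note: the same status page a browser shows can be fetched from a terminal. A sketch, using the hostname from the question:

    # Fetch the Secondary NameNode status page, which includes the
    # "Last Checkpoint" time:
    curl http://mysecondarynamenode:50090/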

You have recently converted your Hadoop cluster from a MapReduce version 1 (MRv1) architecture to a MapReduce version 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying map and reduce tasks (resource allocation) when they run jobs. A developer wants to know how to specify the number of reduce tasks when a specific job runs. Which method should you tell that developer to implement?

A.

MapReduce version 2 (MRv2) on YARN abstracts resource allocation away from the idea of “tasks” into memory and virtual cores, thus eliminating the need for a developer to specify the number of reduce tasks, and indeed preventing the developer from specifying the number of reduce tasks.

B.

In YARN, resource allocation is a function of megabytes of memory in multiples of 1024 MB. Thus, they should specify the amount of memory they need by executing -D mapreduce-reduces.memory-mb-2048

C.

In YARN, the ApplicationMaster is responsible for requesting the resources required for a specific job. Thus, executing -D yarn.applicationmaster.reduce.tasks=2 will specify that the ApplicationMaster launch two task containers on the worker nodes.

D.

Developers specify reduce tasks in the exact same way for both MapReduce version 1 (MRv1) and MapReduce version 2 (MRv2) on YARN. Thus, executing -D mapreduce.job.reduces=2 will specify two reduce tasks.

E.

In YARN, resource allocation is a function of virtual cores specified by the ApplicationManager making requests to the NodeManager, where a reduce task is handled by a single container (and thus a single virtual core). Thus, the developer needs to specify the number of virtual cores to the NodeManager by executing -p yarn.nodemanager.cpu-vcores=2
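
Note: MRv2 still honors a per-job reduce count via the generic options parser. A sketch, assuming a hypothetical jar and a driver class that uses ToolRunner:

    # Run one job with exactly two reduce tasks:
    hadoop jar app.jar com.example.MyJob \
        -D mapreduce.job.reduces=2 \
        /input /output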

You are running a Hadoop cluster with all monitoring facilities properly configured.

Which scenario will go undetected?

A.

HDFS is almost full

B.

The NameNode goes down

C.

A DataNode is disconnected from the cluster

D.

Map or reduce tasks that are stuck in an infinite loop

E.

MapReduce jobs are causing excessive memory swaps

You suspect that your NameNode is incorrectly configured, and is swapping memory to disk. Which Linux commands help you to identify whether swapping is occurring? (Select all that apply)

A.

free

B.

df

C.

memcat

D.

top

E.

jps

F.

vmstat

G.

swapinfo
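
Note: the useful tools here are standard Linux commands. A quick sketch of how each one exposes swapping:

    free -m     # totals: swap used vs. available, in megabytes
    vmstat 5    # si/so columns: pages swapped in/out every 5 seconds
    top         # per-process memory plus a summary swap line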

Which two are features of Hadoop’s rack topology? (Choose two)

A.

Configuration of rack awareness is accomplished using a configuration file. You cannot use a rack topology script.

B.

Hadoop gives preference to intra-rack data transfer in order to conserve bandwidth

C.

Rack location is considered in the HDFS block placement policy

D.

HDFS is rack aware but the MapReduce daemons are not

E.

Even for small clusters on a single rack, configuring rack awareness will improve performance
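
Note: rack awareness is typically wired up by pointing net.topology.script.file.name at an executable that maps addresses to rack paths. A minimal sketch, assuming a hypothetical mapping file of "host rack" pairs:

    #!/bin/bash
    # Hadoop passes one or more IPs/hostnames; print one rack path per argument.
    while [ $# -gt 0 ]; do
        rack=$(grep -w "$1" /etc/hadoop/conf/topology.map | awk '{print $2}')
        echo "${rack:-/default-rack}"
        shift
    done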

You observed that the number of spilled records from map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 1000 MB. How would you tune your io.sort.mb value to achieve the maximum memory-to-disk I/O ratio?

A.

For a 1 GB child heap size, an io.sort.mb of 128 MB will always maximize memory-to-disk I/O

B.

Increase the io.sort.mb to 1GB

C.

Decrease the io.sort.mb value to 0

D.

Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.
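
Note: spills are visible in the job counters, so the tuning loop this question describes can be driven by comparing two counters after each run. A sketch, assuming a hypothetical job ID:

    # If SPILLED_RECORDS is much larger than MAP_OUTPUT_RECORDS, map tasks
    # are spilling multiple times and the sort buffer is too small:
    mapred job -counter job_1423456789012_0001 \
        org.apache.hadoop.mapreduce.TaskCounter SPILLED_RECORDS
    mapred job -counter job_1423456789012_0001 \
        org.apache.hadoop.mapreduce.TaskCounter MAP_OUTPUT_RECORDS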
