
Achieve New Updated (September) Cloudera CCA-470 Examination Questions 1-10

September 24, 2015

Ensurepass

 


Exam A

 

QUESTION 1

It is recommended that you run the HDFS balancer periodically. Why? (Choose 3)

 

A.

To improve data locality for MapReduce tasks.

B.

To ensure that there is capacity in HDFS for additional data.

C.

To help HDFS deliver consistent performance under heavy loads.

D.

To ensure that all blocks in the cluster are 128MB in size.

E.

To ensure that there is consistent disk utilization across the DataNodes.

 

Answer: BCE

Reference:

http://hadoop.apache.org/docs/hdfs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html
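For illustration, a minimal Java sketch (assuming the client configuration's fs.defaultFS points at the cluster's NameNode) that reports per-DataNode disk utilization; the uneven percentages that appear after adding nodes or deleting data are exactly what the balancer evens out:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DataNodeUtilization {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();            // picks up core-site.xml on the classpath
        FileSystem fs = FileSystem.get(conf);
        // Assumes fs.defaultFS points at an HDFS NameNode, so the cast succeeds
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        // One entry per live DataNode, as reported by the NameNode
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
            double usedPct = 100.0 * dn.getDfsUsed() / Math.max(dn.getCapacity(), 1);
            System.out.printf("%-40s %6.2f%% used%n", dn.getHostName(), usedPct);
        }
    }
}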

 

 

QUESTION 2

Which MapReduce daemon instantiates user code, and executes map and reduce tasks on a cluster running MapReduce v1 (MRv1)?

 

A.

NameNode

B.

DataNode

C.

JobTracker

D.

TaskTracker

E.

ResourceManager

F.

ApplicationMaster

G.

NodeManager

 

Answer: D

Explanation: A TaskTracker is a slave-node daemon in the cluster that accepts tasks (map, reduce, and shuffle operations) from a JobTracker. Only one TaskTracker process runs on any Hadoop slave node, and it runs in its own JVM process. Every TaskTracker is configured with a set of slots, which indicate the number of tasks it can accept. The TaskTracker starts a separate JVM process to do the actual work (called a task instance); this ensures that a process failure does not take down the TaskTracker itself. The TaskTracker monitors these task instances, capturing their output and exit codes. When a task instance finishes, successfully or not, the TaskTracker notifies the JobTracker. TaskTrackers also send heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that they are still alive. These messages also report the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated.
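For illustration, a short Java sketch of the MRv1 properties that set those slot counts (normally configured in mapred-site.xml on each TaskTracker node; the default of 2 shown here is the value Hadoop ships with):

import org.apache.hadoop.mapred.JobConf;

public class SlotConfig {
    public static void main(String[] args) {
        JobConf conf = new JobConf();   // loads mapred-site.xml if it is on the classpath
        // MRv1 slot settings; 2 is the shipped default for each
        int mapSlots = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
        int reduceSlots = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);
        System.out.println("Map slots per TaskTracker:    " + mapSlots);
        System.out.println("Reduce slots per TaskTracker: " + reduceSlots);
    }
}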

 

Note: How many Daemon processes run on a Hadoop system?

 

Hadoop comprises five separate daemons, each of which runs in its own JVM.

The following three daemons run on master nodes:

NameNode – Stores and maintains the metadata for HDFS.

Secondary NameNode – Performs housekeeping functions for the NameNode.

JobTracker – Manages MapReduce jobs and distributes individual tasks to machines running the TaskTracker.

The following two daemons run on each slave node:

DataNode – Stores the actual HDFS data blocks.

TaskTracker – Responsible for instantiating and monitoring individual Map and Reduce tasks.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What is a Task Tracker in Hadoop? How many instances of TaskTracker run on a Hadoop Cluster

 

 

QUESTION 3

What information is stored on disk on the NameNode? (Choose 4)

 

A.

File permissions of the files in HDFS.

B.

An edit log of changes that have been made since the last backup of the NameNode.

C.

A catalog of DataNodes and the blocks that are stored on them.

D.

Names of the files in HDFS.

E.

The directory structure of the files in HDFS.

F.

An edit log of changes that have been made since the last snapshot compaction by the Secondary NameNode.

G.

The status of the heartbeats of each DataNode.

 

Answer: CDEF
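For illustration, a minimal Java sketch that lists the NameNode's on-disk metadata directory, which is where the fsimage and edit-log files live (it is only meaningful when run on the NameNode host itself, and assumes hdfs-site.xml is available on the classpath):

import java.io.File;

import org.apache.hadoop.conf.Configuration;

public class NameNodeDiskFiles {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml");   // pick up the cluster's HDFS settings if present
        // dfs.name.dir (dfs.namenode.name.dir in later releases) holds fsimage and edits
        String nameDir = conf.get("dfs.name.dir");
        if (nameDir == null) {
            System.out.println("dfs.name.dir is not set in the loaded configuration");
            return;
        }
        File current = new File(nameDir.split(",")[0].trim(), "current");
        System.out.println("NameNode metadata directory: " + current);
        File[] files = current.listFiles();   // only populated on the NameNode host itself
        if (files != null) {
            for (File f : files) {
                System.out.println(f.getName() + "  (" + f.length() + " bytes)");
            }
        }
    }
}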

 

 

 

QUESTION 4

Your existing Hadoop cluster has 30 slave nodes, each of which has 4 x 2TB hard drives. You plan to add another 10 nodes. How much disk space can your new nodes contain?

 

A.

The new nodes must all contain 8TB of disk space, but it does not matter how the disks are configured

B.

The new nodes cannot contain more than 8TB of disk space

C.

The new nodes can contain any amount of disk space

D.

The new nodes must all contain 4 x 2TB hard drives

 

Answer: C

 

 

QUESTION 5

You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running on the cluster, and you submit job A, so that only job A is running on the cluster. A while later, you submit job B. Now job A and job B are running on the cluster at the same time.

 

Which of the following describes how the Fair Scheduler operates? (Choose 2)

 

A.

When job B gets submitted, it will get assigned tasks, while job A continues to run with fewer tasks.

B.

When job A gets submitted, it doesn’t consume all the task slots.

C.

When job A gets submitted, it consumes all the task slots.

D.

When job B gets submitted, job A has to finish first, before job B can get scheduled.

 

Answer: CD

Reference: http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html (introduction, first paragraph)
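For reference, enabling the MRv1 Fair Scheduler comes down to two properties that normally live in mapred-site.xml; the Java sketch below only shows the property names and values (the allocation-file path is a hypothetical example):

import org.apache.hadoop.conf.Configuration;

public class FairSchedulerProps {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Tell the JobTracker to use the Fair Scheduler instead of the default FIFO scheduler
        conf.set("mapred.jobtracker.taskScheduler",
                 "org.apache.hadoop.mapred.FairScheduler");
        // Optional: per-pool minimum shares, weights, and limits live in this XML file
        conf.set("mapred.fairscheduler.allocation.file",
                 "/etc/hadoop/conf/fair-scheduler.xml");   // hypothetical path
        System.out.println(conf.get("mapred.jobtracker.taskScheduler"));
    }
}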

 

 

QUESTION 6

Assuming a large properly configured multi-rack Hadoop cluster, which scenario should not result in loss of HDFS data assuming the default replication factor settings?


A.

Ten percent of DataNodes simultaneously fail.

B.

All DataNodes simultaneously fail.

C.

An entire rack fails.

D.

Multiple racks simultaneously fail.

E.

Seventy percent of DataNodes simultaneously fail.

 

Answer: A

Reference: http://stackoverflow.com/questions/12399197/in-a-large-properly-configured-multi-rack-hadoop-cluster-which-scenarios-will-b
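To see how replication protects a single file, a small Java sketch (the HDFS path is hypothetical) that prints where each block's replicas live; with the default replication factor of 3 and rack awareness configured, each block has replicas on more than one rack, which is why losing one rack does not lose data:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockPlacement {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/example/data.txt");   // hypothetical HDFS path
        FileStatus status = fs.getFileStatus(file);
        // One BlockLocation per block; each lists the DataNodes holding a replica
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("Block at offset " + block.getOffset());
            for (String topo : block.getTopologyPaths()) {
                System.out.println("  replica: " + topo);   // e.g. /rack1/host:port
            }
        }
    }
}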

 

 

QUESTION 7

Where does a MapReduce job store the intermediate data output from Mappers?

 

A.

On the underlying filesystem of the local disk of the machine on which the JobTracker ran.

B.

In HDFS, in the job’s output directory.

C.

In HDFS, in a temporary directory defined by mapred.tmp.dir.

D.

On the underlying filesystem of the local disk of the machine on which the Mapper ran.

E.

On the underlying filesystem of the local disk of the machine on which the Reducer ran.

 

Answer: D

Explanation: The mapper output (intermediate data) is stored on the local file system (NOT HDFS) of each individual mapper node. This is typically a temporary directory location which can be set up in the configuration by the Hadoop administrator. The intermediate data is cleaned up after the Hadoop job completes.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, Where is the Mapper Output (intermediate key-value data) stored?
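A short Java sketch of the MRv1 property that controls where that intermediate map output lands on each TaskTracker's local disks (the fallback directory shown is only illustrative):

import org.apache.hadoop.mapred.JobConf;

public class IntermediateDataDirs {
    public static void main(String[] args) {
        JobConf conf = new JobConf();   // loads mapred-site.xml if it is on the classpath
        // Comma-separated list of local (non-HDFS) directories used for map output spills
        String localDirs = conf.get("mapred.local.dir", "/tmp/hadoop/mapred/local");
        for (String dir : localDirs.split(",")) {
            System.out.println("Intermediate data directory: " + dir.trim());
        }
    }
}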

 

 

QUESTION 8

You have a cluster running with the FIFO Scheduler enabled. You submit a large job A to the cluster, which you expect to run for one hour. Then, you submit job B to the cluster, which you expect to run for only a couple of minutes.


You submit both jobs with the same priority.

 

Which two best describe how the FIFO Scheduler arbitrates the cluster resources for a job and its tasks?

 

A.

Given Jobs A and B submitted in that order, all tasks from job A are guaranteed to finish before all tasks from job B.

B.

The order of execution of tasks within a job may vary.

C.

Tasks are scheduled in the order of their jobs’ submission.

D.

The FIFO Scheduler will give, on average, an equal share of the cluster resources over the job lifecycle.

E.

Because there is more than a single job on the cluster, the FIFO Scheduler will enforce a limit on the percentage of resources allocated to a particular job at any given time.

F.

The FIFO Scheduler will pass an exception back to the client when job B is submitted, since all slots on the cluster are in use.

 

Answer: AC

Explanation: FIFO (first-in, first-out) scheduling determines a job's importance by when it was submitted.

 

The original scheduling algorithm integrated into the JobTracker was called FIFO. In FIFO scheduling, the JobTracker pulled jobs from a work queue, oldest job first. This scheduler had no concept of the priority or size of the job, but the approach was simple to implement and efficient.
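As a sketch of that behavior (assuming jobA and jobB are otherwise fully configured MRv1 jobs; mapper, reducer, and input/output setup are omitted), submission order alone determines which job's tasks the FIFO Scheduler hands out first:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class FifoSubmissionOrder {
    public static void main(String[] args) throws Exception {
        JobConf jobA = new JobConf();   // large job; mapper/reducer/input/output setup omitted
        JobConf jobB = new JobConf();   // small job; mapper/reducer/input/output setup omitted

        JobClient client = new JobClient(new JobConf());
        // Under the FIFO Scheduler, tasks are handed out in job-submission order:
        RunningJob first = client.submitJob(jobA);    // submitted first, scheduled first
        RunningJob second = client.submitJob(jobB);   // queues behind job A's tasks
        System.out.println("Submitted " + first.getID() + " then " + second.getID());
    }
}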

 

 

QUESTION 9

Which three distcp features can you utilize on a Hadoop cluster?

 

A.

Use distcp to copy files only between two clusters or more. You cannot use distcp to copy data between directories inside the same cluster.

B.

Use distcp to copy HBase table files.

C.

Use distcp to copy physical blocks from the source to the target destination in your cluster.

D.

Use distcp to copy data between directories inside the same cluster.

E.

Use distcp to run an internal MapReduce job to copy files.

 

Answer: BDE

 

 

Explanation: DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses Map/Reduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list. Its Map/Reduce pedigree has endowed it with some quirks in both its semantics and execution.

 

Reference: Hadoop DistCp Guide
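For contrast (the paths are hypothetical), the plain FileSystem API below copies a directory with a single client process; DistCp exists precisely to replace this single-threaded approach with parallel map tasks:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class SingleProcessCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Copies /data/source to /data/target inside the same cluster, one file at a time.
        // DistCp would instead launch a MapReduce job and copy the files in parallel.
        FileUtil.copy(fs, new Path("/data/source"),
                      fs, new Path("/data/target"),
                      false /* don't delete source */, conf);
    }
}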

 

 

QUESTION 10

How does the HDFS architecture provide redundancy?

 

A.

Storing multiple replicas of data blocks on different DataNodes.

B.

Reliance on RAID at each datanode.

C.

Reliance on SAN devices as a DataNode interface.

D.

DataNodes make copies of their data blocks, and put them on different local disks.

 

Answer: A

Reference: http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/ (Writing Files to HDFS)
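A short Java sketch of the same idea (the path and replication factor are illustrative): the client asks HDFS for a number of replicas per block, and the NameNode places those replicas on different DataNodes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/example/data.txt");     // hypothetical HDFS path
        // Ask for 3 replicas of each block of this file (the usual default)
        fs.setReplication(file, (short) 3);
        FileStatus status = fs.getFileStatus(file);
        System.out.println(file + " replication factor: " + status.getReplication());
    }
}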
