Achieve New Updated (September) Cloudera CCA-410 Examination Questions 21-30

September 24, 2015

Ensurepass

 

QUESTION 21

Compare the hardware requirements of the NameNode with those of the DataNodes in a Hadoop cluster running MapReduce v1 (MRv1):

 

A.

The NameNode requires more memory and requires greater disk capacity than the DataNodes.

B.

The NameNode and DataNodes should have the same hardware configuration.

C.

The NameNode requires more memory and no disk drives.

D.

The NameNode requires more memory but less disk capacity.

E.

The NameNode requires less memory and less disk capacity than the DataNodes.

 

Answer: D

Explanation: Note:

* The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself. Only one NameNode process runs on any Hadoop cluster, in its own JVM process, and in a typical production cluster it runs on a separate machine. The NameNode is a single point of failure for the HDFS cluster: when the NameNode goes down, the file system goes offline. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives.

 

* A DataNode stores data in the Hadoop file system (HDFS). Only one DataNode process runs on any Hadoop slave node, in its own JVM process. On startup, a DataNode connects to the NameNode. DataNode instances can talk to each other, mostly when replicating data.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers
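
As a hedged sketch of how that asymmetry shows up in practice, the NameNode is usually given a much larger Java heap than the DataNodes in hadoop-env.sh (the file path and the 8 GB / 1 GB figures below are illustrative assumptions, not values from this question):

    # /etc/hadoop/conf/hadoop-env.sh (illustrative values only)
    # The NameNode holds all HDFS metadata in RAM, so it gets the big heap;
    # DataNodes need only a modest heap but large disks for block storage.
    export HADOOP_NAMENODE_OPTS="-Xmx8g ${HADOOP_NAMENODE_OPTS}"
    export HADOOP_DATANODE_OPTS="-Xmx1g ${HADOOP_DATANODE_OPTS}"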

 

 

QUESTION 22

Which three processes does HDFS High Availability (HA) enable on your cluster?

 

A.

Automatically ‘fail over’ between NameNodes if one goes down

B.

Write data to two clusters simultaneously

C.

Shut one NameNode down for maintenance without halting the cluster

D.

Manually ‘fail over’ between NameNodes

E.

Configure unlimited hot standby NameNode.

 

Answer: ACD

Explanation: The HDFS High Availability feature addresses this problem by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance.
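
As a hedged illustration of answers C and D, an administrator-initiated failover on an HA pair can be driven from the command line (nn1 and nn2 are assumed NameNode IDs; substitute the IDs defined in your hdfs-site.xml):

    # Check which NameNode is currently active
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2
    # Manually fail over to nn2, e.g. before shutting nn1 down for maintenance
    hdfs haadmin -failover nn1 nn2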

 

 

QUESTION 23

A client application opens a file write stream on your cluster. Which two metadata changes occur during a file write?

 

A.

The NameNode triggers a block report to update block locations in the edits file

B.

The change is written to the NameNode's disk

C.

The change is written to the edits file

D.

The metadata in RAM on the NameNode is updated

E.

The metadata in RAM on the NameNode is flushed to disk

F.

The change is written to the fsimage file

G.

The change is written to the Secondary NameNode

 

Answer: CE

Explanation:

Note: The NameNode stores modifications to the filesystem as a log appended to a native filesystem file (edits). When the NameNode starts up, it reads HDFS state from an image file (fsimage) and then applies edits from the edits log file. It then writes the new HDFS state to fsimage and starts normal operation with an empty edits file. Since the NameNode merges the fsimage and edits files only during startup, the edits file can grow very large over time on a large cluster. Another side effect of a larger edits file is that the next restart of the NameNode takes longer.

 

The Secondary NameNode merges the fsimage and edits log periodically and keeps the edits log size within a limit. It is usually run on a different machine than the primary NameNode since its memory requirements are on the same order as the primary NameNode's. The Secondary NameNode is started by bin/start-dfs.sh on the nodes specified in the conf/masters file.
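
A hedged way to see the files this explanation describes is to list the NameNode's metadata directory (the path below is only an example; use whatever dfs.name.dir points to on your cluster):

    # Contents of ${dfs.name.dir}/current on the NameNode
    ls -l /data/1/dfs/nn/current/
    # Expect an fsimage file (the checkpointed namespace) and an edits file,
    # the log to which every metadata change (such as a client opening a
    # file write stream) is appended.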

 

 

QUESTION 24

Identify two features/issues that MapReduce v2 (MRv2/YARN) is designed to address:

 

A.

Resource pressure on the JobTracker

B.

HDFS latency.

C.

Ability to run frameworks other than MapReduce, such as MPI.

D.

Reduce complexity of the MapReduce APIs.

E.

Single point of failure in the NameNode.

F.

Standardize on a single MapReduce API.

 

Answer: AC

Explanation: A: MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have, what we call, MapReduce 2.0 (MRv2) or YARN.

 

The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.

 

The ResourceManager and per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system.

 

The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.

 

C: YARN, as an aspect of Hadoop, has two major kinds of benefits:

The ability to use programming frameworks other than MapReduce. Scalability, no matter what programming framework you use.
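
As a small hedged illustration of benefit C, the standard YARN CLI talks to the ResourceManager about whatever frameworks happen to be running; nothing about it is MapReduce-specific (the commands are stock YARN, the output depends on your cluster):

    # List the NodeManagers registered with the ResourceManager
    yarn node -list
    # List running applications; the application type is MAPREDUCE for MRv2 jobs,
    # but any framework that supplies its own ApplicationMaster appears here too
    yarn application -list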

 

 

QUESTION 25

For each job, the Hadoop framework generates task log files. Where are Hadoop’s task log files stored?

 

A.

Cached on the local disk of the slave node running the task, then purged immediately upon task completion.

B.

Cached on the local disk of the slave node running the task, then copied into HDFS.

C.

In HDFS, in the directory of the user who generates the job.

D.

On the local disk of the slave node running the task.

 

Answer: D

Reference: Apache Hadoop Log Files: Where to find them in CDH, and what info they contain
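
A hedged sketch of where to look on the slave node that ran the task (the exact log root varies by distribution and by the hadoop.log.dir setting; the CDH-style path below is an assumption):

    # Task attempt logs live under the TaskTracker's userlogs directory
    ls /var/log/hadoop-0.20-mapreduce/userlogs/
    # Each job directory holds per-attempt subdirectories containing stdout,
    # stderr and syslog; these stay on the slave's local disk and are not
    # copied into HDFS.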

 

 

QUESTION 26

A slave node in your cluster has 24 GB of RAM and 12 physical processor cores on a hyperthreading-enabled processor. You set the value of mapred.child.java.opts to -Xmx1G, and the value of mapred.tasktracker.map.tasks.maximum to 12. What is the appropriate value to set for mapred.tasktracker.reduce.tasks.maximum?

 

A.

24

B.

16.

C.

6

D.

2

E.

12

F.

20

 

Answer: E

Explanation: There is 24 GB of RAM available. Each task will use 1 GB (mapred.child.java.opts set to -Xmx1G). We can therefore run a maximum of 24 tasks simultaneously (12 mappers and 12 reducers).

 

Note:

* Use mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to control the maximum number of map/reduce tasks spawned simultaneously on a TaskTracker.

 

* Hadoop should be configured correctly to get the best performance. If during the analysis phase the Hadoop machines go into a non-responsive state (e.g., you cannot connect to them), this is determined to be caused by very high CPU usage, and they do not recover for a few hours, this could be a Hadoop configuration problem.

 

The following formulas should be applied to each Hadoop node (master and slave) so that the machines never go into a non-responsive state:

 

(number of mappers + number of reducers) * memory setting < available memory

number of mappers + number of reducers <= available CPUs

memory per mapper/reducer cannot be less than 2 GB
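
A hedged back-of-the-envelope check of the values in this question, using plain shell arithmetic (nothing here is cluster-specific):

    MAP_SLOTS=12      # mapred.tasktracker.map.tasks.maximum
    REDUCE_SLOTS=12   # mapred.tasktracker.reduce.tasks.maximum (the answer)
    HEAP_GB=1         # mapred.child.java.opts = -Xmx1G
    echo $(( (MAP_SLOTS + REDUCE_SLOTS) * HEAP_GB ))   # 24, fits in 24 GB of RAM
    echo $(( MAP_SLOTS + REDUCE_SLOTS ))               # 24, matches ~24 hardware threads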

 

 

QUESTION 27

You have a Hadoop cluster with a NameNode on host mynamenode. What are two ways to determine available HDFS space in your cluster?

 

A.

Run hadoop fs -du / and locate the DFS Remaining value.

B.

Connect to http://mynamenode:50070/ and locate the DFS Remaining value.

C.

Run hadoop dfsadmin -report and locate the DFS Remaining value.

D.

Run hadoop dfsadmin -spaceQuota and subtract HDFS Used from Configured Capacity.

 

Answer: BC
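
A hedged sketch of the two correct methods (the hostname comes from the question; the command and the 50070 port are the stock defaults):

    # Method B: the NameNode web UI shows DFS Remaining on its front page
    #   http://mynamenode:50070/
    # Method C: the dfsadmin report prints the same figure
    hadoop dfsadmin -report | grep "DFS Remaining"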

 

 

QUESTION 28

Which MapReduce v2 (MRv2/YARN) daemon is a per-machine slave responsible for launching application containers and monitoring application resource usage?

 

A.

JobTracker

B.

ResourceManager

C.

ApplicationMaster

D.

NodeManager

E.

ApplicationMasterService

F.

TaskTracker

 

Answer: C

Explanation: The fundamental idea of MRv2 (YARN) is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM).

 

Note: Let's walk through an application execution sequence:

A client program submits the application, including the necessary specifications to launch the application-specific ApplicationMaster itself. The ResourceManager assumes the responsibility to negotiate a specified container in which to start the ApplicationMaster and then launches the ApplicationMaster.

The ApplicationMaster, on boot-up, registers with the ResourceManager; the registration allows the client program to query the ResourceManager for details, which allow it to communicate directly with its own ApplicationMaster. During normal operation the ApplicationMaster negotiates appropriate resource containers via the resource-request protocol.

On successful container allocations, the ApplicationMaster launches the container by providing the container launch specification to the NodeManager. The launch specification typically includes the necessary information to allow the container to communicate with the ApplicationMaster itself.

The application code executing within the container then provides necessary information (progress, status, etc.) to its ApplicationMaster via an application-specific protocol.

During the application execution, the client that submitted the program communicates directly with the ApplicationMaster to get status, progress updates, etc. via an application-specific protocol.

Once the application is complete, and all necessary work has been finished, the ApplicationMaster deregisters with the ResourceManager and shuts down, allowing its own container to be repurposed.

 

Reference: Apache Hadoop YARN - Concepts & Applications
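
A hedged way to spot this per-machine slave on a running cluster (jps is a standard JDK tool; 8042 is the stock NodeManager web port and may be overridden on your cluster):

    # On any slave node, the NodeManager runs as its own JVM process
    jps | grep NodeManager
    # Its container and resource-usage view is also available on its web UI:
    #   http://<slave-host>:8042/node    (<slave-host> is a placeholder)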

 

 

QUESTION 29

What four functions do scheduling algorithms perform on a Hadoop cluster?

 

A.

Run jobs at periodic times of the day

B.

Reduce the job latencies in an environment with multiple jobs of different sizes

C.

Allow multiple users to share clusters in a predictable policy-guided manner

D.

Support the implementation of service-level agreements for multiple cluster users

E.

Allow short jobs to complete even when large, long jobs (consuming a lot of resources) are running

F.

Reduce the total amount of computation necessary to complete a job.

G.

Ensure data locality by ordering map tasks so that they run on data-local map slots

 

Answer: ABEF

Explanation: Hadoop schedulers

Since the pluggable scheduler was implemented, several scheduler algorithms have been developed for it.

/ FIFO scheduler

The original scheduling algorithm that was integrated within the JobTracker was called FIFO. In FIFO scheduling, a JobTracker pulled jobs from a work queue, oldest job first. This scheduler had no concept of the priority or size of the job, but the approach was simple to implement and efficient.

/ Fair scheduler

The core idea behind the fair share scheduler was to assign resources to jobs such that on average over time, each job gets an equal share of the available resources. The result is that jobs that require less time are able to access the CPU and finish intermixed with the execution of jobs that require more time to execute. This behavior allows for some interactivity among Hadoop jobs and permits greater responsiveness of the Hadoop cluster to the variety of job types submitted. The fair scheduler was developed by Facebook.

 

/ Capacity scheduler

The capacity scheduler shares some of the principles of the fair scheduler but has distinct differences, too. First, capacity scheduling was defined for large clusters, which may have multiple, independent consumers and target applications. For this reason, capacity scheduling provides greater control as well as the ability to provide a minimum capacity guarantee and share excess capacity among users. The capacity scheduler was developed by Yahoo!.

In capacity scheduling, instead of pools, several queues are created, each with a configurable number of map and reduce slots. Each queue is also assigned a guaranteed capacity (where the overall capacity of the cluster is the sum of each queue’s capacity).

 

* The introduction of the pluggable scheduler was yet another evolution in cluster computing with Hadoop. The pluggable scheduler permits the use (and development) of schedulers optimized for the particular workload and application. The new schedulers have also made it possible to create multi-user data warehouses with Hadoop, given the ability to share the overall Hadoop infrastructure with multiple users and organizations.
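
A hedged way to check which of these pluggable schedulers an MRv1 JobTracker is using (the config path is the usual CDH location and is an assumption):

    # The JobTracker loads the scheduler class named by mapred.jobtracker.taskScheduler
    grep -A 2 "mapred.jobtracker.taskScheduler" /etc/hadoop/conf/mapred-site.xml
    # Typical values:
    #   org.apache.hadoop.mapred.JobQueueTaskScheduler   (FIFO, the default)
    #   org.apache.hadoop.mapred.FairScheduler
    #   org.apache.hadoop.mapred.CapacityTaskScheduler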

 

 

QUESTION 30

Identify the daemon that performs checkpoint operations on the namespace state in a cluster configured with HDFS High Availability (HA) using Quorum-based storage.

 

A.

NodeManager

B.

BackupNode

C.

JournalNode

D.

Standby NameNode

E.

Secondary NameNode

F.

CheckpointNode

G.

NameNode

 

Answer: E
