CCA-470 Examination questions (September)

Achieve New Updated (September) Cloudera CCA-470 Examination Questions 11-20

September 24, 2015




You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running on the cluster. You submit job A, so that only job A is running on the cluster. A while later, you submit job B. Now job A and job B are running on the cluster at the same time. How will the Fair Scheduler handle these two jobs?



A. When job A gets submitted, it consumes all the task slots.

B. When job A gets submitted, it doesn't consume all the task slots.

C. When job B gets submitted, job A has to finish first before job B can get scheduled.

D. When job B gets submitted, it will get assigned tasks, while job A continues to run with fewer tasks.


Answer: D



Explanation: Fair scheduling is a method of assigning resources to jobs such that all jobs get, on average, an equal share of resources over time. When there is a single job running, that job uses the entire cluster. When other jobs are submitted, task slots that free up are assigned to the new jobs, so that each job gets roughly the same amount of CPU time. Unlike the default Hadoop scheduler, which forms a queue of jobs, this lets short jobs finish in reasonable time while not starving long jobs. It is also a reasonable way to share a cluster between a number of users. Finally, fair sharing can also work with job priorities: the priorities are used as weights to determine the fraction of total compute time that each job should get.
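The equal-share behavior described above can be sketched in a few lines. This is a simplification: the real Fair Scheduler also supports pools, weights, and minimum shares, none of which are modeled here.

```python
def fair_shares(total_slots, jobs):
    """Toy sketch of fair scheduling: each running job gets an equal
    slice of the cluster's task slots. Ignores pools, weights, and
    minimum shares that the real Fair Scheduler supports."""
    if not jobs:
        return {}
    share = total_slots // len(jobs)
    return {job: share for job in jobs}

# Job A alone uses the whole cluster; once job B arrives, slots are split.
print(fair_shares(100, ["A"]))       # {'A': 100}
print(fair_shares(100, ["A", "B"]))  # {'A': 50, 'B': 50}
```

This mirrors answer D: job B gets assigned tasks as slots free up, while job A keeps running with fewer tasks.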


Reference: Hadoop, Fair Scheduler Guide




Identify the function performed by the Secondary NameNode daemon on a cluster configured to run with a single NameNode.



A. In this configuration, the Secondary NameNode performs a checkpoint operation on the files used by the NameNode.

B. In this configuration, the Secondary NameNode is a standby NameNode, ready to fail over and provide high availability.

C. In this configuration, the Secondary NameNode performs real-time backups of the NameNode.

D. In this configuration, the Secondary NameNode serves as an alternate data channel for clients to reach HDFS, should the NameNode become too busy.


Answer: A

Explanation: The term "secondary name-node" is somewhat misleading. It is not a name-node in the sense that data-nodes cannot connect to the secondary name-node, and in no event can it replace the primary name-node in case of its failure.


The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary name-node periodically downloads the current name-node image and edit log files, merges them into a new image, and uploads the new image back to the (primary and only) name-node.
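The merge step can be illustrated with a toy sketch: replay the edit log against the last image to produce a new, merged image. The op names and the dict-based namespace here are invented for illustration and do not reflect the actual fsimage/edits binary formats.

```python
def checkpoint(fsimage, edits):
    """Sketch of a checkpoint: apply each logged operation to a copy of
    the last namespace image, yielding a new merged image. The namespace
    is modeled as {path: metadata}; op names are hypothetical."""
    image = dict(fsimage)
    for op, path, meta in edits:
        if op == "create":
            image[path] = meta
        elif op == "delete":
            image.pop(path, None)
    return image

old_image = {"/a": "file-1"}
edit_log = [("create", "/b", "file-2"), ("delete", "/a", None)]
print(checkpoint(old_image, edit_log))  # {'/b': 'file-2'}
```

After the merge, the edit log can be truncated, which is why checkpointing keeps NameNode restarts fast.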


So if the name-node fails and you can restart it on the same physical node, then there is no need to shut down the data-nodes; just the name-node needs to be restarted. If you cannot use the old node anymore, you will need to copy the latest image somewhere else. The latest image can be found either on the node that used to be the primary before the failure, if available, or on the secondary name-node. The latter will be the latest checkpoint without subsequent edit logs, so the most recent namespace modifications may be missing there. You will also need to restart the whole cluster in this case.


Reference: Hadoop Wiki, What is the purpose of the secondary name-node?




When setting the HDFS block size, which of the following considerations is the least important?



A. Amount of memory on the NameNode.

B. Number of DataNodes.

C. Disk capacity of the NameNode.

D. Number of files that will be stored in HDFS.

E. Size of "typical" files that will be stored in HDFS.


Answer: C

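To see why NameNode memory matters far more than its disk capacity, a rough back-of-the-envelope calculation helps. The ~150 bytes per namespace object figure used below is a commonly cited rule of thumb, not an exact value.

```python
import math

def namenode_objects(num_files, avg_file_size, block_size):
    """Rough count of namespace objects the NameNode holds in memory:
    one entry per file plus one per block. Sizes are in bytes."""
    blocks_per_file = math.ceil(avg_file_size / block_size)
    return num_files * (1 + blocks_per_file)

GB, MB = 1 << 30, 1 << 20

# One million 1 GB files with a 128 MB block size:
objs = namenode_objects(1_000_000, 1 * GB, 128 * MB)
print(objs)               # 9000000 objects (1 file entry + 8 block entries each)
print(objs * 150 // MB)   # ~1287 MB of heap by the 150-bytes-per-object rule
```

Smaller block sizes or many tiny files inflate the object count, which is why block size, file count, and typical file size all bear on NameNode memory.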




You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected?



A. Map or reduce tasks that are stuck in an infinite loop.

B. HDFS is almost full.

C. The NameNode goes down.

D. A DataNode is disconnected from the cluster.

E. MapReduce jobs that are causing excessive memory swaps.


Answer: A





Your Hadoop cluster has 25 nodes with a total of 100 TB (4 TB per node) of raw disk space allocated to HDFS storage. Assuming Hadoop's default configuration, how much data will you be able to store?



A. Approximately 100 TB

B. Approximately 25 TB

C. Approximately 10 TB

D. Approximately 33 TB


Answer: D

Explanation: In the default configuration there are three copies of each data block on HDFS: two copies are stored on DataNodes on the same rack and the third copy on a different rack. With a replication factor of 3, roughly one third of the 100 TB of raw capacity, about 33 TB, is usable.
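The arithmetic behind the answer, ignoring non-DFS usage and reserved space on the DataNodes:

```python
def usable_hdfs_capacity(raw_tb, replication=3):
    """Usable HDFS capacity given raw disk space and a replication
    factor. Simplified: ignores non-DFS usage and reserved space."""
    return raw_tb / replication

print(round(usable_hdfs_capacity(100), 1))  # 33.3, i.e. approximately 33 TB
```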


Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How the HDFS Blocks are replicated?




What is the recommended disk configuration for slave nodes in your Hadoop cluster with 6 x 2 TB hard drives?









RAID 1+0


Answer: B

Explanation: Note: Let me be clear here... there are absolutely times when using an enterprise-class storage device makes perfect sense. But for Hadoop it is very much unnecessary, and it is these three areas that I am going to hit, as well as some others, that I hope will demonstrate that Hadoop works best with inexpensive, internal storage in JBOD mode. Some of you might say "if you lose a disk in a JBOD configuration, you're toast... you lose everything." While this might be true, with Hadoop it isn't. Not only do you have the benefit that JBOD gives you in speed, you have the benefit that the Hadoop Distributed File System (HDFS) negates this risk. HDFS basically creates three copies of the data. This is a very robust way to guard against data loss due to a disk failure or node outage, so you can eliminate the need for performance-reducing RAID.


Reference: Hadoop and Storage Area Networks




You have a Hadoop cluster with a NameNode on host mynamenode. What are two ways to determine available HDFS space in your cluster?



A. Run hadoop fs -du / and locate the DFS Remaining value.

B. Connect to http://mynamenode:50070/ and locate the DFS Remaining value.

C. Run hadoop dfsadmin -report and locate the DFS Remaining value.

D. Run hadoop dfsadmin -spaceQuota and subtract HDFS Used from Configured Capacity.


Answer: AC
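As a sketch of what option C relies on, the DFS Remaining figure can be pulled out of dfsadmin -report output. The sample text below is fabricated for illustration, though the field names match the report format.

```python
import re

# Fabricated excerpt in the shape of `hadoop dfsadmin -report` output.
SAMPLE_REPORT = """\
Configured Capacity: 107374182400 (100 GB)
Present Capacity: 96636764160 (90 GB)
DFS Remaining: 85899345920 (80 GB)
DFS Used: 10737418240 (10 GB)
"""

def dfs_remaining_bytes(report_text):
    """Extract the raw 'DFS Remaining' byte count from report text;
    returns None if the field is absent."""
    m = re.search(r"DFS Remaining:\s*(\d+)", report_text)
    return int(m.group(1)) if m else None

print(dfs_remaining_bytes(SAMPLE_REPORT))  # 85899345920
```

The same DFS Remaining value is shown on the NameNode web UI, which is why both a CLI report and the web page can answer this question.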




Identify four pieces of cluster information that are stored on disk on the NameNode.



A. A catalog of DataNodes and the blocks that are stored on them.

B. Names of the files in HDFS.

C. The directory structure of the files in HDFS.

D. An edit log of changes that have been made since the last snapshot of the NameNode.

E. An edit log of changes that have been made since the last snapshot compaction by the Secondary NameNode.

F. File permissions of the files in HDFS.

G. The status of the heartbeats of each DataNode.


Answer: BCEG

Explanation: B: An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients.






The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes.


The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.

C: The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself.


E: The NameNode uses a transaction log called the EditLog to persistently record every change that occurs to file system metadata.


The SecondaryNameNode periodically compacts the EditLog into a “checkpoint;” the EditLog is then cleared.


G: When the NameNode notices that it has not received a heartbeat message from a DataNode after a certain amount of time, the DataNode is marked as dead.


Note: The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself. There is only one NameNode process running on any Hadoop cluster. The NameNode runs in its own JVM process; in a typical production cluster it runs on a separate machine. The NameNode is a single point of failure for the HDFS cluster: when the NameNode goes down, the file system goes offline. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives.




Cluster Summary





45 files and directories, 12 blocks = 57 total. Heap Size is 15.31 MB / 193.38 MB (7%)




Refer to the above screenshot.


You configure the Hadoop cluster with seven DataNodes and the NameNode’s web UI displays the details shown in the exhibit.


What does this tell you?



A. The HDFS cluster is in safe mode.

B. Your cluster has lost all HDFS data which had blocks stored on the dead DataNode.

C. One physical host crashed.

D. The DataNode JVM on one host is not active.


Answer: A

Explanation: The data from the dead node is being replicated. The cluster is in safemode.



* Safemode

During start up Namenode loads the filesystem state from fsimage and edits log file. It then waits for datanodes to report their blocks so that it does not prematurely start replicating the blocks though enough replicas already exist in the cluster. During this time Namenode stays in safemode. A Safemode for Namenode is essentially a read-only mode for the HDFS cluster, where it does not allow any modifications to filesystem or blocks. Normally Namenode gets out of safemode automatically at the beginning. If required, HDFS could be placed in safemode explicitly using ‘bin/hadoop dfsadmin -safemode’ command. Namenode front page shows whether safemode is on or off. A more detailed description and configuration is maintained as JavaDoc for setSafeMode().
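The safemode exit condition described above can be sketched as a simple threshold check. The 0.999 figure matches the default of dfs.safemode.threshold.pct in older Hadoop releases; treat the exact property name and default as version-dependent.

```python
def in_safemode(reported_blocks, total_blocks, threshold=0.999):
    """Sketch of the NameNode's safemode check: stay read-only until
    enough blocks have minimally reported replicas. `threshold` stands
    in for dfs.safemode.threshold.pct (default 0.999 in older Hadoop)."""
    if total_blocks == 0:
        return False  # empty namespace: nothing to wait for
    return reported_blocks / total_blocks < threshold

print(in_safemode(10, 12))  # True: too few blocks reported yet
print(in_safemode(12, 12))  # False: threshold met, safemode can exit
```

In the exhibit scenario, a DataNode that has not reported its blocks keeps the reported fraction below the threshold, which is consistent with answer A.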

* Data Disk Failure, Heartbeats and Re-Replication






Each DataNode sends a Heartbeat message to the NameNode periodically. A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this condition by the absence of a Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new IO requests to them. Any data that was registered to a dead DataNode is not available to HDFS any more. DataNode death may cause the replication factor of some blocks to fall below their specified value. The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise due to many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.


* The NameNode periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode. When the NameNode notices that it has not received a heartbeat message from a DataNode after a certain amount of time, the DataNode is marked as dead. Since blocks will then be under-replicated, the system begins replicating the blocks that were stored on the dead DataNode. The NameNode orchestrates the replication of data blocks from one DataNode to another. The replication data transfer happens directly between DataNodes; the data never passes through the NameNode.
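The dead-node detection described above boils down to a heartbeat timeout. The sketch below uses the classic ~10.5-minute default (2 × the recheck interval of 300 s plus 10 × the 3 s heartbeat interval); the actual value varies by configuration.

```python
def dead_datanodes(last_heartbeat, now, timeout=630.0):
    """Return DataNodes whose last heartbeat is older than `timeout`
    seconds. `last_heartbeat` maps node name -> timestamp in seconds;
    630 s reflects the classic HDFS default (2*300 + 10*3)."""
    return sorted(node for node, t in last_heartbeat.items()
                  if now - t > timeout)

beats = {"dn1": 1000.0, "dn2": 200.0, "dn3": 990.0}
print(dead_datanodes(beats, now=1000.0))  # ['dn2']
```

Once a node lands in this dead list, the NameNode schedules re-replication of its blocks onto the surviving DataNodes.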


Incorrect answers:

B: The data is not lost, it is being replicated.


Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How NameNode Handles data node failures?




Your cluster block size is set to 128 MB. A client application (client application A) is writing a 500 MB file to HDFS. After client application A has written 300 MB of data, another client (client application B) attempts to read the file. What is the effect of a second client requesting a file during a write?



A. Client application B can read 256 MB of the file.

B. Client application B returns an error.

C. Client application B can read the 300 MB that has been written so far.

D. Client application B must wait until the entire file has been written, and will then read its entire contents.


Answer: D
