
Achieve New Updated (September) Cloudera CCA-470 Examination Questions 21-30

September 24, 2015

Ensurepass

 

QUESTION 21

Which command does Hadoop offer to discover missing or corrupt HDFS data?

 

A. The map-only checksum utility
B. Fsck
C. Du
D. Dskchk
E. Hadoop does not provide any tools to discover missing or corrupt data; there is no need because three replicas are kept for each data block.

 

Answer: B

Explanation: HDFS supports the fsck command to check for various inconsistencies. It is designed for reporting problems with files, e.g. missing blocks for a file or under-replicated blocks. Unlike a traditional fsck utility for native filesystems, this command does not correct the errors it detects; normally the NameNode automatically corrects most of the recoverable failures. HDFS' fsck is not a Hadoop shell command; it can be run as 'bin/hadoop fsck'. Fsck can be run on the whole filesystem or on a subset of files.
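For illustration, fsck can be pointed at the root or at a subtree (the path below is only an example):

$ bin/hadoop fsck /                                        # check the entire filesystem
$ bin/hadoop fsck /user/data -files -blocks -locations     # per-file block and location report

Corrupt or missing blocks are summarized at the end of the report.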

 

Reference: Hadoop DFS User Guide

 

 

QUESTION 22

What do you need to do when adding new slave nodes to a cluster?

 

A. Halt and resubmit any running MapReduce jobs to guarantee correctness.
B. Restart the NameNode daemon.
C. Add a new entry to /etc/slavenodes on the NameNode host.
D. Increase the value of dfs.number.of.nodes in hdfs-site.xml.
E. Add the new node's DNS name to the conf/slaves file on the master node.

Answer: E

Reference: http://www.quora.com/How-to-add-a-node-in-Hadoop-cluster
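A minimal sketch of the procedure on an MRv1 cluster (the hostname is a placeholder):

# On the master node: record the new slave's DNS name
$ echo "slave05.example.com" >> $HADOOP_HOME/conf/slaves

# On the new node: start the daemons; they register with the NameNode
# and JobTracker on their own, so no master restart is required
$ $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
$ $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker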

 

 

QUESTION 23

In the context of configuring a Hadoop cluster for HDFS High Availability (HA), 'fencing' refers to:

 

A. Isolating a failed NameNode from write access to the fsimage and edits files so that it cannot resume write operations if it recovers.
B. Isolating the cluster's master daemon to limit write access only to authorized clients.
C. Isolating both HA NameNodes to prevent a client application from killing the NameNode daemons.
D. Isolating the standby NameNode from write access to the fsimage and edits files.

 

Answer: A

Explanation: A fencing method is a method by which one node can forcibly prevent another node from making continued progress.

 

This might be implemented by killing a process on the other node, by denying the other node’s access to shared storage, or by accessing a PDU to cut the other node’s power.

 

Since these methods are often vendor- or device-specific, operators may implement this interface in order to achieve fencing.

 

Fencing is configured by the operator as an ordered list of methods to attempt. Each method will be tried in turn, and the next in the list will only be attempted if the previous one fails. See NodeFencer for more information.
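As an illustrative sketch, the ordered list is configured in hdfs-site.xml; sshfence and shell(...) are the two built-in methods, while the script and key paths below are placeholders:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>
    sshfence
    shell(/path/to/fence-script.sh)
  </value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>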

 

Note:

Failover: initiates a failover between two NameNodes.

 

This subcommand causes a failover from the first provided NameNode to the second. If the first NameNode is in the Standby state, this command simply transitions the second to the Active state without error. If the first NameNode is in the Active state, an attempt will be made to gracefully transition it to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be attempted in order until one of them succeeds. Only after this process will the second NameNode be transitioned to the Active state. If no fencing method succeeds, the second NameNode will not be transitioned to the Active state, and an error will be returned.
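For example, an operator initiates the graceful failover described above with the haadmin utility (the NameNode IDs nn1 and nn2 are examples taken from a typical dfs.ha.namenodes configuration):

$ hdfs haadmin -failover nn1 nn2        # transition nn2 to Active, fencing nn1 if needed
$ hdfs haadmin -getServiceState nn1     # verify: prints "active" or "standby"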

 

Reference: org.apache.hadoop.ha, Interface FenceMethod

 

Reference: HDFS High Availability Administration, HA Administration using the haadmin command

 

 

QUESTION 24

Identify two features/issues that MapReduce v2 (MRv2/YARN) is designed to address:

 

A. Resource pressure on the JobTracker.
B. HDFS latency.
C. Ability to run frameworks other than MapReduce, such as MPI.
D. Reduce complexity of the MapReduce APIs.
E. Single point of failure in the NameNode.
F. Standardize on a single MapReduce API.

 

Answer: AC

Explanation: A: MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have what we call MapReduce 2.0 (MRv2), or YARN.

 

The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.

 

The ResourceManager and the per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system.

 

The per-application ApplicationMaster is, in effect, a framework-specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.

C: YARN, as an aspect of Hadoop, has two major kinds of benefits:

- The ability to use programming frameworks other than MapReduce.
- Scalability, no matter what programming framework you use.
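As a small illustration of this split, a cluster opts into the MRv2 framework through a single property in mapred-site.xml (a standard property, shown here only as a sketch):

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

With this set, job submission goes to the ResourceManager rather than a JobTracker.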

 

 

QUESTION 25

The failure of which daemon makes HDFS unavailable on a cluster running MapReduce v1 (MRv1)?

 

A. Node Manager
B. Application Manager
C. Resource Manager
D. Secondary NameNode
E. NameNode
F. DataNode

 

Answer: E

Explanation: The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept; it does not store the data of these files itself. Only one NameNode process runs on any Hadoop cluster, in its own JVM, and in a typical production cluster it runs on a separate machine. The NameNode is a single point of failure for the HDFS cluster: when the NameNode goes down, the file system goes offline.
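As a quick illustrative check (standard commands), an administrator can confirm the daemon is up and HDFS is reachable:

$ jps | grep -i namenode      # is a NameNode JVM running on this host?
$ hadoop dfsadmin -report     # succeeds only while the NameNode is reachable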

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What is a NameNode? How many instances of NameNode run on a Hadoop Cluster?

 

 

QUESTION 26

On a cluster running MapReduce v1 (MRv1), a MapReduce job is given a directory of 10 plain text files as its input directory. Each file is made up of 3 HDFS blocks. How many Mappers will run?

A. We cannot say; the number of Mappers is determined by the developer
B. 30
C. 10
D. 1

 

Answer: B

Explanation: By default, each HDFS block of each input file becomes one input split, and one map task is launched per split: 10 files × 3 blocks = 30 Mappers.

 

 

QUESTION 27

You set up the Hadoop cluster using NameNode Federation. One NameNode manages the /users namespace and one NameNode manages the /data namespace. What happens when a client tries to write a file to /reports/myreport.txt?

 

A. The file successfully writes to /users/reports/myreports/myreport.txt.
B. The client throws an exception.
C. The file successfully writes to /reports/myreport.txt. The metadata for the file is managed by the first NameNode to which the client connects.
D. The file write fails silently; no file is written, no error is reported.

 

Answer: C

Explanation:

* The current HDFS architecture allows only a single namespace for the entire cluster, managed by a single NameNode. HDFS Federation addresses this limitation by adding support for multiple NameNodes/namespaces to the HDFS file system.

* HDFS Federation enables multiple NameNodes in a cluster for horizontal scalability of the NameNode. These NameNodes work independently and require no coordination among themselves. A DataNode can register with multiple NameNodes in the cluster and can store data blocks for multiple NameNodes.
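For illustration, a minimal federated hdfs-site.xml declares each nameservice and its NameNode RPC address (the nameservice IDs and hostnames below are placeholders):

<property>
  <name>dfs.nameservices</name>
  <value>ns-users,ns-data</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns-users</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns-data</name>
  <value>nn2.example.com:8020</value>
</property>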

 


QUESTION 28

What metadata is stored on a DataNode when a block is written to it?

 

A. None. Only the block itself is written.
B. Checksums for the data in the block, as a separate file.
C. Information on the file's location in HDFS.
D. Node location of each block belonging to the same namespace.

 

Answer: B

Explanation: In addition to the block file itself, the DataNode writes a separate metadata file containing checksums for the data in the block.

Note: Each DataNode also keeps a small amount of node-level metadata allowing it to identify the cluster it participates in. If this metadata is lost, then the DataNode cannot participate in an HDFS instance and the data blocks it stores cannot be reached.

 

When an HDFS instance is formatted, the NameNode generates a unique namespace id for the instance. When DataNodes first connect to the NameNode, they bind to this namespace id and establish a unique "storage id" that identifies that particular DataNode in the HDFS instance. This data, as well as information about what version of Hadoop was used to create the block files, is stored in a file named VERSION in the ${dfs.data.dir}/current directory.
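An illustrative ${dfs.data.dir}/current/VERSION file (every value below is a made-up example):

#Tue Sep 01 10:12:33 PDT 2015
namespaceID=1027596389
storageID=DS-1031230342-10.0.0.5-50010-1291708663864
cTime=0
storageType=DATA_NODE
layoutVersion=-32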

 

Note: Administrators of HDFS clusters understand that the HDFS metadata is some of the most precious bits they have. While you might have hundreds of terabytes of information stored in HDFS, the NameNode’s metadata is the key that allows this information, spread across several million “blocks” to be reassembled into coherent, ordered files.

 

Reference: Protecting per-DataNode Metadata

 

 

QUESTION 29

Identify the daemon that performs checkpoint operations of the namespace state in a cluster configured with HDFS High Availability (HA) using Quorum-based storage.

A. NodeManager
B. BackupNode
C. JournalNode
D. Standby NameNode
E. Secondary NameNode
F. CheckpointNode
G. NameNode

 

Answer: D

Explanation: In an HA cluster using Quorum-based storage, the Standby NameNode performs the checkpoints of the namespace state, which is why an HA cluster does not run a Secondary NameNode, CheckpointNode, or BackupNode.

Note: The Secondary NameNode fills this checkpointing role only on non-HA clusters, and it is badly named: it is merely a checkpoint server that periodically fetches the fsimage and edits files from the NameNode and merges them.

It does not come online automatically when the NameNode goes down, although in a worst-case scenario it can be used to rebuild the NameNode manually, with some data loss.

 

 

QUESTION 30

What action occurs automatically on a cluster when a DataNode is marked as dead?

 

A. The NameNode forces re-replication of all the blocks which were stored on the dead DataNode.
B. The next time a client submits a job that requires blocks from the dead DataNode, the JobTracker receives no heartbeats from the DataNode. The JobTracker tells the NameNode that the DataNode is dead, which triggers block re-replication on the cluster.
C. The replication factor of the files which had blocks stored on the dead DataNode is temporarily reduced, until the dead DataNode is recovered and returned to the cluster.
D. The NameNode informs the client which wrote the blocks that they are no longer available; the client then re-writes the blocks to a different DataNode.

 

Answer: A

Explanation: How does the NameNode handle DataNode failures?

 

The NameNode periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode. When the NameNode notices that it has not received a heartbeat message from a DataNode after a certain amount of time, the DataNode is marked as dead. Since its blocks will now be under-replicated, the system begins replicating the blocks that were stored on the dead DataNode. The NameNode orchestrates the replication of data blocks from one DataNode to another; the replication data transfer happens directly between DataNodes, and the data never passes through the NameNode.

 

Note: If the NameNode stops receiving heartbeats from a DataNode, it presumes it to be dead and any data it had to be gone as well. Based on the block reports it had been receiving from the dead node, the NameNode knows which copies of blocks died along with the node and can decide to re-replicate those blocks to other DataNodes. It will also consult the Rack Awareness data, in order to maintain the "two copies in one rack, one copy in another rack" replica rule, when deciding which DataNode should receive a new copy of the blocks.
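As an illustrative check, dfsadmin reports live and dead DataNodes (output abbreviated; the counts are made up):

$ hadoop dfsadmin -report
...
Datanodes available: 19 (20 total, 1 dead)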

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How NameNode handles DataNode failures
