
Achieve New Updated (September) Cloudera CCA-410 Examination Questions 11-20

September 24, 2015

Ensurepass

 

QUESTION 11

What is the smallest number of slave nodes you would need to configure in your Hadoop cluster to store 100TB of data, using Hadoop's default replication value, on nodes with 10TB of raw disk space per node?

 

A.

100

B.

25

C.

10

D.

40

E.

75

 

Answer: D

Explanation: The default replication factor is 3, so storing 100TB of data requires 100 x 3 = 300TB of raw disk capacity.

The minimum number of nodes needed is:

300TB / 10TB per node = 30 nodes

Since 30 is not one of the options, the smallest listed value that provides enough capacity is D (40); 25 nodes would give only 250TB.

 

 

QUESTION 12

You are configuring a Hadoop cluster with both MapReduce frameworks, MapReduce v1 (MRv1) and MapReduce v2 (MRv2/YARN). Which two MapReduce (computational) daemons do you need to configure to run on your master nodes?

 

A.

JobTracker

B.

ResourceManager

C.

ApplicationMaster

D.

JournalNode

E.

NodeManager

 

Answer: AB

Explanation: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

As that architecture overview shows, the ApplicationMaster runs on slave (worker) nodes, not on master nodes, so it is not one of the answers.

Only the JobTracker (MRv1) and the ResourceManager (MRv2/YARN) are MapReduce daemons that run on master nodes, so the answers are A and B.

 

 

QUESTION 13

Your cluster's block size is set to 128MB. A client application (client application A) is writing a 500MB file to HDFS. After client application A has written 300MB of data, another client (client application B) attempts to read the file. What is the effect of a second client requesting a file during a write?

 

A.

Application B can read 256MB of the file

B.

Client application B returns an error

C.

Client application B can read the 300MB that has been written so far.

D.

Client application B must wait until the entire file has been written, and will then read its entire contents.

 

Answer: A

Explanation: An HDFS reader can only see blocks that have been completely written. With a 128MB block size, 300MB of written data means two full blocks (256MB) are finished and visible, so client application B can read 256MB.

 

 

QUESTION 14

Which three distcp features can you utilize on a Hadoop cluster?

 

A.

Use distcp to copy files only between two clusters or more. You cannot use distcp to copy data between directories inside the same cluster.

B.

Use distcp to copy HBase table files.

C.

Use distcp to copy physical blocks from the source to the target destination in your cluster.

D.

Use distcp to copy data between directories inside the same cluster.

E.

Use distcp to run an internal MapReduce job to copy files.

 

Answer: BDE

Explanation:DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses Map/Reduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list. Its Map/Reduce pedigree has endowed it with some quirks in both its semantics and execution.

 

Reference: Hadoop DistCp Guide
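For example (a minimal sketch; the NameNode hostnames, port, and paths below are placeholders), the canonical inter-cluster copy and an intra-cluster directory copy look like:

hadoop distcp hdfs://nn1.example.com:8020/user/data hdfs://nn2.example.com:8020/user/data

hadoop distcp /user/data/source /user/data/backup

Each invocation launches a MapReduce job whose map tasks perform the copying, which is why options D and E are correct.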

 

 

QUESTION 15

What determines the number of Reducers that run for a given MapReduce job on a cluster running MapReduce v1 (MRv1)?

 

A.

It is set by the Hadoop framework and is based on the number of InputSplits of the job.

B.

It is set by the developer.

C.

It is set by the JobTracker based on the amount of intermediate data.

D.

It is set and fixed by the cluster administrator in mapred-site.xml. The number set there always runs for any submitted job.

Answer: B

Explanation: Number of Reduces

 

The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.reduce.tasks.maximum). At 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces, doing a much better job of load balancing.

 

Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). This will be fixed at some point, but until it is it provides a pretty firm upper bound.

 

The number of reduces also controls the number of output files in the output directory, but usually that is not important because the next map/reduce step will split them into even smaller splits for the maps.

 

The number of reduce tasks can also be increased in the same way as the map tasks, via JobConf’s conf.setNumReduceTasks(int num).

 

Reference: org.apache.hadoop.mapred

 

Class JobConf
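As a sketch of how this is set at job-submission time (the jar, class, and path names below are hypothetical), a job that uses ToolRunner/GenericOptionsParser can have its reducer count overridden with the MRv1 property mapred.reduce.tasks, which is equivalent to calling conf.setNumReduceTasks() in the driver code:

hadoop jar my-job.jar com.example.WordCount -D mapred.reduce.tasks=20 /user/input /user/output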

 

 

QUESTION 16

How does the NameNode know which DataNodes are available on a cluster running MapReduce v1 (MRv1)?

 

A.

DataNodes are listed in the dfs.hosts file, which the NameNode uses as the definitive list of available DataNodes.

B.

DataNodes heartbeat to the master on a regular basis.

C.

The NameNode broadcasts a heartbeat on the network on a regular basis, and DataNodes respond.

D.

The NameNode sends a broadcast across the network when it first starts, and DataNodes respond.

 

Answer: B

Explanation: How does the NameNode handle DataNode failures?

The NameNode periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode. When the NameNode notices that it has not received a heartbeat message from a DataNode after a certain amount of time, that DataNode is marked as dead. Since its blocks will then be under-replicated, the system begins replicating the blocks that were stored on the dead DataNode. The NameNode orchestrates the replication of data blocks from one DataNode to another; the replication data transfer happens directly between DataNodes and the data never passes through the NameNode.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How NameNode Handles data node failures?
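To see which DataNodes the NameNode currently considers live (that is, which ones are heartbeating), an administrator can run the dfsadmin report; a minimal MRv1-era example:

hadoop dfsadmin -report

The report lists each DataNode with its capacity and last contact time, and summarizes how many DataNodes are available or dead.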

 

 

QUESTION 17

Your developers request that you enable them to use Hive on your Hadoop cluster. What do you install and/or configure?

 

A.

Install the Hive interpreter on the client machines only, and configure a shared remote Hive Metastore.

B.

Install the Hive Interpreter on the client machines and all the slave nodes, and configure a shared remote Hive Metastore.

C.

Install the Hive interpreter on the master node running the JobTracker, and configure a shared remote Hive Metastore.

D.

Install the Hive interpreter on the client machines and all nodes on the cluster

 

Answer: A

Explanation: The Hive interpreter runs on the client machine and submits MapReduce jobs to the cluster, so only the client machines need Hive installed; a shared remote Metastore is configured so that all users see the same table definitions.

 

 

QUESTION 18

On a cluster running MapReduce v1 (MRv1), a MapReduce job is given a directory of 10 plain text files as its input. Each file is made up of 3 HDFS blocks. How many Mappers will run?

A.

We cannot say; the number of Mappers is determined by the developer

B.

30

C.

10

D.

1

 

Answer: B

Explanation: With the default TextInputFormat, one Mapper runs per InputSplit, and each HDFS block of each file normally forms one split, so 10 files x 3 blocks = 30 Mappers.

 

 

QUESTION 19

Your company stores user profile records in an OLTP database. You want to join these records with webserver logs which you have already ingested into HDFS. What is the best way to obtain and ingest these user records?

 

A.

Ingest with Flume agents

B.

Ingest with Hadoop Streaming

C.

Ingest using Hive's LOAD DATA command

D.

Ingest with SQL import

E.

Ingest with Pig's LOAD command

F.

Ingest using the HDFS put command

 

Answer: D

Explanation: The user profile records live in an OLTP (relational) database, so the best way to bring them into HDFS is a SQL import tool such as Apache Sqoop, which uses JDBC and MapReduce to import database tables into HDFS. Flume, by contrast, is designed for continuously collecting log and event data, and the webserver logs in this scenario have already been ingested.
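A minimal Sqoop import sketch (the JDBC URL, credentials, table name, and target directory here are hypothetical):

sqoop import --connect jdbc:mysql://dbhost/crm --username dbuser --password dbpass --table user_profiles --target-dir /user/etl/user_profiles

Sqoop runs the import as a MapReduce job, writing the table's rows into HDFS where they can then be joined with the log data.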

 

 

 

 

 

 

QUESTION 20

Your cluster implements HDFS High Availability (HA). Your two NameNodes are named hadoop01 and hadoop02. What occurs when you execute the command:

 

sudo -u hdfs haadmin -failover hadoop01 hadoop02

 

A.

Hadoop02 becomes the standby namenode and hadoop01 becomes the active namenode

B.

Hadoop02 is fenced, and hadoop01 becomes active namenode

C.

Hadoop01 becomes inactive and hadoop02 becomes the active namenode

D.

Hadoop01 is fenced, and hadoop02 becomes the active namenode

 

Answer: D

Explanation: failover - initiate a failover between two NameNodes. This subcommand causes a failover from the first provided NameNode to the second. If the first NameNode is in the Standby state, this command simply transitions the second to the Active state without error. If the first NameNode is in the Active state, an attempt will be made to gracefully transition it to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be attempted in order until one of the methods succeeds. Only after this process will the second NameNode be transitioned to the Active state. If no fencing method succeeds, the second NameNode will not be transitioned to the Active state, and an error will be returned.

 

Reference: HDFS High Availability Administration, HA Administration using the haadmin command
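After the failover completes, you can verify which NameNode is now active with the -getServiceState subcommand of haadmin, for example:

sudo -u hdfs haadmin -getServiceState hadoop01

sudo -u hdfs haadmin -getServiceState hadoop02

Each command prints either active or standby for the named NameNode.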

 

 

Free VCE & PDF File for Cloudera CCA-410 Real Exam

