Latest Certified Success Dumps Download

CISCO, MICROSOFT, COMPTIA, HP, IBM, ORACLE, VMWARE
CCA-410 Examination questions (September)

Achieve New Updated (September) Cloudera CCA-410 Examination Questions 51-60

September 24, 2015

Ensurepass

 

QUESTION 51

You configure your Hadoop cluster with MapReduce v1 (MRv1) along with HDFS high availability (HA) using Quorum-based storage. On which nodes should you configure and run your JournalNode daemon(s) to guarantee a quorum?

 

A.

Standby NameNode, JobTracker, ResourceManager

B.

JobTracker

C.

Standby NameNode

D.

NameNode

E.

NameNode and Standby NameNode

F.

NameNode, Standby NameNode and JobTracker

G.

On each DataNode

 

Answer: A

Explanation: * JournalNode machines – the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager.

Note: There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This will allow the system to tolerate the failure of a single machine.

 

* In order to deploy an HA cluster, you should prepare the following:

/ NameNode machines

/ JournalNode machines

 

* Quorum-based Storage

Quorum-based Storage refers to the HA implementation that uses Quorum Journal Manager (QJM).

 

In order for the Standby node to keep its state synchronized with the Active node in this implementation, both nodes communicate with a group of separate daemons called JournalNodes. When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JournalNodes. The Standby node is capable of reading the edits from the JournalNodes, and is constantly watching them for changes to the edit log. As the Standby node sees the edits, it applies them to its own namespace. In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.

 

In order to provide a fast failover, it is also necessary that the Standby node has up-to-date information regarding the location of blocks in the cluster. In order to achieve this, the DataNodes are configured with the location of both NameNodes, and they send block location information and heartbeats to both.
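
The quorum requirement above is easy to sanity-check. The following minimal Python sketch (illustrative only, not part of any Cloudera tooling) computes the majority size and the number of JournalNode failures tolerated for a given deployment; placing one JournalNode on each of the NameNode, Standby NameNode and JobTracker hosts gives three JournalNodes, a quorum of two, and tolerance for one failure.

```python
# Illustrative quorum math for Quorum-based Storage (QJM); not Cloudera tooling.
# With N JournalNodes, an edit must be written to a majority (N // 2 + 1),
# so the cluster tolerates (N - 1) // 2 JournalNode failures.

def quorum_size(journal_nodes: int) -> int:
    """Smallest number of JournalNodes that constitutes a majority."""
    return journal_nodes // 2 + 1

def tolerated_failures(journal_nodes: int) -> int:
    """Number of JournalNode failures the cluster can survive."""
    return (journal_nodes - 1) // 2

# One JournalNode each on the NameNode, Standby NameNode and JobTracker hosts
# (answer A) gives N = 3: quorum of 2, one failure tolerated.
for n in (1, 2, 3, 5):
    print(f"{n} JournalNode(s): quorum = {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```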

 

 

QUESTION 52

Using Cloudera Manager on a CDH4 cluster running MapReduce v1 (MRv1), you delete a TaskTracker role instance from a host that also runs a DataNode role instance and a RegionServer role instance. Cloudera Manager makes changes to the cluster and prompts you to accept the changes.

 

What other configuration option will Cloudera Manager automatically prompt you to change?

 

A.

The option to immediately rebalance the cluster

B.

The option to change Java maximum heap sizes for the other role instances

C.

The option to specify an alternate slave host to place the received DataNode role instance

D.

The option to fail over to the standby NameNode instance

 

Answer: C

 

 

QUESTION 53

Your cluster has nodes in seven racks, and you have provided a rack topology script. What is Hadoop’s block placement policy, assuming a block replication factor of three?

 

A.

One copy of the block is written to a node in each of three racks

B.

One copy of the block is written to a node in one rack; two copies are written to two nodes in a different rack

C.

All three copies of the block are written to nodes on the same rack

 

 

 

 

D.

Because there are seven racks the block is written to a node on each rack

 

Answer: B

Explanation: HDFS uses a rack-aware replica placement policy. In the default configuration there are three copies of each data block on HDFS: two copies are stored on DataNodes in the same rack and the third copy on a different rack.

 

Note: HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How the HDFS Blocks are replicated?
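
For readers who want to see the placement pattern spelled out, the toy Python sketch below mimics the default behaviour described above (first replica on the writer's rack, second replica on a node in a different rack, third replica on another node in that second rack). The rack and node names are made up for illustration; the real HDFS BlockPlacementPolicyDefault also weighs node load, free space and pipeline ordering.

```python
import random

# Toy sketch of the default HDFS replica placement described above; the real
# BlockPlacementPolicyDefault also considers load, free space and the write pipeline.
def place_replicas(writer_rack, racks):
    """racks: {rack_name: [node, ...]} -> [(rack, node), (rack, node), (rack, node)]"""
    # 1st replica: a node on the writer's rack (the writer itself, if it is a DataNode).
    first = (writer_rack, random.choice(racks[writer_rack]))
    # 2nd replica: a node on a different rack.
    other_rack = random.choice([r for r in racks if r != writer_rack])
    second = (other_rack, random.choice(racks[other_rack]))
    # 3rd replica: a different node on the same rack as the 2nd replica (answer B).
    third = (other_rack, random.choice([n for n in racks[other_rack] if n != second[1]]))
    return [first, second, third]

# Seven racks of four made-up nodes each, matching the scenario in the question.
racks = {f"rack{i}": [f"rack{i}-node{j}" for j in range(1, 5)] for i in range(1, 8)}
print(place_replicas("rack3", racks))
```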

 

 

QUESTION 54

You are running a Hadoop cluster with a NameNode on host mynamenode, a Secondary NameNode on host mysecondary, and DataNodes.

 

Which best describes how you determine when the last checkpoint happened?

 

A.

Execute hdfs dfsadmin -report on the command line and look at the Last Checkpoint information.

B.

Execute hdfs dfsadmin -saveNamespace on the command line, which returns the last checkpoint value in the fstime file.

C.

Connect to the web UI of the Secondary NameNode (http://mysecondarynamenode:50090) and look at the “Last Checkpoint” information

D.

Connect to the web UI of the NameNode (http://mynamenode:50070/) and look at the “Last Checkpoint” information

 

Answer: C

Explanation: Note: SecondaryNameNode is the worst name ever given to a module in the history of naming conventions. It is only a checkpoint server, which actually gets a backup of the fsimage and edits files from the NameNode.

 

 

 

 

 

It basically serves as a checkpoint server.

 

But it does not come up online automatically when the NameNode goes down!

The Secondary NameNode can, however, be used to bring the NameNode back up manually in a worst-case scenario, with some data loss.
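
If you want to check the last checkpoint time programmatically rather than in a browser, something along the lines of the following sketch could scrape the Secondary NameNode status page. The page path (/status.jsp) and the "Last Checkpoint" label are assumptions based on older Hadoop/CDH web UIs and may differ between versions; the host name comes from the question.

```python
from urllib.request import urlopen

# Hedged sketch: fetch the Secondary NameNode web UI and look for the
# "Last Checkpoint" line. The page path and label text vary by Hadoop/CDH
# version, so treat both as assumptions to adapt to your cluster.
SNN_STATUS_URL = "http://mysecondarynamenode:50090/status.jsp"  # host from the question

def last_checkpoint_line(url=SNN_STATUS_URL):
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    for line in html.splitlines():
        if "Last Checkpoint" in line:
            return line.strip()
    return None

if __name__ == "__main__":
    print(last_checkpoint_line() or "No 'Last Checkpoint' line found on the status page")
```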

 

 

QUESTION 55

Which two events occur when individual blocks are written to a DataNode's local filesystem on a cluster?

 

A.

The DataNode updates its log of checksum verifications

B.

The DataNode writes a metadata file with the name of the file the block is associated with

C.

A metadata file is written to the DataNode containing the checksums for each block

D.

A metadata file is written to the DataNode containing all the other node locations in the namespace

E.

The DataNode runs a block scanner (DataBlockScanner) to verify the written blocks.

 

Answer: BE

Explanation: B (not C):

* The other files in the datanode’s current storage directory are the files with the blk_ prefix. There are two types: the HDFS blocks themselves (which just consist of the file’s raw bytes) and the metadata for a block (with a .meta suffix). A block file just consists of the raw bytes of a portion of the file being stored; the metadata file is made up of a header with version and type information, followed by a series of checksums for sections of the block.

 

* When the number of blocks in a directory grows to a certain size, the datanode creates a new subdirectory in which to place new blocks and their accompanying metadata.

 

E: Every datanode runs a block scanner, which periodically verifies all the blocks stored on the datanode. This allows bad blocks to be detected and fixed before they are read by clients. The DataBlockScanner maintains a list of blocks to verify and scans them one by one for checksum errors.

 

Incorrect:

 

 

 

 

 

Not A: the DataNode does not keep a checksum verification log. Not D: node locations are stored on the NameNode, not on the DataNodes.
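
To make the block/metadata pairing concrete, here is an illustrative Python sketch that walks a DataNode storage directory and pairs each blk_ file with its .meta checksum companion. The directory path is an assumption (it corresponds to whatever dfs.data.dir is set to on your cluster, plus /current), and the generation-stamp naming of the .meta file follows the description above.

```python
import os

# Illustrative only: pair each block file with its checksum metadata file in a
# DataNode storage directory. The path below is an assumption; use the value of
# dfs.data.dir (plus /current) from your own cluster configuration.
DATA_DIR = "/dfs/dn/current"

def list_blocks(data_dir):
    """Yield (block_file_path, meta_file_name_or_None) pairs."""
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.startswith("blk_") and not name.endswith(".meta"):
                # The companion metadata file carries a generation stamp and the
                # per-section checksums, e.g. blk_1073741825_1001.meta.
                meta = [f for f in files if f.startswith(name + "_") and f.endswith(".meta")]
                yield os.path.join(root, name), (meta[0] if meta else None)

if __name__ == "__main__":
    for block, meta in list_blocks(DATA_DIR):
        print(block, "->", meta or "no .meta file found")
```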

 

 

QUESTION 56

You have a cluster running with the fair scheduler enabled and configured. You submit multiple jobs to the cluster. Each job is assigned to a pool. What are the two key points to remember about how jobs are scheduled with the fair scheduler?

 

A.

Each pool gets 1/M of the total available task slots, where M is the number of nodes in the cluster

B.

Pools are assigned priorities. Pools with higher priorities are executed before pools with lower priorities

C.

Each pool gets 1/N of the total available task slots, where N is the number of jobs running on the cluster

D.

Pools get a dynamically-allocated share of the available task slots (subject to additional constraints)

E.

Each pool's share of the task slots remains static within the execution of any individual job

F.

Each pool's share of task slots may change throughout the course of job execution

 

Answer: DF

Explanation: By default, the fair scheduler will evenly split available slots between all pools that have “demand.” In this context, demand means there are tasks in the pool that are currently eligible to run. Tasks within a pool may be freely reordered by the scheduler, and often are. As TaskTrackers heartbeat, advertising available slots, the scheduler looks for a task that wants to process data on that TaskTracker. This is largely how it achieves data locality. There are some other details around how tasks are selected out of a pool (e.g. tasks that will process larger amounts of data receive preference, etc.).

 

Pools can have both weights and minimum shares allocated to them. The min share is how you guarantee slots for a pool. The scheduler will always allocate min share slots to all pools first. Any leftover slots are spread evenly across pools. If a pool has a weight, it will receive more or fewer slots, based on the value of the weight, during this extra or “free share” assignment process. An interesting tidbit about weights: a job’s priority really just changes the weight of its tasks.

 

So far this only refers to how slots are allocated across pools. It is also possible that there are multiple jobs submitted to the same pool. The scheduler actually just runs another instance of the fair scheduler within each pool, and the default behavior that applies across pools now applies to multiple jobs within a single pool; slots are simply split evenly across all jobs in the pool. This mostly works as expected. If you opened two sqlplus sessions to Oracle and ran two queries at the same time, you’d expect them both to make progress, albeit contending for available resources.

 

Reference: http://www.quora.com/Eric-Sammer/answers/Apache-Hadoop
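
As a rough illustration of the allocation order described above (minimum shares first, then an even split of the leftover slots among pools that still have demand), here is a toy Python model. It deliberately ignores weights, preemption and per-job scheduling inside a pool, and the pool names and numbers are invented for the example; it is not the real FairScheduler code.

```python
# Toy model of the allocation order described above; not the real FairScheduler.
def allocate_slots(total_slots, pools):
    """pools: {name: {"min_share": int, "demand": int}} -> {name: allocated slots}"""
    alloc = {name: 0 for name in pools}
    remaining = total_slots
    # Pass 1: satisfy minimum shares first (capped by each pool's demand).
    for name, p in pools.items():
        share = min(p["min_share"], p["demand"], remaining)
        alloc[name] = share
        remaining -= share
    # Pass 2: spread leftover slots evenly across pools that still have demand.
    while remaining > 0:
        hungry = [n for n, p in pools.items() if alloc[n] < p["demand"]]
        if not hungry:
            break
        for name in hungry:
            if remaining == 0:
                break
            alloc[name] += 1
            remaining -= 1
    return alloc

# Invented pools and numbers, purely for illustration.
print(allocate_slots(100, {
    "etl":      {"min_share": 30, "demand": 80},
    "adhoc":    {"min_share": 0,  "demand": 50},
    "research": {"min_share": 10, "demand": 5},
}))
```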

 

 

QUESTION 57

What additional capability does Ganglia provide to monitor a Hadoop cluster?

 

A.

Ability to monitor the amount of free space on HDFS.

B.

Ability to monitor number of files in HDFS.

C.

Ability to monitor processor utilization.

D.

Ability to monitor free task slots.

E.

Ability to monitor NameNode memory usage.

 

Answer: E

Explanation: Ganglia itself collects metrics, such as CPU and memory usage; by using GangliaContext, you can inject Hadoop metrics into Ganglia.

 

Note:

Ganglia is an open-source, scalable and distributed monitoring system for large clusters. It collects, aggregates and provides time-series views of tens of machine-related metrics such as CPU, memory, storage, network usage.

 

Ganglia is also a popular solution for monitoring Hadoop and HBase clusters, since Hadoop (and HBase) has built-in support for publishing its metrics to Ganglia. With Ganglia you may easily see the number of bytes written by a particular HDFS DataNode over time, the block cache hit ratio for a given HBase region server, the total number of requests to the HBase cluster, time spent in garbage collection and many, many others.

 

Hadoop and HBase use the GangliaContext class to send the metrics collected by each daemon (such as the DataNode, TaskTracker, JobTracker, HMaster, etc.) to the gmond daemons.

 

 

 

 

 

 

QUESTION 58

Which MapReduce daemon instantiates user code, and executes map and reduce tasks on a cluster running MapReduce v1 (MRv1)?

 

A.

NameNode

B.

DataNode

C.

JobTracker

D.

TaskTracker

E.

ResourceManager

F.

ApplicationMaster

G.

NodeManager

 

Answer: D

Explanation: A TaskTracker is a slave node daemon in the cluster that accepts tasks (Map, Reduce and Shuffle operations) from a JobTracker. Only one TaskTracker process runs on any Hadoop slave node, and it runs in its own JVM process. Every TaskTracker is configured with a set of slots, which indicate the number of tasks it can accept. The TaskTracker starts a separate JVM process to do the actual work (called a Task Instance); this ensures that a process failure does not take down the TaskTracker. The TaskTracker monitors these task instances, capturing their output and exit codes. When the Task Instances finish, successfully or not, the TaskTracker notifies the JobTracker. The TaskTrackers also send out heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated.

 

Note: How many Daemon processes run on a Hadoop system?

 

Hadoop comprises five separate daemons. Each of these daemons runs in its own JVM.

The following 3 daemons run on master nodes:

NameNode – This daemon stores and maintains the metadata for HDFS.
Secondary NameNode – Performs housekeeping functions for the NameNode.
JobTracker – Manages MapReduce jobs, distributes individual tasks to machines running the TaskTracker.

 

The following 2 daemons run on each slave node:

 

 

 

 

 

DataNode – Stores actual HDFS data blocks.

TaskTracker – Responsible for instantiating and monitoring individual Map and Reduce tasks.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What is a Task Tracker in Hadoop? How many instances of TaskTracker run on a Hadoop Cluster
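
The slot-and-child-process behaviour described above can be sketched with a small toy model. The class below is purely illustrative (it is not Hadoop code): it launches each "task" as a separate OS process, the way a TaskTracker launches a separate child JVM per Task Instance, and its heartbeat() reports free slots after reaping finished children.

```python
import subprocess
import sys

# Toy model, not Hadoop code: each "task" runs in its own child process so a task
# crash cannot take down the tracker, and heartbeat() reports free slots after
# reaping finished children, much as a TaskTracker reports slots to the JobTracker.
class ToyTaskTracker:
    def __init__(self, map_slots=2, reduce_slots=2):
        self.slots = {"map": map_slots, "reduce": reduce_slots}
        self.running = []

    def launch(self, kind, command):
        if self.slots[kind] == 0:
            raise RuntimeError(f"no free {kind} slots")
        proc = subprocess.Popen(command)  # isolated child process (stands in for a child JVM)
        self.slots[kind] -= 1
        self.running.append((kind, proc))
        return proc

    def heartbeat(self):
        still_running = []
        for kind, proc in self.running:
            if proc.poll() is None:
                still_running.append((kind, proc))
            else:
                self.slots[kind] += 1  # task finished: free its slot
        self.running = still_running
        return dict(self.slots)  # free-slot report sent with the heartbeat

tt = ToyTaskTracker()
tt.launch("map", [sys.executable, "-c", "print('map task running')"])
print("free slots reported in heartbeat:", tt.heartbeat())
```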

 

 

QUESTION 59

Using Hadoop’s default settings, how much data will you be able to store on your Hadoop cluster if it has 12 nodes with 4TB of raw disk space per node allocated to HDFS storage?

 

A.

Approximately 3TB

B.

Approximately 12TB

C.

Approximately 16TB

D.

Approximately 48TB

 

Answer: C

Explanation: The default replication factor is 3. The following amount of data can be stored:

12 nodes × 4 TB ÷ 3 = 16 TB
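
The same arithmetic as a one-line helper (assuming, as above, that replication is the only overhead and all raw disk space is available to HDFS):

```python
# The same arithmetic as a helper; assumes replication is the only overhead
# and that all raw disk space is available to HDFS.
def usable_hdfs_capacity_tb(nodes, raw_tb_per_node, replication=3):
    return nodes * raw_tb_per_node / replication

print(usable_hdfs_capacity_tb(12, 4))  # 16.0
```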

 

 

QUESTION 60

You are planning a Hadoop cluster, and you expect to receive just under 1TB of data per week, which will be stored on the cluster using Hadoop’s default replication. You decide that your slave nodes will be configured with 4 x 1TB disks.

 

Calculate how many slave nodes you need to deploy at a minimum to store one year’s worth of data.

 

A.

100 slave nodes

B.

100 slave nodes

 

 

 

 

C.

10 slave nodes

D.

50 slave nodes

 

Answer: D

Explanation: Total storage space required: 52 (weeks) × 1 TB (data per week) × 3 (default replication factor) = 156 TB.

Minimum number of slave nodes required: 156 / 4 = 39, so the smallest answer choice that satisfies this is 50 slave nodes.
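
The sizing calculation as a small helper, for anyone who wants to vary the inputs; 39 is the arithmetic minimum and 50 is the smallest answer choice that covers it:

```python
import math

# Sizing helper for the calculation above; the inputs mirror the question.
def min_slave_nodes(weeks, tb_per_week, disks_per_node, tb_per_disk, replication=3):
    required_tb = weeks * tb_per_week * replication   # 52 * 1 * 3 = 156 TB
    per_node_tb = disks_per_node * tb_per_disk        # 4 * 1 = 4 TB per node
    return math.ceil(required_tb / per_node_tb)

print(min_slave_nodes(52, 1, 4, 1))  # 39 -> round up to the 50-node answer choice
```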

 

Free VCE & PDF File for Cloudera CCA-410 Real Exam

Instant Access to Free VCE Files: CompTIA | VMware | SAP …
Instant Access to Free PDF Files: CompTIA | VMware | SAP …
