
Achieve New Updated (September) Cloudera CCA-470 Examination Questions 31-40

September 24, 2015

Ensurepass

 

QUESTION 31

You configure your cluster with HDFS High Availability (HA) using Quorum-based storage.

You do not implement HDFS Federation.

 

What is the maximum number of NameNode daemons you should run on your cluster in order to avoid a “split-brain” scenario with your NameNodes?

 

A.

Unlimited. HDFS High Availability (HA) is designed to overcome limitations on the number of NameNodes you can deploy.

B.

Two active NameNodes and one Standby NameNode

C.

One active NameNode and one Standby NameNode

D.

Two active NameNodes and two Standby NameNodes

 

Answer: C

Explanation: In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.

 

Note: It is vital for the correct operation of an HA cluster that only one of the NameNodes be active at a time. Otherwise, the namespace state would quickly diverge between the two, risking data loss or other incorrect results. In order to ensure this property and prevent the so-called “split-brain scenario,” the JournalNodes will only ever allow a single NameNode to be a writer at a time. During a failover, the NameNode which is to become active will simply take over the role of writing to the JournalNodes, which will effectively prevent the other NameNode from continuing in the Active state, allowing the new Active NameNode to safely proceed with failover.
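
With Quorum-based storage, the shared edits directory is expressed as a qjournal URI in hdfs-site.xml. A minimal sketch follows; the nameservice name mycluster and the JournalNode hosts jn1, jn2, and jn3 are hypothetical examples (8485 is the default JournalNode RPC port):

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <!-- the JournalNode quorum that enforces a single writer at a time -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>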

 

Reference: Cloudera CDH4 High Availability Guide, Quorum-based Storage

 

 

QUESTION 32

What would be a reasonable configuration of disk drives in a Hadoop datanode?

 

A.

Four 1TB disk drives in a RAID configuration

B.

One 1TB disk drive

C.

Four 1TB disk drives in a JBOD configuration

D.

48 1.5TB disk drives in a JBOD configuration

E.

48 1.5 TB disk drives in a RAID configuration

 

Answer: C

Reference: http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/ (how to pick hardware for your Hadoop cluster; see the first bulleted point)
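
In a JBOD layout, each disk is mounted separately and listed in the DataNode’s data directory property. A minimal hdfs-site.xml sketch, assuming four disks mounted at /data/1 through /data/4 (hypothetical mount points; the property is named dfs.datanode.data.dir in newer releases):

<property>
  <name>dfs.data.dir</name>
  <!-- one directory per physical disk; HDFS round-robins blocks across them -->
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
</property>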

 

 

QUESTION 33

How must you format the underlying filesystem of your Hadoop cluster’s slave nodes running on Linux?

 

A.

They may be formatted with any Linux filesystem

B.

They must be formatted as HDFS

C.

They must be formatted as either ext3 or ext4

D.

They must not be formatted; HDFS will format the filesystem automatically

 

Answer: C

Explanation: The Hadoop Distributed File System is platform independent and can function on top of any underlying file system and Operating System. Linux offers a variety of file system choices, each with caveats that have an impact on HDFS.

 

As a general best practice, if you are mounting disks solely for Hadoop data, mount them with the noatime option (which disables access-time updates). This speeds up reads for files.
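
For example, a dedicated data disk could be formatted and mounted as follows; this is a sketch, and the device name and mount point are hypothetical (run as root):

mkfs.ext4 -m 0 /dev/sdb1          # -m 0 frees the blocks normally reserved for root
mkdir -p /data/1
mount -o noatime /dev/sdb1 /data/1
# equivalent /etc/fstab entry:
# /dev/sdb1  /data/1  ext4  defaults,noatime  0 0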

 

There are three Linux file system options that are popular to choose from:

 

Ext3

Ext4

XFS

Yahoo uses the ext3 file system for its Hadoop deployments. ext3 is also the default filesystem choice for many popular Linux OS flavours. Since HDFS on ext3 has been publicly tested on Yahoo’s cluster, it makes for a safe choice for the underlying file system.

 

ext4 is the successor to ext3. ext4 has better performance with large files. ext4 also introduced delayed allocation of data, which adds a bit more risk with unplanned server outages while decreasing fragmentation and improving performance.

 

XFS offers better disk space utilization than ext3 and has much quicker disk formatting times than ext3. This means that it is quicker to get started with a data node using XFS.

 

Reference: Hortonworks, Linux File Systems for HDFS

 

 

QUESTION 34

Your cluster implements HDFS High Availability (HA). Your two NameNodes are named nn01 and nn02. What occurs when you execute the command:

 

hdfs haadmin -failover nn01 nn02

 

A.

nn02 becomes the standby NameNode and nn01 becomes the active NameNode

B.

nn01 is fenced, and nn01 becomes the active NameNode

C.

nn01 is fenced, and nn02 becomes the active NameNode

D.

nn01 becomes the standby NameNode and nn02 becomes the active NameNode

 

Answer: C

Explanation: failover: initiate a failover between two NameNodes.

 

This subcommand causes a failover from the first provided NameNode to the second. If the first NameNode is in the Standby state, this command simply transitions the second to the Active state without error. If the first NameNode is in the Active state, an attempt will be made to gracefully transition it to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be attempted in order until one of the methods succeeds. Only after this process will the second NameNode be transitioned to the Active state. If no fencing method succeeds, the second NameNode will not be transitioned to the Active state, and an error will be returned.
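
A minimal fencing configuration sketch in hdfs-site.xml (methods are tried in the order listed; sshfence requires passphraseless SSH, the key path shown is only an example, and shell(/bin/true) is included purely as a last-resort illustration):

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence
shell(/bin/true)</value>
</property>
<property>
  <!-- private key used by the sshfence method -->
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>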

 

Reference: HDFS High Availability Administration, HA Administration using the haadmin command

 

 

QUESTION 35

You install Cloudera Manager on a cluster where each host has 1 GB of RAM. All of the services show their status as Concerning. However, all submitted jobs complete without error.

 

Why is Cloudera Manager showing the Concerning status for the services?

 

A.

A slave node’s disk ran out of space

B.

The slave nodes haven’t sent a heartbeat in 60 minutes

C.

The slave nodes are swapping.

D.

A DataNode service instance has crashed.

 

Answer: B

Explanation: Concerning: There is an irregularity in the status of a service instance or role instance, but Cloudera Manager calculates that the instance might recover. For example, if the number of missed heartbeats exceeds a configurable threshold, the health status becomes Concerning. Or, if an instance is running on a host and the host is rebooted, the instance will be reported as In Progress for some period of time while it is restarting. Because the instance is expected to be Started, its health will be reported as Concerning until it transitions to Started.

 

Note:

Bad: The service instance or role instance is not performing or did not finish performing the last command as expected, and Cloudera Manager calculates that the instance will not recover. For example, if the number of missed heartbeats exceeds a second (higher) configurable threshold, the health status becomes Bad. Another example of bad health is if a role you have stopped is actually still running, or a started role has stopped unexpectedly.

 

Good: The service instance or role instance is performing or has finished performing the last command as expected. This does not necessarily mean the service is running; it means it is behaving as expected. For example, if you clicked Stop to stop a role instance and it stopped successfully, then that role instance has a Good health status, even though it is not running.

 

Reference: About Service, Role, and Host Health

 

 

QUESTION 36

Which two updates occur when a client application opens a stream to begin a file write on a cluster running MapReduce v1 (MRv1)?

 

A.

Once the write stream closes on the DataNode, the DataNode immediately initiates a block report to the NameNode.

B.

The change is written to the NameNode disk.

C.

The metadata in the RAM on the NameNode is flushed to disk.

D.

The metadata in RAM on the NameNode is flushed to disk.

E.

The metadata in RAM on the NameNode is updated.

F.

The change is written to the edits file.

 

Answer: DF

Explanation: Note: The Namenode stores modifications to the filesystem as a log appended to a native filesystem file (edits). When a Namenode starts up, it reads HDFS state from an image file (fsimage) and then applies edits from the edits log file. It then writes new HDFS state to fsimage and starts normal operation with an empty edits file. Since the Namenode merges the fsimage and edits files only during start up, the edits file could get very large over time on a large cluster. Another side effect of a larger edits file is that the next restart of the Namenode takes longer.

 

The secondary namenode merges the fsimage and edits log periodically and keeps the edits log size within a limit. It is usually run on a different machine than the primary Namenode, since its memory requirements are on the same order as the primary Namenode’s. The secondary namenode is started by bin/start-dfs.sh on the nodes specified in the conf/masters file.
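
How often the secondary namenode checkpoints is governed by two properties; a sketch using the Hadoop 1.x names, shown with their usual defaults. A checkpoint is triggered after fs.checkpoint.period seconds, or earlier once the edits file reaches fs.checkpoint.size bytes:

<!-- checkpoint at least once an hour -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>
<!-- or sooner, once edits reaches 64 MB -->
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
</property>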

 

 

QUESTION 37

What does CDH packaging do on install to facilitate Kerberos security setup?

 

A.

Automatically configures permissions for log files at $MAPRED_LOG_DIR/userlogs

B.

Creates and configures your KDC with default cluster values.

C.

Creates users for hdfs and mapreduce to facilitate role assignment.

D.

Creates a set of pre-configured Kerberos keytab files and their permissions.

E.

Creates directories for temp, hdfs, and mapreduce with correct permissions.

 

Answer: C

Explanation: During CDH4 package installation of MRv1, the following Unix user accounts are automatically created to support security:

 

This User    Runs These Hadoop Programs
hdfs         HDFS: NameNode, DataNodes, Secondary NameNode, Standby NameNode (if you are using HA)
mapred       MRv1: JobTracker and TaskTrackers
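
You can verify these accounts after installation; a quick sketch of a check, assuming a running MRv1 cluster:

id hdfs        # should exist after package install
id mapred
# daemons should be running under the matching accounts:
ps -eo user,comm | grep -E '^(hdfs|mapred)'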

 

Reference: Configuring Hadoop Security in CDH4

 

 

QUESTION 38

Under which of the following scenarios would it be most appropriate to consider using faster (e.g., 10 Gigabit) Ethernet as the network fabric for your Hadoop cluster?

 

A.

When the typical workload consists of processor-intensive tasks.

B.

When the typical workload consumes a large amount of input data, relative to the entire capacity of HDFS.

C.

When the typical workload generates a large amount of intermediate data, on the order of the input data itself.

D.

When the typical workload generates a large amount of output data, significantly larger than the amount of intermediate data.

 

Answer: D

 

 

QUESTION 39

You have a cluster running with the FIFO scheduler enabled. You submit a large job A to the cluster, which you expect to run for one hour. Then you submit job B, which you expect to run for only a couple of minutes. Assume both jobs are running at the same priority.

 

How does the FIFO scheduler execute the jobs? (Choose 3)

 

A.

The order of execution of tasks within a job may vary.

B.

When a job is submitted, all tasks belonging to that job are scheduled.

C.

Given jobs A and B submitted in that order, all tasks from job A will be scheduled before all tasks from job B.

D.

Since job B needs only a few tasks, it might finish before job A completes.

 

Answer: ABC

Reference: http://seriss.com/rush-current/rush/rush-priority.html#FIFO%20Scheduling (see FIFO scheduling)
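
For reference, FIFO is the default MRv1 scheduler; a sketch of the relevant mapred-site.xml property, shown with its default value:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <!-- JobQueueTaskScheduler is the built-in FIFO scheduler -->
  <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
</property>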

 

 

QUESTION 40

For a MapReduce job, what’s the relationship between tasks and task attempts?

 

A.

There are always exactly as many task attempts as there are tasks.

B.

There are always at least as many task attempts as there are tasks.

C.

There are always at most as many task attempts as there are tasks.

 

Answer: B

Explanation: Every task requires at least one attempt, so attempts can never number fewer than tasks. Failed tasks are retried as new attempts, and speculative execution can launch duplicate attempts of slow-running tasks, so the number of attempts can exceed the number of tasks.
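
Speculative execution, one source of extra attempts, can be toggled per task type; a sketch of the MRv1 properties (both default to true):

<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
</property>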

 
