
Achieve New Updated (September) Cloudera CCD-470 Examination Questions 1-10

September 24, 2015

Ensurepass

Exam A

 

QUESTION 1

Table metadata in Hive is:

 

A.

Stored as metadata on the NameNode.

B.

Stored along with the data in HDFS.

C.

Stored in the Metastore.

D.

Stored in ZooKeeper.

 

Answer: C

Explanation: By default, Hive uses an embedded Derby database to store metadata. The Metastore is the “glue” between Hive and HDFS: it tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, and so on.

 

The Metastore is an application backed by an RDBMS that uses an open source ORM layer called DataNucleus to convert object representations into a relational schema and vice versa. This approach was chosen, rather than storing the information in HDFS, because the Metastore needs to be very low latency. The DataNucleus layer allows many different RDBMS technologies to be plugged in.

 

Note:

* By default, Hive stores metadata in an embedded Apache Derby database, and other client/server databases like MySQL can optionally be used.

* Features of Hive include:

Metadata storage in an RDBMS, significantly reducing the time to perform semantic checks during query execution.

 

Reference: Store Hive Metadata into RDBMS
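
For illustration, here is a minimal sketch of how the Metastore can be pointed at a client/server RDBMS such as MySQL instead of the embedded Derby default. The property keys are the standard ones normally set in hive-site.xml; the host, database name, and credentials are placeholders, and the snippet simply sets them on a Hadoop Configuration object to keep the example in Java.

import org.apache.hadoop.conf.Configuration;

public class MetastoreConfigSketch {
    public static void main(String[] args) {
        // Standard hive-site.xml keys; all values below are placeholders.
        Configuration conf = new Configuration();
        conf.set("javax.jdo.option.ConnectionURL",
                "jdbc:mysql://dbhost:3306/hive_metastore?createDatabaseIfNotExist=true");
        conf.set("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
        conf.set("javax.jdo.option.ConnectionUserName", "hive");
        conf.set("javax.jdo.option.ConnectionPassword", "secret");
        // DataNucleus (used internally by the Metastore) reads these settings
        // and maps Hive's table/partition objects onto the chosen RDBMS.
        System.out.println("Metastore DB: " + conf.get("javax.jdo.option.ConnectionURL"));
    }
}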

 

 

QUESTION 2

Your cluster’s HDFS block size is 64 MB. You have a directory containing 100 plain text files, each of which is 100 MB in size. The InputFormat for your job is TextInputFormat.

Determine how many Mappers will run.

 

A.

64

B.

100

C.

200

D.

640

 

Answer: C

Explanation: Each 100 MB file spans two 64 MB blocks, so each file is split into two input splits, and 100 files × 2 splits = 200 mappers will run.

 

Note:

If you are not compressing the files, then Hadoop will process your large files (say 10 GB) with a number of mappers related to the block size of the file.

 

Say your block size is 64 MB; then you will have ~160 mappers processing this 10 GB file (160 * 64 MB ~= 10 GB). Depending on how CPU intensive your mapper logic is, this might be an acceptable block size, but if you find that your mappers are executing in sub-minute times, then you might want to increase the work done by each mapper (by increasing the block size to 128, 256 or 512 MB – the actual size depends on how you intend to process the data).

Reference: http://stackoverflow.com/questions/11014493/hadoop-mapreduce-appropriate-input-files-size (first answer, second paragraph)
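
As a quick sanity check of the arithmetic above, the sketch below computes the number of map tasks for 100 uncompressed 100 MB files with a 64 MB block size. It is a simplification that ignores Hadoop's 10% split-slop rule, which does not change the result in this case.

public class SplitCountEstimate {
    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // 64 MB HDFS block size
        long fileSize = 100L * 1024 * 1024;   // 100 MB per plain text file
        int numFiles = 100;

        // TextInputFormat splits each uncompressed file roughly per block,
        // so a 100 MB file yields ceil(100 / 64) = 2 input splits.
        long splitsPerFile = (fileSize + blockSize - 1) / blockSize;

        System.out.println("Mappers: " + numFiles * splitsPerFile); // prints 200
    }
}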

 


 

 

QUESTION 3

Which best describes what the map method accepts and emits?

 

A.

It accepts a single key-value pair as input and emits a single key and list of corresponding values as output.

B.

It accepts a single key-value pair as input and can emit only one key-value pair as output.

C.

It accepts a list of key-value pairs as input and can emit only one key-value pair as output.

D.

It accepts a single key-value pair as input and can emit any number of key-value pairs as output, including zero.

 

Answer: D

Explanation: public class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT> extends Object

Maps input key/value pairs to a set of intermediate key/value pairs.

 

Maps are the individual tasks which transform input records into intermediate records.

The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
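
A minimal, hypothetical mapper illustrating answer D: each call to map() receives one key-value pair and may emit zero, one, or many intermediate pairs via context.write().

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative example class, not part of the exam question.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // An empty line emits nothing; a line with N tokens emits N pairs.
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}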

 

Reference: org.apache.hadoop.mapreduce

 

Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

 

 

QUESTION 4

The NameNode uses RAM for the following purpose:

 

A.

To store the contents of files in HDFS.

B.

To store filenames, list of blocks and other meta information.

C.

To store the edits log that keeps track of changes in HDFS.

D.

To manage distributed read and write locks on files in HDFS.

 

Answer: B

Explanation: The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself. There is only one NameNode process running on any Hadoop cluster. The NameNode runs in its own JVM process; in a typical production cluster it runs on a separate machine. The NameNode is a single point of failure for the HDFS cluster: when the NameNode goes down, the file system goes offline. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What is a NameNode? How many instances of NameNode run on a Hadoop Cluster?

 

 

QUESTION 5

All keys used for intermediate output from mappers must:

A.

Implement a splittable compression algorithm.

B.

Be a subclass of FileInputFormat.

C.

Implement WritableComparable.

D.

Override isSplitable.

E.

Implement a comparator for speedy sorting.

 

Answer: C

Explanation: The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.

 

The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
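
A minimal sketch of a custom intermediate key (a hypothetical YearKey, not part of the question) showing the two obligations named above: Writable serialization plus a compareTo() method so the framework can sort by key.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Illustrative example class, not part of the exam question.
public class YearKey implements WritableComparable<YearKey> {
    private int year;

    public YearKey() { }                       // no-arg constructor required for deserialization
    public YearKey(int year) { this.year = year; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);                    // Writable: serialize the key
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();                   // Writable: deserialize the key
    }

    @Override
    public int compareTo(YearKey other) {      // Comparable: used by the shuffle sort
        return Integer.compare(year, other.year);
    }

    @Override
    public int hashCode() {                    // used by the default HashPartitioner
        return year;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof YearKey && ((YearKey) o).year == year;
    }
}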

 

Reference: MapReduce Tutorial

 

 

QUESTION 6

A combiner reduces:

 

A.

The number of values across different keys in the iterator supplied to a single reduce method call.

B.

The amount of intermediate data that must be transferred between the mapper and reducer.

C.

The number of input files a mapper must process.

D.

The number of output files a reducer must produce.

 

Answer: B

Explanation: Combiners are used to increase the efficiency of a MapReduce program. They aggregate intermediate map output locally, on each individual mapper's output. Combiners can help you reduce the amount of data that needs to be transferred across to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative. The execution of the combiner is not guaranteed: Hadoop may or may not execute a combiner, and if required it may execute it more than once. Therefore your MapReduce jobs should not depend on the combiner's execution.
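
For example, a sum reducer like the minimal sketch below (class name assumed, not from the question) is safe to reuse as a combiner, because summing partial sums gives the same result as summing the raw values; it would be registered in the driver with job.setCombinerClass(SumReducer.class).

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative example class, not part of the exam question.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Because addition is commutative and associative, it does not matter
        // whether the values are raw counts from mappers or partial sums
        // produced by zero or more combiner passes.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}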

 

 

 

 

 

 

QUESTION 7

Which describes how a client reads a file from HDFS?

 

A.

The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).

B.

The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.

C.

The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.

D.

The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.

 

Answer: A

Explanation: Client communication with HDFS happens using the Hadoop HDFS API. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file on HDFS. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives. Client applications then talk directly to a DataNode, once the NameNode has provided the location of the data.

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How the Client communicates with HDFS?
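
A minimal client-side sketch of this flow (the NameNode URI and file path are placeholders): the FileSystem API asks the NameNode for block locations inside open(), and the subsequent reads stream directly from the DataNodes.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder NameNode address and file path.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
        try (FSDataInputStream in = fs.open(new Path("/user/example/foo.txt"))) {
            // Block locations come from the NameNode; the bytes themselves
            // are read directly from the DataNodes holding the blocks.
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}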

 

 

QUESTION 8

On a cluster running MapReduce v1 (MRv1), a TaskTracker heartbeats into the JobTracker on your cluster and alerts the JobTracker that it has an open map task slot.

 

What determines how the JobTracker assigns each map task to a TaskTracker?

 

A.

The amount of RAM installed on the TaskTracker node.

B.

The amount of free disk space on the TaskTracker node.

C.

The number and speed of CPU cores on the TaskTracker node.

D.

The average system load on the TaskTracker node over the past fifteen (15) minutes.

E.

The location of the InputSplit to be processed in relation to the location of the node.

 

Answer: E

Explanation: The TaskTrackers send out heartbeat messages to the JobTracker, usually every few seconds, to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data, and if it cannot find one, it looks for an empty slot on a machine in the same rack.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How JobTracker schedules a task?

 

 

QUESTION 9

Given a Mapper, Reducer, and Driver class packaged into a jar, which is the correct way of submitting the job to the cluster?

 

A.

jar MyJar.jar

B.

jar MyJar.jar MyDriverClass inputdir outputdir

C.

hadoop jar MyJar.jar MyDriverClass inputdir outputdir

D.

hadoop jar class MyJar.jar MyDriverClass inputdir outputdir

 

Answer: C

Explanation: Example:

Run the application:

$ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output
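
For context, a driver class such as the MyDriverClass referenced in the answer might look like the minimal sketch below; the mapper and reducer are the stock TokenCounterMapper and IntSumReducer classes shipped with Hadoop, chosen here only for illustration. It is the class whose main() runs when you invoke hadoop jar MyJar.jar MyDriverClass inputdir outputdir.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch of a driver; mapper/reducer choices are illustrative only.
public class MyDriverClass extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "example job");
        job.setJarByClass(MyDriverClass.class);
        job.setMapperClass(TokenCounterMapper.class);   // stock word-counting mapper
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // inputdir
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // outputdir
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDriverClass(), args));
    }
}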

 

 

QUESTION 10

 

A client application creates an HDFS file named foo.txt with a replication factor of 3. Identify which best describes the file access rules in HDFS if the file has a single block that is stored on data nodes A, B and C?

 

A.

The file will be marked as corrupted if data node B fails during the creation of the file.

B.

Each data node locks the local file to prohibit concurrent readers and writers of the file.

C.

Each data node stores a copy of the file in the local file system with the same name as the HDFS file.

D.

The file can be accessed if at least one of the data nodes storing the file is available.

 

Answer: D

Explanation: HDFS keeps three copies of a block on three different DataNodes to protect against data corruption. HDFS also tries to distribute these three replicas across more than one rack to protect against data availability issues. The fact that HDFS actively monitors for failed DataNodes and, upon failure detection, immediately schedules re-replication of blocks (if needed) implies that three copies of the data on three different nodes are sufficient to avoid corrupted files.

Note:

HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file; the replication factor can be specified at file creation time and changed later. Files in HDFS are write-once and have strictly one writer at any time. The NameNode makes all decisions regarding replication of blocks. HDFS uses a rack-aware replica placement policy: in the default configuration there are three copies of a data block in total, with two copies stored on DataNodes in one rack and the third copy on a different rack.
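
A minimal client-side sketch of the note above (the NameNode URI and path are placeholders): the replication factor can be set when foo.txt is created and changed afterwards through the FileSystem API.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");            // replication factor used at creation time
        // Placeholder NameNode address and file path.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path file = new Path("/user/example/foo.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("hello hdfs");              // single small block, replicated to 3 DataNodes
        }

        // The replication factor can also be changed after the file exists;
        // the NameNode then schedules re-replication or removal of excess replicas.
        fs.setReplication(file, (short) 2);
        fs.close();
    }
}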

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers , How the HDFS Blocks are replicated?
