Cloudera CCD-333 Exam Questions 1-10



Exam A

 

QUESTION 1

In the reducer, the MapReduce API provides you with an iterator over Writable values.

Calling the next() method:

 

A.

Returns a reference to a different Writable object each time.

B.

Returns a reference to a Writable object from an object pool.

C.

Returns a reference to the same Writable object each time, but populated with different data.

D.

Returns a reference to a Writable object. The API leaves unspecified whether this is a reused object or a new object.

E.

Returns a reference to the same Writable object if the next value is the same as the previous value, or a new Writable object otherwise.

 

Answer: C

Explanation: Calling Iterator.next() always returns the same Writable instance (for example, the same IntWritable object), with the contents of that instance replaced by the next value.

 

Reference: Manipulating iterators in MapReduce
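
The practical consequence is that a value must be copied before it is stored for use after the current loop iteration. A minimal reducer sketch illustrating this, written against the new (org.apache.hadoop.mapreduce) API; the class name is illustrative and not part of the exam material:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer: the framework reuses the same IntWritable instance
// on every step of the iteration, so a value must be copied (or its
// primitive extracted) before being stored for later use.
public class CollectValuesReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    List<IntWritable> kept = new ArrayList<IntWritable>();
    for (IntWritable value : values) {
      // Wrong: kept.add(value) would store the same reused object many times.
      kept.add(new IntWritable(value.get())); // copy the current contents
    }
    context.write(key, new IntWritable(kept.size()));
  }
}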

 

 

QUESTION 2

What is the behavior of the default partitioner?

 

A.

The default partitioner assigns key value pairs to reducers based on an internal random number generator.

B.

The default partitioner implements a round-robin strategy, shuffling the key-value pairs to each reducer in turn. This ensures an even partition of the key space.

C.

The default partitioner computes the hash of the key. Hash values between specific ranges are associated with different buckets, and each bucket is assigned to a specific reducer.

D.

The default partitioner computes the hash of the key and divides that value modulo the number of reducers. The result determines the reducer assigned to process the key-value pair.

E.

The default partitioner computes the hash of the value and takes the mod of that value with the number of reducers. The result determines the reducer assigned to process the key value pair.

 

Answer: D

Explanation: The default partitioner computes a hash value for the key and assigns the partition based on this result.

 

The default Partitioner implementation is called HashPartitioner. It takes the hashCode() of the key modulo the total number of partitions to determine which partition a given (key, value) pair is sent to.

 

In Hadoop, the default partitioner is HashPartitioner, which hashes a record’s key to determine which partition (and thus which reducer) the record belongs in. The number of partitions is equal to the number of reduce tasks for the job.

 

Reference: Getting Started With (Customized) Partitioning
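
The hash-modulo scheme described above can be written as a one-method partitioner. The sketch below mirrors that behavior for Text keys; the class name is illustrative, not taken from the Hadoop source:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hash-modulo partitioning: the key's hashCode(), masked to a non-negative
// value, modulo the number of reduce tasks, selects the target partition.
public class HashModuloPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}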

 

 

QUESTION 3

During the standard sort and shuffle phase of MapReduce, keys and values are passed to reducers. Which of the following is true?

 

A.

Keys are presented to a reducer in sorted order; values for a given key are not sorted.

B.

Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order.

C.

Keys are presented to a reducer in random order; values for a given key are not sorted.

D.

Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.

 

Answer: A

Explanation: The shuffle and sort phase guarantees that keys arrive at each reducer in sorted order, but the values associated with a given key are not sorted unless a secondary sort is explicitly configured.

 

 

QUESTION 4

In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?

 

A.

Increase the parameter that controls minimum split size in the job configuration.

B.

Write a custom MapRunner that iterates over all key-value pairs in the entire file.

C.

Set the number of mappers equal to the number of input files you want to process.

D.

Write a custom FileInputFormat and override the method isSplitable to always return false.

Answer: D

Explanation: Note:

* Do not allow splitting:

protected boolean isSplitable(JobContext context, Path filename) {
    return false;
}

*InputSplits: An InputSplit describes a unit of work that comprises a single map task in a MapReduce program. A MapReduce program applied to a data set, collectively referred to as a Job, is made up of several (possibly several hundred) tasks. Map tasks may involve reading a whole file; they often involve reading only part of a file. By default, the FileInputFormat and its descendants break a file up into 64 MB chunks (the same size as blocks in HDFS). You can control this value by setting the mapred.min.split.size parameter in hadoop-site.xml, or by overriding the parameter in the JobConf object used to submit a particular MapReduce job.

By processing a file in chunks, we allow several map tasks to operate on a single file in parallel. If the file is very large, this can improve performance significantly through parallelism. Even more importantly, since the various blocks that make up the file may be spread across several different nodes in the cluster, it allows tasks to be scheduled on each of these different nodes; the individual blocks are thus all processed locally, instead of needing to be transferred from one node to another.

Of course, while log files can be processed in this piece-wise fashion, some file formats are not amenable to chunked processing. By writing a custom InputFormat, you can control how the file is broken up (or is not broken up) into splits.
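
Put together, a non-splittable input format for plain text could look like the sketch below, which extends TextInputFormat; the class name WholeFileTextInputFormat is illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Illustrative input format: by refusing to split, each input file is handed
// to exactly one map task, no matter how many HDFS blocks it spans.
public class WholeFileTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;
  }
}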

 

 

QUESTION 5

When is the reduce method first called in a MapReduce job?

 

A.

Reduce methods and map methods all start at the beginning of a job, in order to provide optimal performance for map-only or reduce-only jobs.

B.

Reducers start copying intermediate key value pairs from each Mapper as soon as it has completed. The reduce method is called as soon as the intermediate key-value pairs start to arrive.

C.

Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called only after all intermediate data has been copied and sorted.

D.

Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The programmer can configure in the job what percentage of the intermediate data should arrive before the reduce method begins.

 

Answer: C

Explanation: In a MapReduce job, reducers do not start executing the reduce method until all map tasks have completed. Reducers start copying intermediate key-value pairs from the mappers as soon as they are available. The programmer-defined reduce method is called only after all the mappers have finished.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, When are the reducers started in a MapReduce job?

 

http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html (question no. 17)

 

 

QUESTION 6

You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to run ad-hoc analysis on data in your HDFS cluster. Which of the following is data warehousing software built on top of Apache Hadoop that defines a simple SQL-like query language well-suited for this kind of user?

 

A.

Pig

B.

Hue

C.

Hive

D.

Sqoop

E.

Oozie

F.

Flume

G.

Hadoop Streaming

 

Answer: C

Explanation: Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL to query the data. At the same time, this language also allows programmers who are familiar with the MapReduce framework to be able to plug in their custom mappers and reducers to perform more sophisticated analysis that may not be supported by the built-in capabilities of the language. QL can also be extended with custom scalar functions (UDF’s), aggregations (UDAF’s), and table functions (UDTF’s).

 

Reference: https://cwiki.apache.org/Hive/ (Apache Hive, first sentence and second paragraph)

 

 

 

 

 

 

QUESTION 7

How does the NameNode detect that a DataNode has failed?

 

A.

The NameNode does not need to know that a DataNode has failed.

B.

When the NameNode fails to receive periodic heartbeats from the DataNode, it considers the DataNode as failed.

C.

The NameNode periodically pings the datanode. If the DataNode does not respond, the NameNode considers the DataNode as failed.

D.

When HDFS starts up, the NameNode tries to communicate with the DataNode and considers the DataNode as failed if it does not respond.

 

Answer: B

Explanation: The NameNode periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode. When the NameNode notices that it has not received a heartbeat message from a DataNode after a certain amount of time, the DataNode is marked as dead. Since its blocks will now be under-replicated, the system begins replicating the blocks that were stored on the dead DataNode. The NameNode orchestrates the replication of data blocks from one DataNode to another. The replication data transfer happens directly between DataNodes and the data never passes through the NameNode.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How NameNode handles data node failures?

 

 

QUESTION 8

The Hadoop framework provides a mechanism for coping with machine issues such as faulty configuration or impending hardware failure. MapReduce detects that one or more machines are performing poorly and starts additional copies of a map or reduce task. All the copies run simultaneously and the one that finishes first is used. This is called:

 

A.

Combiner

B.

IdentityMapper

 

 

 

 

C.

IdentityReducer

D.

Default Partitioner

E.

Speculative Execution

 

Answer: E

Explanation: Speculative execution: One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example if one node has a slow disk controller, then it may be reading its input at only 10% the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the other nodes.

By forcing tasks to run in isolation from one another, individual tasks do not know where their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first.

 

Reference: Apache Hadoop, Module 4: MapReduce
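
Speculative execution is enabled by default but can be toggled per job. A minimal sketch, assuming the classic Hadoop 1.x property names that were current for this exam (mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution); the class name is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Illustrative job setup: the two boolean properties below control whether
// the framework may launch duplicate (speculative) map and reduce attempts.
public class SpeculationConfigExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("mapred.map.tasks.speculative.execution", true);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
    Job job = new Job(conf, "speculation-example");
    // ... set mapper, reducer, input and output paths as usual, then submit.
  }
}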

 

 

QUESTION 9

Which of the following statements most accurately describes the relationship between MapReduce and Pig?

 

A.

Pig provides additional capabilities that allow certain types of data manipulation not possible with MapReduce.

B.

Pig provides no additional capabilities to MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.

C.

Pig programs rely on MapReduce but are extensible, allowing developers to do special-purpose processing not provided by MapReduce.

D.

Pig provides the additional capability of allowing you to control the flow of multiple MapReduce jobs.

 

 

 

 

 

Answer: D

Explanation: In addition to providing many relational and data flow operators, Pig Latin provides ways for you to control how your jobs execute on MapReduce. It allows you to set values that control your environment and to control details of MapReduce, such as how your data is partitioned.

 

Reference: http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html (topic: controlling execution)

 

 

QUESTION 10

For each input key-value pair, mappers can emit:

 

A.

One intermediate key value pair, of a different type.

B.

One intermediate key value pair, but of the same type.

C.

As many intermediate key-value pairs as desired, but they cannot be of the same type as the input key-value pair.

D.

As many intermediate key value pairs as desired, as long as all the keys have the same type and all the values have the same type.

E.

As many intermediate key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).

 

Answer: E

Explanation: Mapper maps input key/value pairs to a set of intermediate key/value pairs.

 

Maps are the individual tasks that transform input records into intermediate records. The transformed intermediate records do not need to be of the same type as the input records. A given input pair may map to zero or many output pairs.

 

Reference: Hadoop Map-Reduce Tutorial
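
For example, a tokenizing mapper emits no pairs for an empty line and one (word, 1) pair per word otherwise, and its intermediate types differ from its input types. A minimal sketch; the class name is illustrative:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: one input (offset, line) pair may produce zero, one,
// or many intermediate (word, 1) pairs, and the intermediate types
// (Text, IntWritable) differ from the input types (LongWritable, Text).
public class TokenizingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tokenizer = new StringTokenizer(line.toString());
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      context.write(word, ONE); // emit as many pairs as there are words
    }
  }
}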
