
CCD-333 Examination questions (September)

Updated (September) Cloudera CCD-333 Examination Questions 11-20

September 24, 2015




Can you use MapReduce to perform a relational join on two large tables sharing a key?





Assume that the two tables are formatted as comma-separated files in HDFS.





Yes.


Yes, but only if one of the tables fits into memory.


Yes, so long as both tables fit into memory.


No, MapReduce cannot perform relational operations.


No, but it can be done with either Pig or Hive.


Answer: A

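Explanation: MapReduce can join two large tables with a reduce-side join, which does not require either table to fit into memory.

* Join Algorithms in MapReduce

A) Reduce-side join

B) Map-side join

C) In-memory join

/ Striped variant

/ Memcached variant


* Which join to use?

/ Prefer in-memory join over map-side join, and map-side join over reduce-side join, when their preconditions hold.

/ Limitations of each:

In-memory join: memory (the smaller table must be held in memory)

Map-side join: sort order and partitioning (both inputs must be sorted and partitioned identically on the join key)

Reduce-side join: general purpose (no such precondition)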
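To see why the general-purpose reduce-side join needs neither table in memory, here is a small plain-Java simulation of one (no Hadoop classes are used). It assumes the join key is the first CSV column and that every row contains a comma; the class name, the "L"/"R" table tags and the method names are hypothetical. Mappers tag each row with its source table and emit it under the join key, the shuffle groups rows by key, and each reduce call only sees the rows for a single key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of a reduce-side (repartition) join on the first CSV column.
// Assumes every row contains at least one comma; names and tags are hypothetical.
class ReduceSideJoin {

    // A map output value: which table the row came from, plus the row minus its key.
    static final class Tagged {
        final String table;
        final String payload;

        Tagged(String table, String payload) {
            this.table = table;
            this.payload = payload;
        }
    }

    // Map phase: emit (joinKey, tagged row) for every row of one table.
    // The TreeMap stands in for the shuffle, grouping values by key in sorted key order.
    static void map(String tableTag, List<String> csvRows, Map<String, List<Tagged>> shuffle) {
        for (String row : csvRows) {
            int comma = row.indexOf(',');
            String key = row.substring(0, comma);
            String rest = row.substring(comma + 1);
            shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(new Tagged(tableTag, rest));
        }
    }

    // Reduce phase for one key: split the values by source table and emit their cross product.
    // Only the rows sharing this one key are buffered here, never a whole table.
    static List<String> reduce(String key, List<Tagged> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged v : values) {
            if (v.table.equals("L")) {
                left.add(v.payload);
            } else {
                right.add(v.payload);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                joined.add(key + "," + l + "," + r);
            }
        }
        return joined;
    }

    // Runs the whole job: map both tables, shuffle by key, reduce each key group.
    static List<String> join(List<String> leftRows, List<String> rightRows) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        map("L", leftRows, shuffle);
        map("R", rightRows, shuffle);
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> group : shuffle.entrySet()) {
            output.addAll(reduce(group.getKey(), group.getValue()));
        }
        return output;
    }
}
```

Keys present in only one table produce no output, which gives inner-join semantics. In a real job the grouping is done by the framework across machines, and a secondary sort is commonly used so that one table's rows arrive first and only that side has to be buffered.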




You need a distributed, scalable data store that gives you random, real-time read/write access to hundreds of terabytes of data. Which of the following would you use?

















Answer: E

Explanation: Use Apache HBase when you need random, real-time read/write access to your Big Data.


Note: This project’s goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google’s Bigtable (“Bigtable: A Distributed Storage System for Structured Data” by Chang et al.). Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.




Features:

Linear and modular scalability.

Strictly consistent reads and writes.

Automatic and configurable sharding of tables.

Automatic failover support between RegionServers.

Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.

Easy-to-use Java API for client access.

Block cache and Bloom filters for real-time queries.

Query predicate push-down via server-side filters.

Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options.

Extensible JRuby-based (JIRB) shell.

Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.


Reference: When Would I Use HBase? (first sentence)




In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?



Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.


Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.


Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.


Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.


Answer: D

Explanation: *Simply speaking, a combiner can be considered a “mini reducer” that may be applied, potentially several times, during the map phase before the new (hopefully reduced) set of key/value pairs is sent to the reducer(s). This is why a combiner must implement the Reducer interface (or extend the Reducer class as of Hadoop 0.20).


*Combiners are used to increase the efficiency of a MapReduce program. They aggregate intermediate map output locally, on each mapper's output, which reduces the amount of data that has to be transferred to the reducers. You can use your reducer code as a combiner if the operation it performs is commutative and associative. Execution of the combiner is not guaranteed: Hadoop may or may not run it, and if required it may run it more than once. Therefore your MapReduce jobs should not depend on the combiner's execution.


Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What are combiners? When should I use a combiner in my MapReduce Job?
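The network saving can be made concrete with a small plain-Java simulation of word count (no Hadoop classes). It counts the (word, count) pairs that would be shuffled to the reducer, with and without a combiner; the class, method and field names are hypothetical. The combiner reuses the reducer's summing logic, which is valid because addition is commutative and associative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of word count, comparing shuffled pairs with and without a combiner.
class CombinerDemo {

    // Result of one simulated job run.
    static final class Result {
        final int pairsShuffled;             // (word, count) pairs sent to the reducer
        final Map<String, Integer> counts;   // final word counts produced by the reducer

        Result(int pairsShuffled, Map<String, Integer> counts) {
            this.pairsShuffled = pairsShuffled;
            this.counts = counts;
        }
    }

    // Map phase for one input split: emit (word, 1) for every word.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Combiner: the reducer's summing logic, applied to a single mapper's output.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            local.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        local.forEach((word, count) -> out.add(Map.entry(word, count)));
        return out;
    }

    static Result run(List<String> splits, boolean useCombiner) {
        int shuffled = 0;
        Map<String, Integer> counts = new TreeMap<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> out = map(split);
            if (useCombiner) {
                out = combine(out);
            }
            shuffled += out.size();   // every remaining pair crosses the network
            for (Map.Entry<String, Integer> p : out) {
                counts.merge(p.getKey(), p.getValue(), Integer::sum);   // reducer: sum per word
            }
        }
        return new Result(shuffled, counts);
    }
}
```

For the two splits "to be or not to be" and "be quick", the run without a combiner shuffles 8 pairs and the run with it shuffles 6, while the final counts are identical.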




In a MapReduce job, the reducer receives all values associated with the same key. Which statement is most accurate about the ordering of these values?



The values are in sorted order.


The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job.


The values are arbitrarily ordered, but multiple runs of the same MapReduce job will always have the same ordering.


Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values.


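Answer: B

Explanation: Keys reach each reducer in sorted order, but the values grouped under one key are not sorted. They arrive in the order in which map outputs are fetched and merged, which can change from run to run; getting sorted values requires a secondary sort (moving the value, or part of it, into a composite key).

*The Mapper outputs are sorted and then partitioned per Reducer.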






*The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format.

*Input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.

*A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks.

*The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.


The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.


Reference: MapReduce Tutorial
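A simplified model of the shuffle shows why sorted keys do not imply sorted values: map outputs are merged by key, and values for the same key are appended in whatever order the map outputs are fetched. This is a plain-Java sketch; the class name and the explicit fetchOrder parameter are hypothetical stand-ins for the real shuffle, which fetches from mappers as they finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified shuffle: keys come out sorted, values keep their fetch order.
class ShuffleOrder {
    // mapOutputs.get(i) is mapper i's (key, value) output; fetchOrder lists the
    // mappers in the order the reducer happens to fetch their output.
    static Map<String, List<Integer>> reduceInput(List<List<Map.Entry<String, Integer>>> mapOutputs,
                                                  List<Integer> fetchOrder) {
        Map<String, List<Integer>> grouped = new TreeMap<>();   // keys come out sorted
        for (int mapper : fetchOrder) {
            for (Map.Entry<String, Integer> kv : mapOutputs.get(mapper)) {
                // Values are appended in fetch order; nothing sorts them.
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        return grouped;
    }
}
```

With mapper 0 emitting (a, 5) and (b, 1) and mapper 1 emitting (a, 2), fetching mapper 0 first gives key a the values [5, 2], and fetching mapper 1 first gives [2, 5]; the keys arrive as a, b either way.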




Does the MapReduce programming model provide a way for reducers to communicate with each other?



Yes, all reducers can communicate with each other by passing information through the jobconf object.


Yes, reducers can communicate with each other by dispatching intermediate key-value pairs that get shuffled to another reducer.


Yes, reducers running on the same machine can communicate with each other through shared memory, but not reducers on different machines.


No, each reducer runs independently and in isolation.


Answer: D

Explanation: The MapReduce programming model does not allow reducers to communicate with each other; reducers run in isolation.

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers (question no. 9)
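Reducers can work in isolation because the partitioner routes every occurrence of a key to exactly one reducer, so no reducer ever needs another reducer's data. The arithmetic below mirrors getPartition in Hadoop's default HashPartitioner, written as a standalone sketch; the class name, the assign helper and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Standalone sketch of hash partitioning of keys across reducers.
class Partitioning {
    // Same arithmetic as Hadoop's default HashPartitioner.
    static int partitionFor(Object key, int numReduceTasks) {
        // Masking off the sign bit keeps the result non-negative for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the reducer that would receive them.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            byReducer.computeIfAbsent(partitionFor(key, numReduceTasks), p -> new ArrayList<>()).add(key);
        }
        return byReducer;
    }
}
```

Because the function depends only on the key and the number of reducers, repeated occurrences of the same key always land in the same reducer's list.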








What happens in a MapReduce job when you set the number of reducers to zero?



No reducer executes, but the mappers generate no output.


No reducer executes, and the output of each mapper is written to a separate file in HDFS.


No reducer executes, but the outputs of all the mappers are gathered together and written to a single file in HDFS.


Setting the number of reducers to zero is invalid, and an exception is thrown.


Answer: B

Explanation: *It is legal to set the number of reduce-tasks to zero if no reduction is desired.


In this case the outputs of the map-tasks go directly to the FileSystem, into the output path set by setOutputPath(Path). The framework does not sort the map-outputs before writing them out to the FileSystem.

*Often, you may want to process input data using a map function only. To do this, simply set mapreduce.job.reduces to zero. The MapReduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
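A map-only job can be pictured as below: each map task's records go straight to that task's own output file, with no sort or shuffle in between. This is a plain-Java simulation; the upper-casing map function and the class name are hypothetical, and the file names follow the part-m-NNNNN pattern that Hadoop's new-API FileOutputFormat typically uses for map output. In a real driver, the switch is job.setNumReduceTasks(0) or the mapreduce.job.reduces property.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simulation of a map-only job: one unsorted output file per map task.
class MapOnlyJob {
    static Map<String, List<String>> run(List<List<String>> splits) {
        Map<String, List<String>> outputFiles = new LinkedHashMap<>();
        for (int task = 0; task < splits.size(); task++) {
            // Hypothetical map function: upper-case every record. No sort, no shuffle.
            List<String> records = splits.get(task).stream()
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
            // One output file per map task, records kept in the order they were mapped.
            outputFiles.put(String.format("part-m-%05d", task), records);
        }
        return outputFiles;
    }
}
```

Two input splits therefore yield two files, and a split containing "b" then "a" is written as "B" then "A", unsorted.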




Custom programmer-defined counters in MapReduce are:



Lightweight devices for bookkeeping within MapReduce programs.


Lightweight devices for ensuring the correctness of a MapReduce program. Mappers increment counters, and reducers decrement counters. If at the end of the program the counters read zero, then you are sure that the job completed correctly.


Lightweight devices for synchronization within MapReduce programs. You can use counters to coordinate execution between a mapper and a reducer.


Answer: A

Explanation: Counters are a useful channel for gathering statistics about the job, for quality control or for application-level statistics. They are also useful for problem diagnosis. Hadoop maintains some built-in counters for every job, which report various metrics for your job.


Hadoop MapReduce also allows the user to define a set of user-defined counters that can be incremented (or decremented by specifying a negative value as the parameter) by the driver, the mapper, or the reducer.


Reference: Iterative MapReduce and Counters, Introduction to Iterative MapReduce and Counters (second paragraph)
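The bookkeeping role of counters can be sketched in plain Java: each task keeps its own tallies, and the framework sums them into job-level totals. The RecordQuality counter group and the "exactly two CSV fields" rule are hypothetical. In Hadoop's new API a mapper would instead call context.getCounter(...).increment(1) and leave the aggregation to the framework.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Simulation of user-defined counters: per-task tallies summed into job totals.
class CounterDemo {
    // Hypothetical user-defined counter group.
    enum RecordQuality { WELL_FORMED, MALFORMED }

    // One simulated map task: tally records by whether they have exactly two CSV fields.
    static Map<RecordQuality, Long> mapTask(List<String> records) {
        Map<RecordQuality, Long> counters = new EnumMap<>(RecordQuality.class);
        for (String record : records) {
            RecordQuality quality = record.split(",", -1).length == 2
                    ? RecordQuality.WELL_FORMED
                    : RecordQuality.MALFORMED;
            counters.merge(quality, 1L, Long::sum);   // bookkeeping only; does not affect the output
        }
        return counters;
    }

    // What the framework does at job level: sum each counter across all tasks.
    static Map<RecordQuality, Long> aggregate(List<Map<RecordQuality, Long>> perTask) {
        Map<RecordQuality, Long> totals = new EnumMap<>(RecordQuality.class);
        for (Map<RecordQuality, Long> task : perTask) {
            task.forEach((name, value) -> totals.merge(name, value, Long::sum));
        }
        return totals;
    }
}
```

The counters only report on the job; nothing in the map logic reads them back, which is why they are bookkeeping rather than a synchronization mechanism.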




What is the preferred way to pass a small number of configuration parameters to a mapper or reducer?



As key-value pairs in the jobconf object.


As a custom input key-value pair passed to each mapper or reducer.


Using a plain text file via the DistributedCache, which each mapper or reducer reads.


Through a static variable in the MapReduce driver class (i.e., the class that submits the MapReduce job).


Answer: A

Explanation: In Hadoop, it is sometimes difficult to pass arguments to mappers and reducers. If the number of arguments is huge (e.g., big arrays), DistributedCache might be a good choice. However, here we’re discussing small arguments, usually a handful of configuration parameters.


In fact, the way to configure these parameters is simple. When you initialize the JobConf object to launch a MapReduce job, you can set the parameter with its set method:


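// getConf() comes from Configured (assuming the driver implements Tool); wrapping it in
// a new JobConf avoids a ClassCastException when getConf() returns a plain Configuration.
JobConf job = new JobConf(getConf(), getClass());
job.set("NumberOfDocuments", args[0]);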






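Here, “NumberOfDocuments” is the name of the parameter, and its value is read from args[0], a command-line argument. Each mapper or reducer can then read it back, for example with job.get("NumberOfDocuments") inside its configure(JobConf job) method.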


Reference: Passing Parameters and Arguments to Mapper and Reducer in Hadoop
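Why the job configuration works, and a static variable in the driver does not, can be sketched as follows: the driver's settings are serialized with the job (Hadoop writes them to a job.xml file), and every task, running in its own JVM, loads its own copy. In this plain-Java model java.util.Properties stands in for Hadoop's Configuration/JobConf, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Model of shipping a small parameter from the driver to tasks via the serialized job configuration.
class ConfigPassing {
    // Driver side: set the parameter, then serialize the whole configuration to XML.
    static byte[] submit(String numberOfDocuments) {
        Properties conf = new Properties();
        conf.setProperty("NumberOfDocuments", numberOfDocuments);
        ByteArrayOutputStream xml = new ByteArrayOutputStream();
        try {
            conf.storeToXML(xml, "job configuration");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return xml.toByteArray();
    }

    // Task side (a separate JVM on a real cluster): load the shipped copy and read the parameter.
    static long readInTask(byte[] shippedXml) {
        Properties conf = new Properties();
        try {
            conf.loadFromXML(new ByteArrayInputStream(shippedXml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Fall back to 0 if the driver never set the parameter.
        return Long.parseLong(conf.getProperty("NumberOfDocuments", "0"));
    }
}
```

A static field set in the driver never reaches the task JVMs, because only what is serialized with the job is shipped. On the Hadoop side, the equivalent read in a task would be a call such as getLong("NumberOfDocuments", 0) on the task's configuration.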




You have a large dataset of key-value pairs, where the keys are strings, and the values are integers. For each unique key, you want to identify the largest integer. In writing a MapReduce program to accomplish this, can you take advantage of a combiner?



No, a combiner would not be useful in this case.




Yes.


Yes, but the number of unique keys must be known in advance.


Yes, as long as all the keys fit into memory on each node.


Yes, as long as all the integer values that share the same key fit into memory on each node.


Answer: B
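A combiner helps here because max is commutative and associative: the maximum of per-mapper maxima equals the maximum of all the values, with no need to know the keys in advance or to fit keys or values in memory. The plain-Java sketch below (hypothetical class and method names, no Hadoop classes) uses the same function as both combiner and reducer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Max per key, used unchanged as both combiner and reducer.
class MaxCombiner {
    static Map<String, Integer> maxPerKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> best = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            best.merge(p.getKey(), p.getValue(), Integer::max);
        }
        return best;
    }

    // Simulated job: combine each mapper's output locally, shuffle the partial
    // maxima, then reduce them with the same function.
    static Map<String, Integer> runWithCombiner(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            maxPerKey(output).forEach((key, max) -> shuffled.add(Map.entry(key, max)));
        }
        return maxPerKey(shuffled);
    }
}
```

Each mapper then ships at most one pair per distinct key it saw, instead of one pair per input record.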




Which of the following statements best describes how a large (100 GB) file is stored in HDFS?



The file is divided into variable-size blocks, which are stored on multiple datanodes. Each block is replicated three times by default.


The file is replicated three times by default. Each copy of the file is stored on a separate datanode.


The master copy of the file is stored on a single datanode. The replica copies are divided into fixed-size blocks, which are stored on multiple datanodes.


The file is divided into fixed-size blocks, which are stored on multiple datanodes. Each block is replicated three times by default. Multiple blocks from the same file might reside on the same datanode.


The file is divided into fixed-size blocks, which are stored on multiple datanodes. Each block is replicated three times by default. HDFS guarantees that different blocks from the same file are never on the same datanode.






Answer: D

Explanation: HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time. The NameNode makes all decisions regarding replication of blocks. HDFS uses a rack-aware replica placement policy. In the default configuration there are 3 copies of each data block in HDFS: 2 copies are stored on datanodes in the same rack and the 3rd copy on a different rack. Nothing in this policy keeps different blocks of the same file on different datanodes; for example, when the writer runs on a datanode, the first replica of every block is placed on that datanode.


Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How the HDFS Blocks are replicated?
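A quick calculation shows why blocks of one large file end up sharing datanodes. Assuming the 64 MB default block size of Hadoop 1 (128 MB in Hadoop 2), a 100 GB file becomes 1,600 (or 800) blocks, so on any cluster with fewer datanodes than blocks, some datanode must hold more than one block of the file. The class and method names below are hypothetical.

```java
// Back-of-the-envelope block arithmetic for a large HDFS file.
class HdfsBlocks {
    // Every block of a file is full-size except possibly the last one,
    // so the block count is a ceiling division.
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Pigeonhole principle: with more blocks than datanodes (even before counting
    // replicas), at least one datanode must store two or more blocks of the file.
    static boolean someNodeHoldsMultipleBlocks(long blocks, int dataNodes) {
        return blocks > dataNodes;
    }
}
```

For example, blockCount(100 GB, 64 MB) is 1,600, and a 100-node cluster cannot hold 1,600 blocks without placing several of them on the same datanode.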
