
Achieve New Updated (September) Cloudera CCD-333 Examination Questions 31-40

September 24, 2015


 

QUESTION 31

MapReduce is well-suited for all of the following applications EXCEPT (choose one):

 

A.

Text mining on a large collection of unstructured documents.

B.

Analysis of large amounts of Web logs (queries, clicks, etc.).

C.

Online transaction processing (OLTP) for an e-commerce Website.

D.

Graph mining on a large social network (e.g., Facebook friends network).

 

Answer: C

Explanation: Hadoop MapReduce is designed for batch-oriented workloads. MapReduce is well suited for data warehousing (OLAP), but not for OLTP.

 

 

QUESTION 32

You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when trying to access this file?

 

A.

They would see no content until the whole file is written and closed.

B.

They would see the content of the file through the last completed block.

C.

They would see the current state of the file, up to the last bit written by the command.

D.

They would see Hadoop throw a ConcurrentFileAccessException when they try to access this file.

 
Answer: A

Explanation: Note:

*put

Usage: hadoop fs -put <localsrc> … <dst>

 

Copy single src, or multiple srcs from local file system to the destination filesystem. Also reads input from stdin and writes to destination filesystem.
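For example, a minimal invocation might look like this (the local and HDFS paths are illustrative, not taken from the exam item):

hadoop fs -put /local/logs/weblog.txt /user/hadoop/weblog.txt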

 

 

QUESTION 33

Which MapReduce daemon runs on each slave node and participates in job execution?

 

A.

TaskTracker

B.

JobTracker

C.

NameNode

D.

Secondary NameNode

 

Answer: A

Explanation: A single TaskTracker instance runs on each slave node, and the TaskTracker runs as a separate JVM process.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What is the configuration of a typical slave node on a Hadoop cluster? How many JVMs run on a slave node?

 

http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html (see answer to question no. 5)

 

 

QUESTION 34

You are running a job that will process a single InputSplit on a cluster which has no other jobs currently running. Each node has an equal number of open Map slots. On which node will Hadoop first attempt to run the Map task?

 

A.

The node with the most memory

B.

The node with the lowest system load

C.

The node on which this InputSplit is stored

D.

The node with the most free local disk space

 

Answer: C

Explanation: The TaskTrackers send heartbeat messages to the JobTracker, usually every few seconds, to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data, and if it cannot find one, it looks for an empty slot on a machine in the same rack.

 

 

QUESTION 35

The NameNode uses RAM for the following purpose:

 

A.

To store the contents of files in HDFS.

B.

To store filenames, list of blocks and other meta information.

C.

To store the edits log that keeps track of changes in HDFS.

D.

To manage distributed read and write locks on files in HDFS.

 

Answer: B

Explanation: The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself. Only one NameNode process runs on any Hadoop cluster, in its own JVM process; in a typical production cluster it runs on a separate machine. The NameNode is a single point of failure for the HDFS cluster: when the NameNode goes down, the file system goes offline. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What is a NameNode? How many instances of NameNode run on a Hadoop cluster?

 

QUESTION 36

Which of the following best describes the lifecycle of a Mapper?

 

A.

The TaskTracker spawns a new Mapper to process each key-value pair.

B.

The JobTracker spawns a new Mapper to process all records in a single file.

C.

The TaskTracker spawns a new Mapper to process all records in a single input split.

D.

The JobTracker calls the TaskTracker's configure() method, then its map() method, and finally its close() method.

 

Answer: A

Explanation: For each map instance that runs, the TaskTracker creates a new instance of your mapper.

 

Note:

*The Mapper is responsible for processing Key/Value pairs obtained from the InputFormat. The mapper may perform a number of extraction and transformation functions on the Key/Value pair before ultimately outputting none, one, or many Key/Value pairs of the same or a different Key/Value type.

 

*With the new Hadoop API, mappers extend the org.apache.hadoop.mapreduce.Mapper class. This class defines an ‘Identity’ map function by default – every input Key/Value pair obtained from the InputFormat is written out.

Examining the run() method, we can see the lifecycle of the mapper:

/**
 * Expert users can override this method for more complete control over the
 * execution of the Mapper.
 * @param context
 * @throws IOException
 */
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}

 

setup(Context) – Perform any setup for the mapper. The default implementation is a no-op method.
map(Key, Value, Context) – Perform a map operation on the given Key/Value pair. The default implementation calls Context.write(Key, Value).
cleanup(Context) – Perform any cleanup for the mapper. The default implementation is a no-op method.
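To make the lifecycle concrete, here is a minimal sketch of a new-API mapper that overrides all three lifecycle methods; the class and field names are ours, chosen only for illustration:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LifecycleMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Called once per task, before any calls to map()
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Called once for every key/value pair in this task's input split
        context.write(value, one);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Called once per task, after the last call to map()
    }
}

As run() above shows, Hadoop calls setup() once per task, map() once per record in the task's input split, and cleanup() once at the end.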

 

Reference: Hadoop/MapReduce/Mapper

 

 

QUESTION 37

To process input key-value pairs, your mapper needs to load a 512 MB data file in memory.

What is the best way to accomplish this?

 

A.

Place the data file in the DataCache and read the data into memory in the configure method of the mapper.

B.

Place the data file in the DistributedCache and read the data into memory in the map method of the mapper.

C.

Place the data file in the DistributedCache and read the data into memory in the configure method of the mapper.

D.

Serialize the data file, insert it in the JobConf object, and read the data into memory in the configure method of the mapper.

 

Answer: B

Explanation: Hadoop has a distributed cache mechanism to make files that may be needed by Map/Reduce jobs available locally.

 

Use Case

 

Let's look at the use case in a bit more detail so that we can follow the code snippets.
We have a key-value file that we need to use in our map jobs. For simplicity, let's say we need to replace all keywords that we encounter during parsing with some other value.

 

So what we need is:

A key-value file (let's use a Properties file)
The Mapper code that uses it

 

Write the Mapper code that uses it

 


import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DistributedCacheMapper extends Mapper<LongWritable, Text, Text, Text> {

    Properties cache;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        Path[] localCacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());

        if (localCacheFiles != null) {
            // expecting only a single file here
            for (int i = 0; i < localCacheFiles.length; i++) {
                Path localCacheFile = localCacheFiles[i];
                cache = new Properties();
                cache.load(new FileReader(localCacheFile.toString()));
            }
        } else {
            // do your error handling here
        }
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // use the cache here
        // if value contains some attribute, cache.get(<value>)
        // do some action or replace with something else
    }
}

 

Note:

* Distribute application-specific large, read-only files efficiently.

 

DistributedCache is a facility provided by the Map-Reduce framework to cache files (text, archives, jars etc.) needed by applications.

 

Applications specify the files, via urls (hdfs:// or http://) to be cached via the JobConf. The DistributedCache assumes that the files specified via hdfs:// urls are already present on the FileSystem at the path specified by the url.
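As a sketch of the driver side under those assumptions (the job name and the HDFS path of the properties file are illustrative), the key-value file from the use case above could be registered like this:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class CacheDriverSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "distributed-cache-example");
        // Register an HDFS file so the framework copies it to each task
        // node's local disk before the map tasks start.
        DistributedCache.addCacheFile(new URI("/user/hadoop/keywords.properties"),
                job.getConfiguration());
        // ... set the mapper class, input/output paths, then submit the job ...
    }
}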

 

Reference: Using Hadoop Distributed Cache

 

QUESTION 38

Which of the following describes how a client reads a file from HDFS?

 

A.

The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).

B.

The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.

C.

The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.

D.

The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.

 

Answer: C

Explanation: Client communication with HDFS happens using the Hadoop HDFS API. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file on HDFS. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives. Client applications can talk directly to a DataNode once the NameNode has provided the location of the data.
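As a hedged illustration of that flow using the FileSystem API (the file path is made up for the example): open() obtains the block locations from the NameNode, and the returned stream then reads the bytes directly from the DataNodes.

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        InputStream in = null;
        try {
            // open() consults the NameNode for the block locations; the stream
            // then reads the blocks directly from the DataNodes that hold them.
            in = fs.open(new Path("/user/hadoop/weblog.txt"));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}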

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How the Client communicates with HDFS?

 

 

QUESTION 39

Combiners increase the efficiency of a MapReduce program because:

 

A.

They provide a mechanism for different mappers to communicate with each other, thereby reducing synchronization overhead.

B.

They provide an optimization and reduce the total number of computations that are needed to execute an algorithm by a factor of n, where n is the number of reducers.

C.

They aggregate intermediate map output locally on each individual machine and therefore reduce the amount of data that needs to be shuffled across the network to the reducers.

D.

They aggregate intermediate map output from a small number of nearby (i.e., rack-local) machines and therefore reduce the amount of data that needs to be shuffled across the network to the reducers.

 

Answer: C

Explanation: Combiners are used to increase the efficiency of a MapReduce program. They aggregate each individual mapper's intermediate output locally, which helps reduce the amount of data that needs to be transferred across the network to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative. The execution of the combiner is not guaranteed; Hadoop may or may not execute it, and if required it may execute it more than once. Therefore, your MapReduce jobs should not depend on the combiner's execution.
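As an illustration (the class name is ours), a reducer that sums counts is commutative and associative, so the same class can be registered as a combiner and run locally over each mapper's output:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Summing is commutative and associative, so this class can serve both as the
// combiner (run locally on each mapper's output) and as the final reducer.
public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

In the new API it would be registered in the driver with job.setCombinerClass(SumCombiner.class).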

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What are combiners? When should I use a combiner in my MapReduce Job?

 

http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html (question no. 12)

 

 

QUESTION 40

You’ve written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?

 

A.

Writable

B.

WritableComparable

C.

InputFormat

D.

OutputFormat

E.

Combiner

F.

Partitioner

 

Answer: E

 

 

Explanation: Users can optionally specify a combiner, via JobConf.setCombinerClass(Class), to perform local aggregation of the intermediate outputs, which helps to cut down the amount of data transferred from the Mapper to the Reducer.
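As a minimal, self-contained driver sketch of that call with the old-style API, reusing the library mapper and reducer classes that ship with Hadoop (the job name and command-line paths are illustrative):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.TokenCountMapper;

public class CombinerDriverSketch {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CombinerDriverSketch.class);
        conf.setJobName("wordcount-with-combiner");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        conf.setMapperClass(TokenCountMapper.class);
        // The reducer class is reused as the combiner, so map output is summed
        // locally on each node before being shuffled to the reducers.
        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}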

 

Reference: Map/Reduce Tutorial

 

http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html (Mapper, 9th paragraph)

 
