Achieve New Updated (September) Cloudera CCD-470 Examination Questions 11-20

September 24, 2015

QUESTION 11

You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce, but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?

A. SequenceFiles
B. Avro
C. JSON
D. HTML
E. XML
F. CSV

 

Answer: A

Explanation: Using Hadoop Sequence Files

 

So what should we do in order to deal with a huge number of images? Use Hadoop sequence files! These are map files that can inherently be read by MapReduce applications (there is an input format especially for sequence files) and are splittable by MapReduce, so we can have one huge file that serves as the input of many map tasks. By using these sequence files we let Hadoop use its advantages: it can split the work into chunks so the processing is parallel, but the chunks are big enough that the process stays efficient.

 

Since sequence files are map files, the desired format is for the key to be a Text that holds the HDFS filename and the value to be a BytesWritable that contains the image content of the file.
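To make this concrete, here is a minimal, illustrative sketch (not part of the cited reference) that packs a directory of images into a SequenceFile, using a Text key that holds the file name and a BytesWritable value that holds the raw image bytes. The class name and argument handling are assumptions.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImagesToSequenceFile {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path imageDir = new Path(args[0]);       // directory of image files in HDFS
    Path output = new Path(args[1]);         // resulting sequence file

    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, output, Text.class, BytesWritable.class);
    try {
      for (FileStatus status : fs.listStatus(imageDir)) {
        byte[] bytes = new byte[(int) status.getLen()];
        FSDataInputStream in = fs.open(status.getPath());
        try {
          in.readFully(bytes);               // read the whole image into memory
        } finally {
          IOUtils.closeStream(in);
        }
        // key: the file name; value: the raw image content
        writer.append(new Text(status.getPath().getName()), new BytesWritable(bytes));
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}

Such a file is splittable, so one large file can feed many map tasks via SequenceFileInputFormat.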

 

Reference: Hadoop binary files processing introduced by image duplicates finder

 

 

QUESTION 12

You want to populate an associative array in order to perform a map-side join. You’ve decided to put this information in a text file, place that file into the DistributedCache and read it in your Mapper before any records are processed.

 

Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array.

 

A. combine
B. map
C. init
D. configure

 

Answer: D

Explanation: See 3) below.

Here is an illustrative example of how to use the DistributedCache:

// Setting up the cache for the application

 

1. Copy the requisite files to the FileSystem:

 

$ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat
$ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip
$ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar
$ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar
$ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz
$ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz

 

2. Setup the application’s JobConf:

 

JobConf job = new JobConf();

DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job);
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);

 

3. Use the cached files in the Mapper or Reducer:

 

public static class MapClass extends MapReduceBase
    implements Mapper<K, V, K, V> {

  private Path[] localArchives;
  private Path[] localFiles;

  public void configure(JobConf job) {
    // Get the cached archives/files
    localArchives = DistributedCache.getLocalCacheArchives(job);
    localFiles = DistributedCache.getLocalCacheFiles(job);
  }

  public void map(K key, V value,
                  OutputCollector<K, V> output, Reporter reporter)
      throws IOException {
    // Use data from the cached archives/files here
    // ...
    // ...
    output.collect(k, v);
  }
}
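The example above only retrieves the local paths of the cached files. For this question's scenario, a hedged sketch (an assumption, not taken from the DistributedCache documentation) of a configure() method that goes one step further and loads the cached text file into an associative array for a map-side join might look like this; the tab-separated record layout and class names are illustrative:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class JoinMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private final Map<String, String> lookup = new HashMap<String, String>();

  @Override
  public void configure(JobConf job) {
    try {
      Path[] cacheFiles = DistributedCache.getLocalCacheFiles(job);
      if (cacheFiles != null && cacheFiles.length > 0) {
        // Each line of the cached file is assumed to be "key<TAB>value"
        BufferedReader reader =
            new BufferedReader(new FileReader(cacheFiles[0].toString()));
        try {
          String line;
          while ((line = reader.readLine()) != null) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
              lookup.put(parts[0], parts[1]);
            }
          }
        } finally {
          reader.close();
        }
      }
    } catch (IOException e) {
      throw new RuntimeException("Could not read the DistributedCache file", e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Join each input record against the in-memory table by its first field
    String[] fields = value.toString().split("\t", 2);
    String matched = lookup.get(fields[0]);
    if (matched != null) {
      output.collect(new Text(fields[0]), new Text(matched));
    }
  }
}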

 

Reference: org.apache.hadoop.filecache , Class DistributedCache

 

 

QUESTION 13

You need to create a GUI application to help your company’s sales people add and edit customer information. Would HDFS be appropriate for this customer information file?

 

A. Yes, because HDFS is optimized for random access writes.
B. Yes, because HDFS is optimized for fast retrieval of relatively small amounts of data.
C. No, because HDFS can only be accessed by MapReduce applications.
D. No, because HDFS is optimized for write-once, streaming access for relatively large files.

 

Answer: D

Explanation: HDFS is designed to support very large files. Applications that are compatible with HDFS are those that deal with large data sets. These applications write their data only once but they read it one or more times and require these reads to be satisfied at streaming speeds. HDFS supports write-once-read-many semantics on files.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What is HDFS ? How it is different from traditional file systems?

 

 

QUESTION 14

 

Which of the following best describes the workings of TextInputFormat?

 

A. Input file splits may cross line breaks. A line that crosses file splits is ignored.
B. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
C. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.
D. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
E. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.

 

Answer: E

Explanation: As the Map operation is parallelized, the input file set is first split into several pieces called FileSplits. If an individual file is so large that it would affect seek time, it is split into several splits. The splitting does not know anything about the input file's internal logical structure; for example, line-oriented text files are split on arbitrary byte boundaries.

Then a new map task is created per FileSplit.

 

When an individual map task starts, it opens a new output writer per configured reduce task. It then proceeds to read its FileSplit using the RecordReader it gets from the specified InputFormat. The InputFormat parses the input and generates key-value pairs. The InputFormat must also handle records that may be split on the FileSplit boundary. For example, TextInputFormat will read the last line of the FileSplit past the split boundary and, when reading a FileSplit other than the first, TextInputFormat ignores the content up to the first newline.
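The following self-contained toy simulation (written for this explanation; it is not Hadoop source code) shows the effect of that rule: a reader whose split does not start at byte 0 skips the partial first line, every reader finishes any line it starts even past its split end, and each line is therefore read exactly once, by the split that contains its beginning.

public class SplitBoundaryDemo {
  public static void main(String[] args) {
    byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes();
    int splitSize = 10;                         // forces splits to cross line breaks
    for (int start = 0; start < data.length; start += splitSize) {
      int end = Math.min(start + splitSize, data.length);
      System.out.println("split [" + start + "," + end + "):");
      int pos = start;
      if (start != 0) {
        // Skip the tail of a line that the previous split's reader owns
        while (pos < data.length && data[pos - 1] != '\n') pos++;
      }
      while (pos < end) {
        int lineEnd = pos;
        while (lineEnd < data.length && data[lineEnd] != '\n') lineEnd++;
        System.out.println("  " + new String(data, pos, lineEnd - pos));
        pos = lineEnd + 1;                      // may run past 'end'
      }
    }
  }
}

With a split size of 10 bytes, the first split reads "alpha" and the whole of "bravo" (even though "bravo" crosses the boundary), the second split skips the tail of "bravo" and reads "charlie", and the third reads "delta".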

 

Reference: How Map and Reduce operations are actually carried out

 

http://wiki.apache.org/hadoop/HadoopMapReduce (Map, second paragraph)

 

 

QUESTION 15

During the standard sort and shuffle phase of MapReduce, keys and values are passed to reducers. Which of the following is true?

 

A. Keys are presented to a reducer in sorted order; values for a given key are not sorted.
B. Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order.
C. Keys are presented to a reducer in random order; values for a given key are not sorted.
D. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.

 

Answer: A

Explanation: During the shuffle, the framework sorts the intermediate keys, so each reducer receives its keys in sorted order. The values associated with a given key are not sorted unless the job implements a secondary sort.

 

 

QUESTION 16

What is the behavior of the default partitioner?

 

A. The default partitioner assigns key-value pairs to reducers based on an internal random number generator.
B. The default partitioner implements a round-robin strategy, shuffling the key-value pairs to each reducer in turn. This ensures an even partition of the key space.
C. The default partitioner computes the hash of the key. Hash values between specific ranges are associated with different buckets, and each bucket is assigned to a specific reducer.
D. The default partitioner computes the hash of the key and divides that value modulo the number of reducers. The result determines the reducer assigned to process the key-value pair.
E. The default partitioner computes the hash of the value and takes the mod of that value with the number of reducers. The result determines the reducer assigned to process the key-value pair.

 

Answer: D

Explanation: The default partitioner computes a hash value for the key and assigns the partition based on this result.

 

The default Partitioner implementation is called HashPartitioner. It uses the hashCode() method of the key objects modulo the number of partitions total to determine which partition to send a given (key, value) pair to.

 

In Hadoop, the default partitioner is HashPartitioner, which hashes a record's key to determine which partition (and thus which reducer) the record belongs in. The number of partitions is then equal to the number of reduce tasks for the job.
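As a standalone illustration of that rule (a sketch written for this explanation, not Hadoop's HashPartitioner source), the partition number can be computed as the key's hash, masked to be non-negative, modulo the number of reducers; note that identical keys always land on the same reducer:

import org.apache.hadoop.io.Text;

public class HashPartitionDemo {
  // The same formula the default partitioner applies to each key
  static int partitionFor(Text key, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }

  public static void main(String[] args) {
    int reducers = 4;
    for (String k : new String[] {"alpha", "bravo", "charlie", "alpha"}) {
      System.out.println(k + " -> reducer " + partitionFor(new Text(k), reducers));
    }
  }
}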

 

Reference: Getting Started With (Customized) Partitioning

QUESTION 17

How does the NameNode detect that a DataNode has failed?

 

A. The NameNode does not need to know that a DataNode has failed.
B. When the NameNode fails to receive periodic heartbeats from the DataNode, it considers the DataNode as failed.
C. The NameNode periodically pings the DataNode. If the DataNode does not respond, the NameNode considers the DataNode as failed.
D. When HDFS starts up, the NameNode tries to communicate with the DataNode and considers the DataNode as failed if it does not respond.

 

Answer: B

Explanation: The NameNode periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode. When the NameNode notices that it has not received a heartbeat message from a DataNode after a certain amount of time, the DataNode is marked as dead. Since its blocks will then be under-replicated, the system begins replicating the blocks that were stored on the dead DataNode. The NameNode orchestrates the replication of data blocks from one DataNode to another; the replication data transfer happens directly between DataNodes and the data never passes through the NameNode.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, How NameNode Handles data node failures?

 

 

QUESTION 18

Workflows expressed in Oozie can contain:

 

A. Sequences of MapReduce and Pig jobs. These sequences can be combined with other actions including forks, decision points, and path joins.
B. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
C. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
D. Iterative repetition of MapReduce jobs until a desired answer or state is reached.

Answer: A

Explanation: An Oozie workflow is a collection of actions (i.e., Hadoop MapReduce jobs, Pig jobs) arranged in a control-dependency DAG (Directed Acyclic Graph) that specifies a sequence of action execution. This graph is specified in hPDL (an XML Process Definition Language).

 

hPDL is a fairly compact language, using a limited number of flow-control and action nodes. Control nodes define the flow of execution and include the beginning and end of a workflow (start, end, and fail nodes) as well as mechanisms to control the workflow execution path (decision, fork, and join nodes).
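For illustration only, a minimal hPDL workflow using start, fork, join, fail (kill), and end nodes might look like the sketch below; the action names, ${...} properties, and Pig script are assumptions, not taken from the cited article:

<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.1">
  <start to="forkNode"/>
  <fork name="forkNode">
    <path start="mrAction"/>
    <path start="pigAction"/>
  </fork>
  <action name="mrAction">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- mapper/reducer configuration omitted in this sketch -->
    </map-reduce>
    <ok to="joinNode"/>
    <error to="fail"/>
  </action>
  <action name="pigAction">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>script.pig</script>
    </pig>
    <ok to="joinNode"/>
    <error to="fail"/>
  </action>
  <join name="joinNode" to="end"/>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>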

 

Note: Oozie is a Java web application that runs in a Java servlet container (Tomcat) and uses a database to store:

Workflow definitions
Currently running workflow instances, including instance states and variables

Reference: Introduction to Oozie

 

QUESTION 19

Combiners increase the efficiency of a MapReduce program because:

 

A. They provide a mechanism for different mappers to communicate with each other, thereby reducing synchronization overhead.
B. They provide an optimization and reduce the total number of computations that are needed to execute an algorithm by a factor of n, where n is the number of reducers.
C. They aggregate intermediate map output locally on each individual machine and therefore reduce the amount of data that needs to be shuffled across the network to the reducers.
D. They aggregate intermediate map output from a small number of nearby (i.e., rack-local) machines and therefore reduce the amount of data that needs to be shuffled across the network to the reducers.

 

Answer: C

Explanation: Combiners are used to increase the efficiency of a MapReduce program. They aggregate intermediate map output locally on each individual mapper's output, which can reduce the amount of data that needs to be transferred across the network to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative. The execution of the combiner is not guaranteed: Hadoop may or may not execute it, and if required it may execute it more than once. Therefore your MapReduce jobs should not depend on the combiner's execution.
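As an illustrative sketch (not taken from the page cited below), a sum reducer can safely double as a combiner because addition is commutative and associative; the class names are assumptions:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CombinerExample {
  // Sums the counts for a key; safe as a combiner because the operation
  // is commutative and associative.
  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void configure(JobConf job) {
    // The combiner aggregates each mapper's local output; Hadoop may run it
    // zero, one, or several times, so the job must not depend on it.
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
  }
}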

 

Reference: http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html (question no. 12)

 

 

QUESTION 20

What happens if the NameNode crashes?

 

A. HDFS becomes unavailable until the NameNode is restored.
B. The Secondary NameNode seamlessly takes over and there is no service interruption.
C. HDFS becomes unavailable to new MapReduce jobs, but running jobs will continue until completion.
D. HDFS becomes temporarily unavailable until an administrator starts redirecting client requests to the Secondary NameNode.

 

Answer: A

Explanation: The NameNode is a Single Point of Failure for the HDFS Cluster. When the NameNode goes down, the file system goes offline.

 

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What is a NameNode? How many instances of NameNode run on a Hadoop Cluster?
