
Achieve New Updated (September) Cloudera CCD-470 Examination Questions 31-40

September 24, 2015

Ensurepass

 

 

QUESTION 31

Which process describes the lifecycle of a Mapper?

 

A.

The JobTracker calls the TaskTracker’s configure () method, then its map () method and finally its close () method.

B.

The TaskTracker spawns a new Mapper to process all records in a single input split.

C.

The TaskTracker spawns a new Mapper to process each key-value pair.

D.

The JobTracker spawns a new Mapper to process all records in a single file.

 

Answer: B

Explanation: For each map task that runs, the TaskTracker creates a new instance of your Mapper, and that single instance processes every record in the task's input split; the map() method is then invoked once per key-value pair within the split.

Note:

* The Mapper is responsible for processing Key/Value pairs obtained from the InputFormat. The mapper may perform a number of Extraction and Transformation functions on the Key/Value pair before ultimately outputting none, one or many Key/Value pairs of the same, or different Key/Value type.

 

* With the new Hadoop API, mappers extend the org.apache.hadoop.mapreduce.Mapper class. This class defines an ‘Identity’ map function by default – every input Key/Value pair obtained from the InputFormat is written out.

Examining the run() method, we can see the lifecycle of the mapper:

/**

* Expert users can override this method for more complete control over the

* execution of the Mapper.

* @param context

* @throws IOException

*/

public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}

 

setup(Context) – Perform any setup for the mapper. The default implementation is a no-op method.

map(Key, Value, Context) – Perform the map operation on the given Key/Value pair. The default implementation simply calls Context.write(Key, Value), which is the identity map mentioned above.

cleanup(Context) – Perform any cleanup for the mapper. The default implementation is a no-op method.
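As a concrete illustration of this lifecycle, the sketch below shows a mapper that overrides all three hooks. It is only an example: the class name, the configuration key and the word-count logic are assumptions for illustration, not part of the question.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();
  private boolean lowercase;

  @Override
  protected void setup(Context context) {
    // Called once per map task, before any record is processed.
    lowercase = context.getConfiguration().getBoolean("wordcount.lowercase", true);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Called once for every key-value pair in this task's input split.
    for (String token : value.toString().split("\\s+")) {
      if (token.isEmpty()) continue;
      word.set(lowercase ? token.toLowerCase() : token);
      context.write(word, ONE);
    }
  }

  @Override
  protected void cleanup(Context context) {
    // Called once per map task, after the last record has been processed.
  }
}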

 

Reference: Hadoop/MapReduce/Mapper

 

 

QUESTION 32

What is a Writable?

 

A.

Writable is an interface that all keys and values in MapReduce must implement. Classes implementing this interface must implement methods for serializing and deserializing themselves.

B.

Writable is an abstract class that all keys and values in MapReduce must extend. Classes extending this abstract base class must implement methods for serializing and deserializing themselves

C.

Writable is an interface that all keys, but not values, in MapReduce must implement. Classes implementing this interface must implement methods for serializing and deserializing themselves.

D.

Writable is an abstract class that all keys, but not values, in MapReduce must extend. Classes extending this abstract base class must implement methods for serializing and deserializing themselves.

 

Answer: A

Explanation: public interface Writable

A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput.

 

Any key or value type in the Hadoop Map-Reduce framework implements this interface. Implementations typically implement a static read(DataInput) method which constructs a new instance, calls readFields(DataInput) and returns the instance.
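As an illustration (a minimal sketch; the class name and its fields are invented for this example and are not part of the question), a custom Writable implements the two interface methods and often adds the conventional static read(DataInput) factory:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PageViewWritable implements Writable {

  private long timestamp;
  private int pageId;

  @Override
  public void write(DataOutput out) throws IOException {
    // Serialize the fields in a fixed order.
    out.writeLong(timestamp);
    out.writeInt(pageId);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Deserialize the fields in exactly the order they were written.
    timestamp = in.readLong();
    pageId = in.readInt();
  }

  // Conventional static factory described in the Javadoc quoted above.
  public static PageViewWritable read(DataInput in) throws IOException {
    PageViewWritable w = new PageViewWritable();
    w.readFields(in);
    return w;
  }
}

Note that key types are additionally required to be comparable, which is why keys normally implement WritableComparable rather than plain Writable.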

 

Reference: org.apache.hadoop.io, Interface Writable

 

 

QUESTION 33

 

You want to understand more about how users browse your public website, such as which pages they visit prior to placing an order. You have a farm of 200 web servers hosting your website. How will you gather this data for your analysis?

 

A.

Ingest the server web logs into HDFS using Flume.

B.

Write a MapReduce job, with the web servers for mappers, and the Hadoop cluster nodes for reducers.

C.

Import all users’ clicks from your OLTP databases into Hadoop, using Sqoop.

D.

Channel these clickstreams into Hadoop using Hadoop Streaming.

E.

Sample the weblogs from the web servers, copying them into Hadoop using curl.

 

Answer: A

Explanation: Apache Flume is a distributed, reliable service for collecting, aggregating and moving large volumes of log data from many sources, such as a farm of web servers, into HDFS, which makes it the natural way to gather this clickstream data for analysis.

Note: Hadoop MapReduce for Parsing Weblogs

 

Here are the steps for parsing a log file using Hadoop MapReduce:

 

Load log files into the HDFS location using this Hadoop command:

 

hadoop fs -put <local file path of weblogs> <hadoop HDFS location>

The Opencsv 2.3 JAR is used for parsing log records.

 

Below is the Mapper program for parsing the log file from the HDFS location.

 

public static class ParseMapper
    extends Mapper<Object, Text, NullWritable, Text> {

  private Text word = new Text();

  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Fields are space-separated and may be enclosed in double quotes.
    CSVParser parse = new CSVParser(' ', '"');
    String[] sp = parse.parseLine(value.toString());
    int spSize = sp.length;
    StringBuffer rec = new StringBuffer();
    for (int i = 0; i < spSize; i++) {
      rec.append(sp[i]);
      if (i != (spSize - 1))
        rec.append(",");
    }
    word.set(rec.toString());
    context.write(NullWritable.get(), word);
  }
}

The command below runs the Hadoop-based log parsing job. The MapReduce program is attached in this article; you can add extra parsing methods to the class, but be sure to rebuild the JAR after any change and copy it to the node from which you submit jobs to the cluster.

 

hadoop jar <path of logparse jar> <hadoop HDFS logfile path> <output path of parsed log file>

The output file is stored in the HDFS location, and the output file name starts with “part-“.
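For completeness, a driver for the parsing job above might be wired roughly as follows. This is a sketch only: the class name LogParseDriver is an assumption, it assumes ParseMapper (shown above) is declared as a nested class of the driver or otherwise on the classpath, the job is configured as map-only (no reducers), and on Hadoop 2.x you would typically use Job.getInstance(conf) instead of the constructor shown here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogParseDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "weblog parse");
    job.setJarByClass(LogParseDriver.class);
    job.setMapperClass(ParseMapper.class);     // the mapper shown above
    job.setNumReduceTasks(0);                  // map-only: parsed records go straight to HDFS
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS log file path
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output path for the parsed log
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a JAR, this is what the hadoop jar command above invokes.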

 

 

QUESTION 34

In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?

 

A.

Increase the parameter that controls minimum split size in the job configuration.

B.

Write a custom MapRunner that iterates over all key-value pairs in the entire file.

C.

Set the number of mappers equal to the number of input files you want to process.

D.

Write a custom FileInputFormat and override the method isSplitable to always return false.

 

Answer: D

Explanation: FileInputFormat is the base class for all file-based InputFormats. This provides a generic implementation of getSplits(JobContext). Subclasses of FileInputFormat can also override the isSplitable(JobContext, Path) method to ensure input-files are not split-up and are processed as a whole by Mappers.
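A minimal sketch of such an input format (the class name is illustrative; it extends TextInputFormat, a FileInputFormat subclass, so the standard line record reader is reused):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class NonSplittableTextInputFormat extends TextInputFormat {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    // Never split: each input file becomes exactly one split,
    // so a single map task processes the whole file.
    return false;
  }
}

The job then selects it with job.setInputFormatClass(NonSplittableTextInputFormat.class).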

 

Reference: org.apache.hadoop.mapreduce.lib.input, Class FileInputFormat<K,V>

 

 

QUESTION 35

What types of algorithms are difficult to express in MapReduce v1 (MRv1)?

 

 

 

 

 

A.

Algorithms that require applying the same mathematical function to large numbers of individual binary records.

B.

Relational operations on large amounts of structured and semi-structured data.

C.

Algorithms that require global, sharing states.

D.

Large-scale graph algorithms that require one-step link traversal.

E.

Text analysis algorithms on large collections of unstructured text (e.g, Web crawls).

 

Answer: C

Explanation: See item 3 below.

Limitations of MapReduce: where not to use MapReduce

While very powerful and applicable to a wide variety of problems, MapReduce is not the answer to every problem. Here are some problems where MapReduce is not well suited, along with some papers that address its limitations.

 

1. Computation depends on previously computed values

If the computation of a value depends on previously computed values, then MapReduce cannot be used. One good example is the Fibonacci series, where each value is the sum of the previous two: f(k+2) = f(k+1) + f(k). Also, if the data set is small enough to be computed on a single machine, it is better to do it as a single reduce(map(data)) operation rather than going through the entire MapReduce process.

 

2. Full-text indexing or ad hoc searching

The index generated in the Map step is one-dimensional, and the Reduce step must not generate a large amount of data or there will be serious performance degradation. For example, CouchDB’s MapReduce may not be a good fit for full-text indexing or ad hoc searching. This is a problem better suited to a tool such as Lucene.

 

3. Algorithms depend on shared global state

Solutions to many interesting problems in text processing do not require global synchronization. As a result, they can be expressed naturally in MapReduce, since map and reduce tasks run independently and in isolation. However, many algorithms depend crucially on the existence of shared global state during processing, which makes them difficult to implement in MapReduce, since the single opportunity for global synchronization in MapReduce is the barrier between the map and reduce phases of processing.

 

Reference: Limitations of MapReduce: where not to use MapReduce

 

 

 

 

 

 

QUESTION 36

In a MapReduce job, the reducer receives all values associated with same key. Which statement best describes the ordering of these values?

 

A.

The values are in sorted order.

B.

The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job.

C.

The values are arbitrarily ordered, but multiple runs of the same MapReduce job will always have the same ordering.

D.

Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values.

 

Answer: B

Explanation: Note:

* Input to the Reducer is the sorted output of the mappers. The sorting is by key only; the values for a given key arrive in no guaranteed order.

* The framework calls the application’s Reduce function once for each unique key in the sorted order.

* Example:

For the given sample input the first map emits:

< Hello, 1>

< World, 1>

< Bye, 1>

< World, 1>

The second map emits:

< Hello, 1>

< Hadoop, 1>

< Goodbye, 1>

< Hadoop, 1>
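Grouped for the reduce phase, the reducer therefore sees <Bye, [1]>, <Goodbye, [1]>, <Hadoop, [1, 1]>, <Hello, [1, 1]>, <World, [1, 1]>: the keys arrive in sorted order, but the order of the values inside each list is not guaranteed and may differ between runs, which is why answer B is correct.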

 

 

QUESTION 37

You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths() command when it’s given a Path object representing this directory?

 

 

 

 

 

A.

Four, all files will be processed

B.

Three, the pound sign is an invalid character for HDFS file names

C.

Two, file names with a leading period or underscore are ignored

D.

None, the directory cannot be named jobdata

E.

One, no special characters can prefix the name of an input file

 

Answer: C

Explanation: FileInputFormat’s default path filter treats files whose names begin with ‘_’ or ‘.’ as hidden and skips them, much as Unix hides files whose names begin with ‘.’.

 

# characters are allowed in HDFS file names.
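A short sketch of the call in question (the job object and the relative path are assumed; the default filter described above is what skips _first.txt and .third.txt, leaving second.txt and #data.txt):

// Directory is resolved relative to the user's HDFS home directory.
FileInputFormat.setInputPaths(job, new Path("jobdata"));
// Of the four files, only second.txt and #data.txt become map input;
// _first.txt and .third.txt are filtered out as hidden files.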

 

 

QUESTION 38

When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?

 

A.

When the types of the reduce operation’s input key and input value match the types of the reducer’s output key and output value and when the reduce operation is both commutative and associative.

B.

When the signature of the reduce method matches the signature of the combine method.

C.

Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.

D.

Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.

E.

Never. Combiners and reducers must be implemented separately because they serve different purposes.

 

Answer: A

Explanation: You can use your reducer code as a combiner if the operation performed is commutative and associative.
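For example, a summing reducer is commutative and associative and its input types match its output types (Text/IntWritable in, Text/IntWritable out), so the very same class can be registered as the combiner. The sketch below mirrors the classic word-count reducer and is illustrative, not taken from the question.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  private final IntWritable result = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();   // addition is commutative and associative
    }
    result.set(sum);
    context.write(key, result);
  }
}

// In the driver, the same class serves both roles:
// job.setCombinerClass(IntSumReducer.class);
// job.setReducerClass(IntSumReducer.class);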

 

 

QUESTION 39

 

In a large MapReduce job with m mappers and r reducers, how many distinct copy operations will there be in the sort/shuffle phase?

 

A.

m

B.

r

C.

m+r (i.e., m plus r)

D.

m x r (i.e., m multiplied by r)

E.

m^r (i.e., m to the power of r)

 

Answer: D

Explanation: A MapReduce job with m mappers and r reducers involves up to m * r distinct copy operations, since each mapper may have intermediate output going to every reducer.
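For example, a job with 100 mappers and 20 reducers can require up to 100 x 20 = 2,000 copy operations during the shuffle, because each reducer fetches its partition of the map output from every mapper.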

 

 

QUESTION 40

For each intermediate key, each reducer task can emit:

 

A.

As many final key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).

B.

As many final key-value pairs as desired, but they must have the same type as the intermediate key-value pairs.

C.

As many final key-value pairs as desired, as long as all the keys have the same type and all the values have the same type.

D.

One final key-value pair per value associated with the key; no restrictions on the type.

E.

One final key-value pair per key; no restrictions on the type.

 

Answer: C

Explanation: Reducer reduces a set of intermediate values which share a key to a smaller set of values.

Reducing lets you aggregate values together: the reduce function receives an iterator over all the values for one key and, in the canonical pattern, combines them into a single output value. The framework itself, however, does not limit how many pairs a reducer may emit per key; reduce() may call Context.write() zero, one or many times. What is fixed is the output type, declared once for the whole job with setOutputKeyClass() and setOutputValueClass(), so all output keys share one type and all output values share another.
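As an illustration (an invented inverted-index example, not taken from the question), the reducer below writes several output pairs for a single key while keeping the output types fixed at Text/Text:

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DedupIndexReducer extends Reducer<Text, Text, Text, Text> {

  @Override
  protected void reduce(Text word, Iterable<Text> docIds, Context context)
      throws IOException, InterruptedException {
    // Emit one (word, docId) pair per distinct document id:
    // several context.write() calls per key, all with the job's fixed output types.
    Set<String> seen = new HashSet<String>();
    for (Text docId : docIds) {
      if (seen.add(docId.toString())) {
        context.write(word, docId);
      }
    }
  }
}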

 

Reference: Hadoop Map-Reduce Tutorial; Yahoo! Hadoop Tutorial, Module 4: MapReduce

 

Free VCE & PDF File for Cloudera CCD-470 Real Exam

Instant Access to Free VCE Files: CompTIA | VMware | SAP …
Instant Access to Free PDF Files: CompTIA | VMware | SAP …
