DS-200 Examination questions (September)

Achieve New Updated (September) Cloudera DS-200 Examination Questions 21-30

September 24, 2015

Ensurepass

 

QUESTION 21

Given the following sample of numbers from a distribution:

 

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89

 

What are the five numbers that summarize this distribution (the five-number summary of sample percentiles)?

 

A.

1, 3, 8, 34, 89

B.

1, 4, 13, 34, 89

C.

1, 1.5, 5, 24.5, 89

D.

1, 2.5, 8, 27.5, 89

 

Answer: D
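
Quartile conventions differ between textbooks and libraries, so the quartiles reported for this sample depend on the method. The following is a minimal sketch using Python's standard `statistics.quantiles`; the helper name `five_number_summary` is illustrative and not part of the exam material.

```python
import statistics

def five_number_summary(sample, method="inclusive"):
    """Return (min, Q1, median, Q3, max) for a sample.

    `method` is passed straight to statistics.quantiles:
    "inclusive" interpolates between the sample's own min and max,
    "exclusive" treats the sample as drawn from a larger population.
    """
    data = sorted(sample)
    q1, median, q3 = statistics.quantiles(data, n=4, method=method)
    return (data[0], q1, median, q3, data[-1])

sample = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(five_number_summary(sample, "inclusive"))  # (1, 2.5, 8.0, 27.5, 89)
print(five_number_summary(sample, "exclusive"))  # (1, 2.0, 8.0, 34.0, 89)
```

The "inclusive" result coincides with Tukey's hinges for this sample (the medians of the lower and upper halves, each including the overall median).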

 

 

QUESTION 22

 

Which two machine learning algorithms should you consider as likely to benefit from discretizing continuous features?

 

A.

Support vector machine

B.

Naive Bayes

C.

Decision trees

D.

Logistic regression

E.

Singular value decomposition

 

Answer: AB

Reference: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2656082/
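
To make "discretizing continuous features" concrete, here is a minimal sketch of equal-width binning. This is only one simple scheme, chosen for illustration; it is not necessarily the method used in the reference above, and the function name and sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [18.0, 22.5, 31.0, 40.0, 58.0]  # hypothetical continuous feature
print(equal_width_bins(ages, 4))       # [0, 0, 1, 2, 3]
```

After binning, a classifier sees a small set of categorical values instead of raw real numbers.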

 

 

QUESTION 23

Given the following sample of numbers from a distribution:

 

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89

 

How do high-level languages like Apache Hive and Apache Pig efficiently calculate approximate percentiles for a distribution?

 

A.

They sort all of the input samples and then look up the samples for each percentile

B.

They maintain an index of the input data as it is loaded into HDFS and load it into memory

C.

They use pivots to assign each observation to the reducer that calculates each percentile

D.

They assign sample observations to buckets and then aggregate the buckets to compute the approximations

 

Answer: D
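
The bucket-and-aggregate idea in option D can be sketched as follows. This is an illustrative toy, not Hive's or Pig's actual implementation: it assumes a known value range and fixed equal-width buckets, and all function names are hypothetical.

```python
def build_histogram(values, lo, hi, n_buckets):
    """Count observations per equal-width bucket over [lo, hi]."""
    counts = [0] * n_buckets
    width = (hi - lo) / n_buckets
    for v in values:
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return counts

def merge_histograms(a, b):
    """Combine two partial histograms (e.g. from two partitions of the data)."""
    return [x + y for x, y in zip(a, b)]

def approx_percentile(counts, lo, hi, p):
    """Upper edge of the first bucket whose cumulative count reaches p * total."""
    total = sum(counts)
    width = (hi - lo) / len(counts)
    running = 0
    for i, c in enumerate(counts):
        running += c
        if running >= p * total:
            return lo + (i + 1) * width
    return hi

# Two partitions of the sample, summarized independently and then merged.
part1 = build_histogram([1, 1, 2, 3, 5, 8], 0, 100, 10)
part2 = build_histogram([13, 21, 34, 55, 89], 0, 100, 10)
merged = merge_histograms(part1, part2)
print(merged)                                  # [6, 1, 1, 1, 0, 1, 0, 0, 1, 0]
print(approx_percentile(merged, 0, 100, 0.5))  # 10.0 (the exact median is 8)
```

Only the small bucket counts need to be combined across partitions, so no global sort of the raw data is required; the price is an approximation error bounded by the bucket width.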

 

 

QUESTION 24

You are about to sample a 100-dimensional unit cube. To adequately sample any single given dimension, you need only capture 10 points. How many points do you need in order to sample the complete 100-dimensional unit cube adequately?

 

 

 

 

 

A.

10010

B.

1010

C.

Log2(100)

D.

100

E.

1000

F.

1010

 

Answer: E
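
For reference, the usual grid-counting argument behind this kind of question can be sketched in a few lines. It assumes that "adequately" means a regular grid with the same number of points along every axis; the function name is illustrative.

```python
def grid_points(points_per_dimension, dimensions):
    """Number of points in a regular grid covering every axis equally densely."""
    return points_per_dimension ** dimensions

print(grid_points(10, 1))  # 10
print(grid_points(10, 2))  # 100
print(grid_points(10, 3))  # 1000
# For 100 dimensions the count is a 1 followed by 100 zeros.
print(grid_points(10, 100) == 10 ** 100)  # True
```

Under that assumption the count grows exponentially with the number of dimensions, which is the standard illustration of the curse of dimensionality.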

 

 

QUESTION 25

Under what two conditions does stochastic gradient descent outperform 2nd-order optimization techniques such as iteratively reweighted least squares?

 

A.

When the volume of input data is so large and diverse that a 2nd-order optimization technique can be fit to a sample of the data

B.

When the model’s estimates must be updated in real-time in order to account for new observations.

C.

When the input data can easily fit into memory on a single machine, but we want to calculate confidence intervals for all of the parameters in the model.

D.

When we are required to find the parameters that return the optimal value of the objective function.

 

Answer: AB
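
The real-time case in option B rests on the fact that a stochastic gradient step needs only one observation at a time. Below is a minimal sketch for squared-error linear regression; the data stream, learning rate, and function name are assumptions made for illustration.

```python
def sgd_update(weights, features, target, learning_rate=0.05):
    """One SGD step for squared-error linear regression on a single observation."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - target
    # Gradient of 0.5 * error**2 with respect to each weight is error * x.
    return [w - learning_rate * error * x for w, x in zip(weights, features)]

# Hypothetical stream generated by y = 1 + 2*x (first feature is a constant 1).
stream = [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0), ([1.0, 4.0], 9.0)]

weights = [0.0, 0.0]
for _ in range(2000):
    for features, target in stream:
        weights = sgd_update(weights, features, target)

print(weights)  # approaches [1.0, 2.0]
```

Each update costs time proportional to the number of features and never needs the full data set in memory, whereas a second-order method such as iteratively reweighted least squares solves a system over all observations on every iteration.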

 

 

QUESTION 26

Which two techniques should you use to avoid overfitting a classification model to a data set?

 

A.

Include a small number of “noise” features that are not thought to be correlated with the dependent variable.

B.

Replicate features that are thought to be significant predictors of the dependent variable multiple times for each observation.

C.

Separate your input data into a training set that is used for fitting and a test set that is used for evaluating the model’s performance

D.

Include a regularization term in the model’s objective function to control how precisely the model fits the data

E.

Preprocess the data to exclude atypical observations from the model input

 

Answer: CD
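
Two of the techniques named in the options, a held-out test set and a regularization term, can be sketched as follows. This is a minimal illustration using a linear model with squared error for brevity; the function names, the 80/20 split, and the toy data are assumptions.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def l2_penalized_loss(weights, rows, lam):
    """Mean squared error plus lam times the sum of squared weights."""
    mse = sum(
        (sum(w * x for w, x in zip(weights, feats)) - y) ** 2
        for feats, y in rows
    ) / len(rows)
    return mse + lam * sum(w * w for w in weights)

rows = [([float(i)], 2.0 * i) for i in range(10)]  # toy data: y = 2 * x
train, test = train_test_split(rows)
print(len(train), len(test))                    # 8 2
print(l2_penalized_loss([2.0], train, lam=0.0))  # 0.0 (perfect fit, no penalty)
print(l2_penalized_loss([2.0], train, lam=0.5))  # 2.0 (penalty 0.5 * 2.0**2)
```

The model is fitted only on `train` and scored on `test`, while a larger `lam` makes large weights more expensive and so limits how closely the model can follow the training data.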

 

 

 

QUESTION 27

You have a large file of N records (one per line), and want to randomly sample 10% of them. You have two functions that are perfect random number generators (though they are a bit slow):

random_uniform() generates a uniformly distributed number in the interval [0, 1].

random_permutation(M) generates a random permutation of the numbers 0 through M-1.

 

Below are three different functions that implement the sampling.

 

Method A

for line in file:
    if random_uniform() < 0.1:
        print(line)

Method B

i = 0
for line in file:
    if i % 10 == 0:
        print(line)
    i += 1

Method C

idxs = random_permutation(N)[:N // 10]
i = 0
for line in file:
    if i in idxs:
        print(line)
    i += 1

 

 

 

 

Which method will have the best runtime performance?

 

A.

Method A

B.

Method B

C.

Method C

 

Answer: A
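
The three methods can be compared with a runnable sketch. Python's `random` module stands in for the question's `random_uniform` and `random_permutation`, and an in-memory list stands in for the file; both substitutions are assumptions made for illustration.

```python
import random

def method_a(lines, rng):
    """Bernoulli sampling: one random draw per line; the sample size varies."""
    return [line for line in lines if rng.random() < 0.1]

def method_b(lines):
    """Systematic sampling: every 10th line, with no randomness at all."""
    return [line for i, line in enumerate(lines) if i % 10 == 0]

def method_c(lines, rng):
    """Exact-size sampling via a random permutation of all N indices."""
    n = len(lines)
    permutation = list(range(n))
    rng.shuffle(permutation)
    # A set makes each membership test O(1) on average; testing membership
    # in a list slice, as in the question, scans up to N/10 entries per line.
    idxs = set(permutation[: n // 10])
    return [line for i, line in enumerate(lines) if i in idxs]

lines = [f"record {i}" for i in range(1000)]
rng = random.Random(42)
print(len(method_a(lines, rng)))  # roughly 100; depends on the random draws
print(len(method_b(lines)))       # 100
print(len(method_c(lines, rng)))  # 100
```

Method A streams the file in a single pass with constant memory. Method B is cheaper still but is not a random sample. Method C must first generate and hold a permutation of all N indices.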

 

 

QUESTION 28

Refer to the exhibit.

 

[Exhibit: image not available]

 

Which point in the figure is the median?

 

A.

A

B.

B

C.

C

 

Answer: A

 

 

QUESTION 29

 

Which recommender system technique is domain specific?

 

A.

Content-based collaborative filtering

B.

Item-based collaborative filtering

C.

User-based collaborative filtering

D.

Naive Bayes classifier

 

Answer: C

Reference: http://www.cs.cmu.edu/~srosenth/papers/Rosenthal_RecSys09.pdf

 

 

QUESTION 30

Which best describes the primary function of Flume?

 

A.

Flume is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with an infrastructure consisting of sources and sinks for importing and evaluating large data sets

B.

Flume acts as a Hadoop filesystem for log files

C.

Flume imports data from SQL/relational databases into your Hadoop cluster

D.

Flume provides a query language for Hadoop similar to SQL

E.

Flume is a distributed server for collecting and moving large amounts of data into HDFS as it’s produced from streaming data flows

 

Answer: E

