
Cloudera DS-200 Examination Questions 1-10 (Updated September)

September 24, 2015

Ensurepass

 


Exam A

 

QUESTION 1

What is the result of the following command (the database username is foo and password is bar)?

 

$ sqoop list-tables --connect jdbc:mysql://localhost/databasename --table --username foo --password bar

 

A.

sqoop lists only those tables in the specified MySQL database that have not already been imported into HDFS

B.

sqoop returns an error

C.

sqoop lists the available tables from the database

D.

sqoop imports all the tables from SQL into HDFS

 

Answer: C

Reference: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-15/getting-sqoop

 

 

QUESTION 2

You are building a k-nearest neighbor classifier (k-NN) on a labeled set of points in a high-dimensional space. You determine that the classifier has a large error on the training data. What is the most likely problem?

 

A.

High-dimensional spaces effectively make local neighborhoods global

B.

k-NN computation does not converge in high dimensions

C.

k was too small

D.

The VC-dimension of a k-NN classifier is too high

 

Answer: A
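The claim in option A, that local neighborhoods become global in high dimensions, can be illustrated numerically. The sketch below is only an illustration under stated assumptions: points are drawn uniformly from the unit cube, and the function name, point count, and dimensions are invented for this example. It compares the distance from a query point to its nearest and farthest neighbors as the dimension grows.

```python
import math
import random

def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of nearest to farthest neighbor distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return min(distances) / max(distances)

# Hypothetical dimensions chosen only to show the trend.
ratios = {dim: nearest_to_farthest_ratio(dim) for dim in (2, 10, 100, 1000)}
for dim, ratio in ratios.items():
    print(f"dim={dim:5d}  nearest/farthest = {ratio:.3f}")
```

As the ratio approaches 1, the nearest neighbor is barely closer than the farthest one, so the k nearest points carry little local information.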

 

 

QUESTION 3

How can the naivete of the naive Bayes classifier be advantageous?

 

A.

It does not require you to make strong assumptions about the data because it is non-parametric

B.

It significantly reduces the size of the parameter space, thus reducing the risk of overfitting

C.

It allows you to reduce bias with no tradeoff in variance

D.

It guarantees convergence of the estimator

 

Answer: B
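The effect of the conditional independence assumption on the size of the parameter space (option B) can be shown with simple counting. This is a minimal sketch assuming binary features and ignoring the class prior; the function names are hypothetical.

```python
def full_joint_params(n_features, n_classes=2):
    """Free parameters for an unrestricted joint over n binary features, per class."""
    return n_classes * (2 ** n_features - 1)

def naive_bayes_params(n_features, n_classes=2):
    """Free parameters when features are independent given the class (one per feature)."""
    return n_classes * n_features

for n in (5, 10, 20):
    print(f"features={n:2d}  full joint={full_joint_params(n):8d}  naive Bayes={naive_bayes_params(n):3d}")
```

The unrestricted model grows exponentially in the number of features, while the naive Bayes model grows linearly, so far fewer parameters have to be estimated from the same data.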

 

 

QUESTION 4

You have a large file of N records (one per line), and want to randomly sample 10% of them. You have two functions that are perfect random number generators (though they are a bit slow):

 

random_uniform() generates a uniformly distributed number in the interval [0, 1].

random_permutation(M) generates a random permutation of the numbers 0 through M-1.

 

Below are three different functions that implement the sampling.

 

Method A

 

for line in file:
    # Keep each line independently with probability 0.1.
    if random_uniform() < 0.1:
        print(line)

 

Method B

 

i = 0
for line in file:
    # Keep every tenth line, starting with the first.
    if i % 10 == 0:
        print(line)
    i += 1

 

Method C

 

# Keep the lines whose indices are among the first N // 10 of a random permutation.
idxs = random_permutation(N)[:N // 10]
i = 0
for line in file:
    if i in idxs:
        print(line)
    i += 1

 

Which method is least likely to give you exactly 10% of your data?

 

A.

Method A

B.

Method B

C.

Method C

 

Answer: A
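The sample sizes produced by the three methods can be compared with a short simulation. This is a sketch under assumptions: Python's `random.Random.random()` stands in for `random_uniform()` (it returns values in [0, 1) rather than [0, 1]), a shuffled `range` stands in for `random_permutation()`, and the 1000-record input is invented for illustration.

```python
import random

def method_a(lines, rng):
    # Keep each line independently with probability 0.1.
    return [line for line in lines if rng.random() < 0.1]

def method_b(lines):
    # Keep every tenth line, starting with the first (no randomness involved).
    return [line for i, line in enumerate(lines) if i % 10 == 0]

def method_c(lines, rng):
    # Keep the lines whose indices are the first N // 10 entries of a random permutation.
    n = len(lines)
    permutation = list(range(n))
    rng.shuffle(permutation)
    idxs = set(permutation[: n // 10])
    return [line for i, line in enumerate(lines) if i in idxs]

lines = [f"record {i}" for i in range(1000)]
sizes_a = {len(method_a(lines, random.Random(seed))) for seed in range(50)}
sizes_b = {len(method_b(lines)) for _ in range(50)}
sizes_c = {len(method_c(lines, random.Random(seed))) for seed in range(50)}
print("Method A sample sizes:", sorted(sizes_a))
print("Method B sample sizes:", sorted(sizes_b))
print("Method C sample sizes:", sorted(sizes_c))
```

Method A's sample size follows a binomial distribution and changes from run to run, whereas Methods B and C always return a fixed number of lines for a given N. Method B is deterministic, so it is not a random sample at all.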

 

 

QUESTION 5

There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute myeloid leukemia (AML), both variants of a blood cancer.

 

The makeup of the groups is as follows:

 

clip_image001

 

Each individual has an expression value for each of 10000 different genes. The expression value for each gene is a continuous value between -1 and 1.

 

With which type of plot can you encode the most data visually?

 

A.

A heat map sorting the individuals by group

B.

A histogram of the expression values

C.

A scatter plot of the two largest principal components

 

Answer: A

 

 

QUESTION 6

A function is convex if the line segment between any two points a and b on its graph is greater than or equal to the value of the function at every point between a and b.

 

clip_image002

 

Which two functions are convex?

 

A.

x^(1/2)

B.

e^x

C.

2x-1

D.

1 - x^2

 

Answer: BC
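The convexity condition can be written as f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b) for all t in [0, 1], and it can be spot-checked numerically. The sketch below assumes the four options read as the square root of x, e to the power x, the linear function 2x - 1, and 1 minus x squared; the helper name, intervals, and grid sizes are hypothetical. A grid check can only refute convexity, not prove it.

```python
import math

def looks_convex(f, lo, hi, steps=40, tol=1e-9):
    """Check the convexity inequality on a grid of points a, b in [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ts = [j / 10 for j in range(11)]
    for a in xs:
        for b in xs:
            for t in ts:
                chord = t * f(a) + (1 - t) * f(b)
                if f(t * a + (1 - t) * b) > chord + tol:
                    return False  # the function rises above the chord here
    return True

# Assumed readings of the four options, each with an illustrative interval.
candidates = {
    "x^(1/2)": (math.sqrt, 0.0, 4.0),
    "e^x": (math.exp, -2.0, 2.0),
    "2x - 1": (lambda x: 2 * x - 1, -2.0, 2.0),
    "1 - x^2": (lambda x: 1 - x * x, -2.0, 2.0),
}
results = {name: looks_convex(f, lo, hi) for name, (f, lo, hi) in candidates.items()}
for name, ok in results.items():
    print(f"{name:8s} passes grid check: {ok}")
```

Under these readings, the exponential and the linear function pass the check, while the square root and 1 - x^2 fail it because they are concave.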

 

 

QUESTION 7

Consider the following sample from a distribution that contains a continuous X and label Y that is either A or B:

 

 

 

 

 

clip_image003

 

Which is the best cut point for X if you want to discretize these values into two buckets in a way that minimizes the sum of chi-square values?

 

A.

X8

B.

X6

C.

X5

D.

X4

E.

X2

 

Answer: D

 

 

QUESTION 8

Why should you stop an iterative machine learning algorithm as soon as the performance of the model on a test set stops improving?

 

A.

To avoid the need for cross-validating the model

B.

To prevent overfitting

C.

To increase the VC (Vapnik-Chervonenkis) dimension for the model

D.

To keep the number of terms in the model as small as possible

E.

To maintain the highest VC (Vapnik-Chervonenkis) dimension for the model

 

Answer: B
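The stopping rule described in the question can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the error values are invented, and the function name and the `patience` parameter are hypothetical.

```python
def early_stopping_round(errors, patience=1):
    """Return the index of the best round, stopping once held-out error stops improving."""
    best_round, best_error = 0, errors[0]
    for round_index, error in enumerate(errors[1:], start=1):
        if error < best_error:
            best_round, best_error = round_index, error
        elif round_index - best_round >= patience:
            break  # no improvement for `patience` rounds
    return best_round

# Hypothetical error curves: training error keeps falling, held-out error turns upward.
training_errors = [0.38, 0.28, 0.20, 0.14, 0.09, 0.05, 0.02]
held_out_errors = [0.40, 0.31, 0.25, 0.22, 0.23, 0.26, 0.30]

stop_at = early_stopping_round(held_out_errors)
print("stop after round", stop_at, "with held-out error", held_out_errors[stop_at])
```

In this made-up run the training error continues to shrink after round 3 while the held-out error grows, which is the pattern usually associated with overfitting.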

 

 

 

QUESTION 9

You have a large m x n data matrix M. You decide you want to perform dimension reduction/clustering on your data and have decided to use the singular value decomposition (SVD; also called principal components analysis, PCA).

 

Refer to the passage above.

 

What represents the SVD of the matrix M given the following information:

 

U is m x m unitary

V is n x n unitary

S is m x n diagonal

Q is n x n invertible

D is n x n diagonal

L is m x m lower triangular

U is m x m upper triangular

 

A.

M = U S V

B.

M = U P

C.

M = Q D Q^-1

D.

M = L U

 

Answer: A
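For reference, the singular value decomposition is conventionally written with the conjugate transpose of V. The option list writes the product without a transpose mark, which may be a notational shortcut or an extraction artifact. In standard notation, using the U, S, and V described in the question:

```latex
M = U S V^{*}, \qquad
U \in \mathbb{C}^{m \times m},\ U^{*}U = I_m, \qquad
S \in \mathbb{R}^{m \times n} \text{ diagonal with } S_{ii} \ge 0, \qquad
V \in \mathbb{C}^{n \times n},\ V^{*}V = I_n
```

The dimensions are consistent: an (m x m) matrix times an (m x n) matrix times an (n x n) matrix gives the m x n matrix M.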

 

 

QUESTION 10

You are building a system to perform outlier detection for a large online retailer. You need to build a system to detect if the total dollar value of sales is outside the norm for each U.S. state, as determined from the physical location of the buyer for each purchase.

 

The retailer’s data sources are scattered across multiple systems and databases and are unorganized with little coordination or shared data or keys between the various data sources.

 

 

 

 

Below are the sources of data available to you. Determine which three will give you the smallest set of data sources but still allow you to implement the outlier detector by state.

 

A.

Database of employees that includes only the employee ID, start date, and department

B.

Database of users that contains only their user ID, name, and a list of every item the user has viewed

C.

Transaction log that contains only basket ID, basket amount, time of sale completion, and a session ID

D.

Database of user sessions that includes only session ID, corresponding user ID, and the corresponding IP address

E.

External database mapping IP addresses to geographic locations

F.

Database of items that includes only the item name, item ID, and warehouse location

G.

Database of shipments that includes only the basket ID, shipment address, shipment date, and shipment method

 

Answer: CDE
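One way to get from a sale amount to a state is to follow the session ID from the transaction log (option C) to the session's IP address (option D), and then map that IP address to a location (option E). The sketch below illustrates that chain of lookups; all records, field names, IP addresses, and state codes are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records standing in for the three data sources.
transactions = [
    {"basket_id": 1, "amount": 20.0, "session_id": "s1"},
    {"basket_id": 2, "amount": 35.5, "session_id": "s2"},
    {"basket_id": 3, "amount": 10.0, "session_id": "s3"},
]
sessions = {
    "s1": {"user_id": "u1", "ip": "192.0.2.1"},
    "s2": {"user_id": "u2", "ip": "198.51.100.7"},
    "s3": {"user_id": "u1", "ip": "192.0.2.1"},
}
ip_to_state = {"192.0.2.1": "CA", "198.51.100.7": "NY"}

def sales_by_state(transactions, sessions, ip_to_state):
    """Sum basket amounts per state via session ID -> IP address -> state."""
    totals = defaultdict(float)
    for tx in transactions:
        ip = sessions[tx["session_id"]]["ip"]
        totals[ip_to_state[ip]] += tx["amount"]
    return dict(totals)

totals = sales_by_state(transactions, sessions, ip_to_state)
print(totals)
```

The per-state totals are the quantity an outlier detector would then compare against each state's historical norm.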
