
Achieve New Updated (September) Cloudera CCB-400 Examination Questions 21-30

September 24, 2015


 

QUESTION 21

You have images stored in HBase, which you need to retrieve from within your application. In which format will your data be returned from an HBase scan?

A. Uninterpreted array of bytes
B. Java string literal
C. Hexadecimal
D. Blob datatype

 

Answer: A

Explanation: HBase supports a "bytes-in/bytes-out" interface via Put and Result, so anything that can be converted to an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images, as long as they can be rendered as bytes.

 

Reference: The Apache HBase Reference Guide, Supported Datatypes
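To make the bytes-in/bytes-out model concrete, here is a minimal round-trip sketch using only the JDK. Real applications typically use HBase's org.apache.hadoop.hbase.util.Bytes helper for these conversions; the class and method names below are illustrative, not part of the HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the "bytes-in/bytes-out" model: anything stored in HBase must
// first be rendered as a byte[], and a scan hands the same byte[] back for
// the application to interpret.
public class BytesRoundTrip {

    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String bytesToString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long bytesToLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        // What you would hand to a Put, and how you would read it back
        // after a scan returns the uninterpreted byte[].
        byte[] value = stringToBytes("image-metadata");
        System.out.println(bytesToString(value));

        byte[] ts = longToBytes(1442880000000L);
        System.out.println(bytesToLong(ts));
    }
}
```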

 

 

QUESTION 22

You have an image table live in production. The table uses <timestamp> as the rowkey. You want to change the existing rowkeys to <userid><timestamp>. Which of the following should you do?

 

A. Modify the client application to write to both the old table and a new table while migrating the old data separately
B. Use the ALTER table command to modify the rowkeys
C. Use the ASSIGN command to modify the rowkeys
D. Add a new column to store the userid

 

Answer: A

Explanation: Rowkeys cannot be changed. The only way they can be “changed” in a table is if the row is deleted and then re-inserted. This is a fairly common question on the HBase dist-list so it pays to get the rowkeys right the first time (and/or before you’ve inserted a lot of data).

 

Reference: Rowkey Design
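While the client writes to both tables, something has to construct the new <userid><timestamp> key for each row. The sketch below shows one way to build such a composite key; the class, method name, and the 12-byte userid width are assumptions for illustration, not part of the HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Builds the new <userid><timestamp> composite rowkey a dual-write migration
// would use. Fixed-width userids keep composite keys sorted first by user,
// then by time.
public class CompositeRowkey {

    // Assumes userids are at most 12 characters (an assumption for this
    // sketch); pads them to a fixed width so keys compare correctly.
    static byte[] newRowkey(String userId, long timestamp) {
        byte[] user = String.format("%-12s", userId)
                            .getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(user.length + Long.BYTES)
                         .put(user)
                         .putLong(timestamp)
                         .array();
    }

    public static void main(String[] args) {
        byte[] key = newRowkey("user_100", 1442880000000L);
        // 12-byte userid prefix + 8-byte timestamp = 20 bytes
        System.out.println(key.length); // 20
    }
}
```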

 

 

QUESTION 23

 

Your client application is writing data to a Region. By default, where is the data saved first?

 

A. StoreFile
B. WAL
C. MemStore
D. Local disk on the RegionServer

 

Answer: B

Explanation: By default, each update is appended to the write-ahead log (WAL) on the RegionServer first, and only then written to the MemStore. Writing to the log first is what makes recovery possible: in the event of a RegionServer failure, the contents of the MemStore are lost because they have not been flushed to disk yet, but the edits can be replayed from the WAL.

 

Reference: http://www.cloudera.com/blog/2012/07/hbase-log-splitting/ (Log splitting, first paragraph)

 

 

QUESTION 24

Your client application calls the following method for all puts to the single table notifications:

 

put.setWriteToWAL(false);

 

One region, region1, for the notifications table is assigned to RegionServer rs1. Which of the following statements describes the result of RegionServer rs1 crashing?

 

A. All data in the notifications table is lost
B. No data is lost
C. All data for all tables not flushed to disk on RegionServer rs1 is lost
D. Data for your client application in the MemStores for region1 is lost

 

Answer: D

Explanation: What role does setWriteToWAL(false) play? HBase uses a write-ahead log; if you do not write to it, you will lose all the data that is only in the MemStores when a RegionServer fails. This setting is useful for importing a lot of data quickly, when the data can be re-imported after a failure.
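The tradeoff can be sketched with a toy model (the class and method names here are illustrative, not HBase APIs): writes that skip the WAL survive only as long as the MemStore does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the durability tradeoff behind setWriteToWAL(false): edits
// that skip the WAL live only in the MemStore, so a crash before a flush
// loses exactly those edits. This simulates the idea, not the HBase write path.
public class WalTradeoff {
    final List<String> wal = new ArrayList<>();           // durable log
    final Map<String, String> memstore = new HashMap<>(); // volatile buffer

    void put(String row, String value, boolean writeToWal) {
        if (writeToWal) {
            wal.add(row + "=" + value); // durable record survives a crash
        }
        memstore.put(row, value);       // in-memory copy is lost on crash
    }

    // Crash: MemStore contents vanish; recovery replays only the WAL.
    Map<String, String> crashAndRecover() {
        Map<String, String> recovered = new HashMap<>();
        for (String entry : wal) {
            String[] kv = entry.split("=", 2);
            recovered.put(kv[0], kv[1]);
        }
        return recovered;
    }

    public static void main(String[] args) {
        WalTradeoff region = new WalTradeoff();
        region.put("row1", "durable", true);
        region.put("row2", "fast-but-risky", false); // setWriteToWAL(false)
        Map<String, String> after = region.crashAndRecover();
        System.out.println(after.containsKey("row1")); // true
        System.out.println(after.containsKey("row2")); // false
    }
}
```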

 

 

QUESTION 25

You have an “Employees” table in HBase. The Row Keys are the employees’ IDs. You would like to retrieve all employees who have an employee ID between ‘user_100’ and ‘user_110’. The shell command you would use to complete this is:

 

A. scan 'Employees', {STARTROW => 'user_100', STOPROW => 'user_111'}
B. get 'Employees', {STARTROW => 'user_100', STOPROW => 'user_110'}
C. scan 'Employees', {STARTROW => 'user_100', SLIMIT => 10}
D. scan 'Employees', {STARTROW => 'user_100', STOPROW => 'user_110'}

 

Answer: A

Explanation: Because stopRow is exclusive, a STOPROW of 'user_111' is required to include 'user_110' in the results; a STOPROW of 'user_110' would stop the scan just before that row.

public Scan(byte[] startRow, byte[] stopRow)

Create a Scan operation for the range of rows specified.

Parameters:
startRow – row to start scanner at or after (inclusive)
stopRow – row to stop scanner before (exclusive)

 

Reference: org.apache.hadoop.hbase.client, Class Scan
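The inclusive-start/exclusive-stop contract can be checked with a sorted map, whose subMap method takes the same kind of bounds. This is a stand-in for a scan over a toy table, not an HBase client call.

```java
import java.util.TreeMap;

// TreeMap.subMap with (fromInclusive=true, toInclusive=false) bounds mirrors
// the Scan(startRow, stopRow) contract: start inclusive, stop exclusive.
public class ScanRange {

    // Builds a toy Employees table with rowkeys user_100..user_112 and
    // counts the rows a scan over [startRow, stopRow) would return.
    static int scanCount(String startRow, String stopRow) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 100; i <= 112; i++) {
            table.put("user_" + i, "employee-" + i);
        }
        return table.subMap(startRow, true, stopRow, false).size();
    }

    public static void main(String[] args) {
        // STOPROW 'user_111' includes user_110: 11 rows (user_100..user_110)
        System.out.println(scanCount("user_100", "user_111")); // 11
        // STOPROW 'user_110' stops before user_110: only 10 rows
        System.out.println(scanCount("user_100", "user_110")); // 10
    }
}
```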

 

 

QUESTION 26

You have a key-value pair size of 100 bytes. You increase your HFile block size from its default of 64KB. What results from this change?

 

A. Scan throughput increases and random-access latency decreases
B. Scan throughput decreases and random-access latency increases
C. Scan throughput decreases and random-access latency decreases
D. Scan throughput increases and random-access latency increases

 

Answer: D

Explanation: A larger block size is preferred if files are primarily for sequential access, so scan throughput increases. A random read, however, must load and search a larger block to find a single 100-byte key-value pair, so random-access latency increases. Smaller blocks are good for random access, but require more memory to hold the block index, and may be slower to create.

 

Reference: Could I improve HBase performance by reducing the hdfs block size?
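The memory side of the tradeoff is simple division: one block-index entry per block. The sketch below uses an assumed 1GB store file purely for illustration; the figures are arithmetic, not measured HBase numbers.

```java
// Back-of-the-envelope arithmetic behind the block-size tradeoff: a larger
// block means fewer index entries (less memory) but a bigger block to read
// and scan for any single random lookup.
public class BlockSizeTradeoff {

    // Number of block-index entries for a store file of the given size.
    static long indexEntries(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long file = 1L << 30; // assumed 1 GB store file
        // 64 KB blocks: more index entries, finer-grained random access.
        System.out.println(indexEntries(file, 64 * 1024));  // 16384
        // 256 KB blocks: 4x fewer index entries, but each random read must
        // pull in a larger block to locate one 100-byte key-value pair.
        System.out.println(indexEntries(file, 256 * 1024)); // 4096
    }
}
```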

 

 

QUESTION 27

The cells in a given row have versions that range from 1000 to 2000. You execute a delete specifying the value 3000 for the version. What is the outcome?

 

A. The delete fails with an error.
B. Only cells equal to the specified version are deleted.
C. The entire row is deleted.
D. Nothing in the row is deleted.

 

Answer: C

Explanation: When performing a delete operation in HBase, there are two ways to specify the versions to be deleted

 

Delete all versions older than a certain timestamp

 

Delete the version at a specific timestamp

 

A delete can apply to a complete row, a complete column family, or to just one column. It is only in the last case that you can delete explicit versions. For the deletion of a row or all the columns within a family, it always works by deleting all cells older than a certain version.

 

Deletes work by creating tombstone markers. For example, let’s suppose we want to delete a row. For this you can specify a version, or else by default the currentTimeMillis is used. What this means is “delete all cells where the version is less than or equal to this version”. HBase never modifies data in place, so for example a delete will not immediately delete (or mark as deleted) the entries in the storage file that correspond to the delete condition. Rather, a so-called tombstone is written, which will mask the deleted values[17]. If the version you specified when deleting a row is larger than the version of any value in the row, then you can consider the complete row to be deleted.

 

 

 

 

Reference: Apache HBase, Delete

 

http://archive.cloudera.com/cdh4/cdh/4/hbase/book.html#delete (scroll down to section 5.8.1.5, Delete, and read the last paragraph)
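The row-delete rule described above can be simulated in a few lines: a delete at version V masks every cell whose version is less than or equal to V. This is a toy model of tombstone masking, not the HBase Delete implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Simulates how a row-level tombstone at a given version masks cells:
// only cells with a strictly newer version remain visible.
public class TombstoneDemo {

    // Returns the cell versions still visible after a row delete at
    // deleteVersion (cells keyed by version/timestamp).
    static TreeMap<Long, String> applyRowDelete(TreeMap<Long, String> cells,
                                                long deleteVersion) {
        TreeMap<Long, String> visible = new TreeMap<>();
        for (Map.Entry<Long, String> e : cells.entrySet()) {
            if (e.getKey() > deleteVersion) { // only newer cells survive
                visible.put(e.getKey(), e.getValue());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> row = new TreeMap<>();
        row.put(1000L, "v1");
        row.put(1500L, "v2");
        row.put(2000L, "v3");
        // A delete at version 3000 masks every cell (1000..2000 are all
        // <= 3000), so the entire row reads as deleted.
        System.out.println(applyRowDelete(row, 3000L).isEmpty()); // true
    }
}
```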

 

 

QUESTION 28

Your client application needs to scan a region for the row key value 104.

 

Given a store that contains the following list of Row Key values:

 

100, 101, 102, 103, 104, 105, 106, 107

 

A bloom filter would return which of the following?

 

A. Confirmation that 104 may be contained in the set
B. Confirmation that 104 is contained in the set
C. The hash of the column family
D. The file offset of the value 104

 

Answer: A

Explanation: A Bloom filter is probabilistic: it can report that a key is definitely not in a store file, or that the key may be in the file, but it can never guarantee presence, because hash collisions allow false positives.

Note:
* When an HFile is opened, typically when a region is deployed to a RegionServer, the bloom filter is loaded into memory and used to determine whether a given key might be in that store file.
* Get/Scan(Row) currently does a parallel N-way get of that Row from all StoreFiles in a Region. This means that you are doing N read requests from disk. BloomFilters provide a lightweight in-memory structure to reduce those N disk reads to only the files likely to contain that Row (N-B).
* Keep in mind that HBase only has a block index per file, which is rather coarse-grained and tells the reader that a key may be in the file because it falls into a start and end key range in the block index. But whether the key is actually present can only be determined by loading that block and scanning it. This also places a burden on the block cache, and you may create a lot of unnecessary churn that the bloom filters would help avoid.
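A minimal Bloom filter makes the "may be contained" answer concrete: a present key is never reported absent, but collisions mean a positive result is only probable. The bit-array size and hash choices below are arbitrary, not HBase's.

```java
import java.util.BitSet;

// Minimal Bloom filter: add() sets two hash-derived bits per key, and
// mightContain() checks them. False negatives are impossible; false
// positives are possible when unrelated keys collide on the same bits.
public class MiniBloom {
    private final BitSet bits = new BitSet(64);

    private int h1(String key) { return Math.floorMod(key.hashCode(), 64); }
    private int h2(String key) { return Math.floorMod(key.hashCode() * 31 + 7, 64); }

    void add(String key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // false => definitely absent; true => may be present
    boolean mightContain(String key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }

    public static void main(String[] args) {
        MiniBloom bloom = new MiniBloom();
        for (int i = 100; i <= 107; i++) {
            bloom.add(String.valueOf(i));
        }
        // 104 was added, so the filter can only say it MAY be in the set.
        System.out.println(bloom.mightContain("104")); // true
    }
}
```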

 

 

QUESTION 29

 

You have one primary HMaster and one standby. Your primary HMaster fails and your client application needs to make a metadata change. Which of the following is the effect on your client application?

 

A. The client will query ZooKeeper to find the location of the new HMaster and complete the metadata change.
B. The client will make the metadata change regardless of the state of the HMaster.
C. The new HMaster will notify the client and complete the metadata change.
D. The client application will fail with a runtime error.

 

Answer: A

Explanation: the HBase master publishes its location to clients via Zookeeper. This is done to support multimaster operation (failover). So if the HBase master self-discovers its location as a localhost address, then it will publish that. Region servers or clients which go to Zookeeper for the

master location will get back an address in that case only useful if they happen to be co- located with the master.

 

Note:

* HMaster is the implementation of the Master Server. The Master server is responsible for monitoring all RegionServer instances in the cluster, and is the interface for all metadata changes.

 

 

QUESTION 30

You have a table with 5TB of data, 10 RegionServers, and a region size of 256MB. You are continuously issuing puts to widely dispersed row ids in your table. Which of the following will improve write performance?

 

A. Increase your buffer cache in the RegionServers
B. Increase the number of RegionServers to 15
C. Decrease your number of RegionServers to 5
D. Decrease your region size to 128MB

 

Answer: B

Explanation: At a 256MB region size, a 5TB table has roughly 20,480 regions, about 2,048 per server across 10 RegionServers. Adding RegionServers lowers the per-server region count and adds aggregate write capacity, which helps when puts are spread across widely dispersed row ids. Decreasing the number of RegionServers or shrinking the region size would both increase the per-server region count and hurt write performance.

Note:

*Region Size

Determining the “right” region size can be tricky, and there are a few factors to consider:

 

HBase scales by having regions across many servers. Thus if you have 2 regions for 16GB of data on a 20-node cluster, your data will be concentrated on just a few machines and nearly the entire cluster will be idle. This really can't be stressed enough, since a common problem is loading 200MB of data into HBase and then wondering why your awesome 10-node cluster isn't doing anything.

 

On the other hand, high region count has been known to make things slow. This is getting better with each release of HBase, but it is probably better to have 700 regions than 3000 for the same amount of data.

 

There is not much memory footprint difference between 1 region and 10 in terms of indexes, etc, held by the RegionServer.
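The arithmetic behind the question's numbers is straightforward division, nothing HBase-specific, and it shows why spreading the same region count over more servers helps:

```java
// Region-count arithmetic for a 5 TB table: at a 256 MB region size there
// are far more regions than 10 RegionServers handle comfortably, so adding
// servers (or growing regions) helps, and shrinking either hurts.
public class RegionMath {

    static long regions(long tableBytes, long regionBytes) {
        return tableBytes / regionBytes;
    }

    public static void main(String[] args) {
        long tableBytes = 5L * 1024 * 1024 * 1024 * 1024; // 5 TB
        long region256 = 256L * 1024 * 1024;               // 256 MB

        long total = regions(tableBytes, region256);
        System.out.println(total);      // 20480 regions in the table
        System.out.println(total / 10); // 2048 regions per server at 10 RS
        System.out.println(total / 15); // 1365 regions per server at 15 RS

        // Halving the region size to 128 MB doubles the region count.
        System.out.println(regions(tableBytes, 128L * 1024 * 1024)); // 40960
    }
}
```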
