Best Apache HBase Interview Questions Part – 2
What Are The Different Compaction Types In HBase?
There are two types of compaction: minor and major. In a minor compaction, a few small, adjacent HFiles are merged into a single larger HFile; delete markers and deleted or expired cells are not removed. The files to merge are selected by the configured compaction policy.
In a major compaction, all the HFiles of a Store (one column family in one region) are merged into a single HFile, and delete markers together with deleted and expired cells are discarded. Major compactions are scheduled periodically by default, but on busy clusters they are often triggered manually instead.
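The difference in how the two compaction types treat deletes can be sketched with a toy model in plain Python. This is not HBase code: the store files are simplified to lists of (row, timestamp, value) tuples, a value of None stands in for a delete marker, and the `merge` helper is hypothetical.

```python
def merge(files, drop_deletes):
    """Merge toy store files; each file is a list of (row, ts, value).

    A value of None is a delete marker that masks cells of the same row
    with an equal or older timestamp (an assumption of this sketch).
    """
    # Sort by row, newest first; at equal timestamps the marker comes first.
    cells = sorted(
        (c for f in files for c in f),
        key=lambda c: (c[0], -c[1], c[2] is not None),
    )
    if not drop_deletes:
        return cells  # minor: markers and masked cells are carried along
    out, deleted_at = [], {}
    for row, ts, value in cells:
        if value is None:
            deleted_at[row] = max(ts, deleted_at.get(row, ts))
        elif ts > deleted_at.get(row, -1):
            out.append((row, ts, value))
    return out  # major: markers and masked cells are gone


f1 = [("r1", 1, "a"), ("r2", 1, "b")]
f2 = [("r1", 2, None)]           # delete marker for r1
f3 = [("r1", 3, "c")]            # written after the delete

minor = merge([f1, f2], drop_deletes=False)
major = merge([f1, f2, f3], drop_deletes=True)
```

Here `minor` still contains the marker and the masked cell, while `major` keeps only the live cells.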
What Is A Cell In HBase?
A cell is the smallest unit of an HBase table. It holds a single value and is identified by the tuple {row, column, version}.
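As a rough mental model (plain Python, not the HBase client API), a table can be pictured as a map from (row, column, version) to a value. The row keys, column names, and timestamps below are made up for illustration.

```python
# Toy model: each value is addressed by (row, column, version).
cells = {
    ("user1", "info:email", 1700000000000): b"old@example.com",
    ("user1", "info:email", 1700000500000): b"new@example.com",
    ("user1", "info:name", 1700000000000): b"Ada",
}


def get_latest(row, column):
    """Return the value with the highest version for row/column, or None."""
    versions = [
        (ts, value)
        for (r, c, ts), value in cells.items()
        if r == row and c == column
    ]
    if not versions:
        return None
    return max(versions)[1]
```

Reading without an explicit version returns the newest one, which is what `get_latest` imitates.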
What Is The Scope Of A Rowkey In HBase?
Rowkeys are scoped to ColumnFamilies. The same rowkey could exist in each ColumnFamily that exists in a table without collision.
What Is The Role Of The Class HColumnDescriptor In HBase?
This class stores information about a column family, such as the number of versions and the compression settings. It is used as input when creating a table or adding a column family.
What Is A Namespace In HBase?
A namespace is a logical grouping of tables. It is analogous to a database in a relational database system.
What Is The Lower Bound Of Versions In HBase?
The lower bound of versions is the minimum number of versions to keep for a column, configured per column family. The default is 0, which disables the feature. It is used together with TTL: for example, if the value is set to 3, the three latest versions are kept even after their TTL has expired, while older expired versions are removed.
What Is Hotspotting In HBase?
Hotspotting is a situation when a large amount of client traffic is directed at one node, or only a few nodes, of a cluster. This traffic may represent reads, writes, or other operations. This traffic overwhelms the single machine responsible for hosting that region, causing performance degradation and potentially leading to region unavailability.
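A small simulation shows how this can arise. The sketch below is plain Python, not HBase code; the split keys and the date-prefixed row key format are assumptions chosen for illustration. Because rows are stored in sorted order, keys that share a prefix all land in the same region.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical region boundaries: four regions split at these row keys.
SPLIT_KEYS = ["20240101", "20240401", "20240701"]


def region_for(row_key):
    """Index of the region whose key range contains row_key."""
    return bisect_right(SPLIT_KEYS, row_key)


# Date-prefixed keys written on the same day sort next to each other.
writes = [f"20240915-{i:04d}" for i in range(1000)]
load = Counter(region_for(key) for key in writes)
```

All 1000 writes map to region 3, so the server hosting that region takes the whole load while the others stay idle.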
What Is TTL (Time To Live) In HBase?
TTL is a data retention setting, specified in seconds on a column family, that limits how long a version of a cell is kept. Once a version is older than the TTL, it is no longer returned and is removed.
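How TTL interacts with the minimum and maximum number of versions can be sketched as follows. This is a simplified model in plain Python, not the actual HBase retention code; the function name and parameters are hypothetical, and timestamps and TTL are assumed to use the same unit.

```python
def surviving_versions(timestamps, now, ttl, min_versions=0, max_versions=1):
    """Return the timestamps kept for one column, newest first.

    A version survives if it is among the newest `max_versions` and is
    either younger than `ttl` or needed to satisfy `min_versions`.
    """
    newest_first = sorted(timestamps, reverse=True)[:max_versions]
    kept = []
    for rank, ts in enumerate(newest_first):
        expired = (now - ts) > ttl
        if not expired or rank < min_versions:
            kept.append(ts)
    return kept
```

For example, with versions written at times 100, 200, 300, and 400, `now=1000`, `ttl=650`, and `max_versions=3`, only the version at 400 is within the TTL. Setting `min_versions=2` additionally keeps the version at 300 even though it has expired.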
Why Do We Pre-create Empty Regions?
Tables in HBase are initially created with one region by default. During a bulk import, all clients therefore write to the same region until it grows large enough to split and be distributed across the cluster. Pre-creating empty regions spreads the writes across the cluster from the start, which speeds up the import.
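Pre-creating regions requires a list of split keys. A minimal sketch of computing them, assuming the row keys are fixed-width lowercase hex strings spread evenly over the keyspace (for instance, hashes); the helper name is hypothetical and this is plain Python, not an HBase API:

```python
def hex_split_points(num_regions, width=8):
    """Evenly spaced split keys over a fixed-width hex keyspace."""
    if num_regions < 2:
        return []  # a single region needs no split keys
    keyspace = 16 ** width
    return [
        format(i * keyspace // num_regions, f"0{width}x")
        for i in range(1, num_regions)
    ]


splits = hex_split_points(4)
```

Four regions need three split keys, here `40000000`, `80000000`, and `c0000000`. Keys like these are what would be supplied as split points when the table is created.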
Does HBase Support Table Joins?
HBase does not support table joins natively. A join can instead be implemented in a MapReduce job, or in application code, that reads from multiple HBase tables and combines the rows.
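The kind of join such a job performs can be sketched in plain Python. This is only an illustration of a reduce-side inner join, not MapReduce or HBase code; the two row sets and their fields are made up.

```python
from collections import defaultdict


def reduce_side_join(users, orders):
    """Inner-join (user_id, name) rows with (order_id, user_id) rows."""
    grouped = defaultdict(lambda: {"users": [], "orders": []})
    # "Map" phase: tag each record with its source, keyed by the join key.
    for user_id, name in users:
        grouped[user_id]["users"].append(name)
    for order_id, user_id in orders:
        grouped[user_id]["orders"].append(order_id)
    # "Reduce" phase: per key, pair every user record with every order.
    joined = []
    for user_id in sorted(grouped):
        group = grouped[user_id]
        for name in group["users"]:
            for order_id in group["orders"]:
                joined.append((user_id, name, order_id))
    return joined


users = [("u1", "Ada"), ("u2", "Bob")]
orders = [("o1", "u1"), ("o2", "u1"), ("o3", "u3")]
result = reduce_side_join(users, orders)
```

Only `u1` appears in both inputs, so `result` contains two joined rows for that user.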
Which File In HBase Is Designed After The SSTable File Of Bigtable?
The HFile, which stores the actual data (not metadata) in HBase, is designed after the SSTable file of Bigtable.
What Is An HBase Store?
A Store hosts a MemStore and zero or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
What Are The Two Types Of Table Design Approaches In HBase?
They are:
Short and Wide
Tall and Thin
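The two layouts can be compared on the same data. The sketch below uses plain Python dictionaries as stand-ins for tables; the message data, the `msg` column family, and the row key format are assumptions made for illustration.

```python
# Hypothetical data: messages received by each user.
messages = {"alice": {"m1": "hi", "m2": "lunch?"}, "bob": {"m3": "ok"}}


def to_wide(data):
    """One row per user; each message id becomes a column qualifier."""
    return {
        user: {f"msg:{mid}": body for mid, body in msgs.items()}
        for user, msgs in data.items()
    }


def to_tall(data):
    """One row per message; the message id is folded into the row key."""
    return {
        f"{user}-{mid}": {"msg:body": body}
        for user, msgs in data.items()
        for mid, body in msgs.items()
    }


wide = to_wide(messages)
tall = to_tall(messages)
```

The wide layout has two rows with a varying number of columns, while the tall layout has three rows with one column each.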
When Do We Do Manual Region Splitting?
Manual region splitting is done when a table has an unexpected hotspot, for example because many clients are querying the same region.
In Which Scenario Should We Consider Creating A Short And Wide HBase Table?
The short and wide table design is considered when:
There is a small number of rows
There is a large number of columns per row
In HBase What Is Log Splitting?
All regions on a RegionServer share the same WAL file, and each edit in the WAL records which region it belongs to. When a region is opened, the edits in the WAL file which belong to that region need to be replayed. Therefore, edits in the WAL file must be grouped by region so that particular sets can be replayed to regenerate the data in a particular region. The process of grouping the WAL edits by region is called log splitting.
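The grouping step can be sketched in plain Python. The edit records and region names below are made up for illustration, and the real WAL format is different; the point is only that edits are bucketed by region while keeping their original order.

```python
from collections import defaultdict


def split_log(wal_edits):
    """Group WAL edits by region, preserving their order within a region."""
    per_region = defaultdict(list)
    for edit in wal_edits:
        per_region[edit["region"]].append(edit)
    return dict(per_region)


# Hypothetical interleaved WAL of one RegionServer hosting two regions.
wal = [
    {"seq": 1, "region": "regionA", "row": "a1", "value": "x"},
    {"seq": 2, "region": "regionB", "row": "k7", "value": "y"},
    {"seq": 3, "region": "regionA", "row": "a2", "value": "z"},
]
by_region = split_log(wal)
```

Each region can then replay just its own list, `by_region["regionA"]` or `by_region["regionB"]`.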
How Does HBase Support Bulk Data Loading?
There are two main steps in an HBase bulk load:
Generate HBase data files (StoreFiles) from the data source using a MapReduce job. The StoreFiles are written in HBase's internal format, so they can be loaded efficiently.
Import the prepared files into a running cluster using a tool such as completebulkload. Each file is loaded into one specific region.
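Why Is MultiWAL Needed?
With a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be sequential. This makes the WAL a performance bottleneck. MultiWAL allows a RegionServer to write multiple WAL streams in parallel, which increases total write throughput.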
How Does HBase Provide High Availability?
HBase provides high availability for reads through a feature called region replication. For each region of a table, multiple replicas are opened on different RegionServers. The load balancer ensures that replicas of the same region are not co-hosted on the same RegionServer.
How Does WAL Help When A RegionServer Crashes?
The Write Ahead Log (WAL) records all changes to data in HBase to file-based storage. If a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed.
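The write path and recovery can be imitated with a toy class in plain Python. It is a sketch under simplifying assumptions (a single region, an in-memory list standing in for the durable log, hypothetical class and method names), not how HRegionServer is implemented.

```python
class ToyRegionServer:
    def __init__(self):
        self.wal = []          # stands in for the durable, append-only log
        self.memstore = {}     # volatile in-memory writes
        self.store_files = {}  # data already flushed to disk
        self.flushed_seq = 0   # highest sequence id covered by a flush
        self.next_seq = 1

    def put(self, key, value):
        self.wal.append((self.next_seq, key, value))  # log first
        self.memstore[key] = value                    # then apply in memory
        self.next_seq += 1

    def flush(self):
        self.store_files.update(self.memstore)
        self.memstore.clear()
        self.flushed_seq = self.next_seq - 1

    def crash(self):
        self.memstore = {}  # unflushed in-memory data is lost

    def recover(self):
        # Replay only the edits that were not yet flushed.
        for seq, key, value in self.wal:
            if seq > self.flushed_seq:
                self.memstore[key] = value

    def get(self, key):
        return self.memstore.get(key, self.store_files.get(key))


rs = ToyRegionServer()
rs.put("a", 1)
rs.flush()
rs.put("b", 2)
rs.put("a", 3)
rs.crash()
lost = (rs.get("a"), rs.get("b"))       # state right after the crash
rs.recover()
restored = (rs.get("a"), rs.get("b"))   # state after replaying the WAL
```

After the crash only the flushed value of `a` is visible; replaying the WAL brings back both unflushed writes.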
What Is HRegionServer In HBase?
HRegionServer is the RegionServer implementation. It is responsible for serving and managing regions. In a distributed cluster, a RegionServer runs on a DataNode.
What Are The Different Block Caches In HBase?
HBase provides two different BlockCache implementations: the default on-heap LruBlockCache and the BucketCache, which is (usually) off-heap.