Apache Hive Interview Questions Part – 3
What Is The Difference Between LIKE And RLIKE Operators In Hive?
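The LIKE operator behaves the same way as the standard SQL LIKE operator: % matches any sequence of characters, _ matches exactly one character, and the pattern must match the whole value.
Example: street_name LIKE '%Chi' matches values that end with Chi.
The RLIKE operator instead takes a Java regular expression, which is far more expressive, and returns true if any substring of the value matches the pattern.
Example: street_name RLIKE '.*(Chi|Oho).*' selects any value that contains either Chi or Oho. The match is case-sensitive.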
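To make the difference concrete, here is a small Python sketch that models the two operators. It assumes Python's re module behaves like Java regular expressions for simple patterns such as these (the two engines differ in some advanced features). The helper names hive_like and hive_rlike are hypothetical, and LIKE's escape character is ignored for brevity.

```python
import re

def hive_like(value, pattern):
    """Model of LIKE: '%' matches any sequence, '_' exactly one character,
    and the pattern must match the whole value (escapes are ignored here)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

def hive_rlike(value, pattern):
    """Model of RLIKE: true if any substring matches the regular expression."""
    return re.search(pattern, value) is not None

print(hive_like("North Chi", "%Chi"))              # True
print(hive_like("Chicago", "%Chi"))                # False: must end with Chi
print(hive_rlike("Chicago Ave", ".*(Chi|Oho).*"))  # True
print(hive_rlike("chicago ave", ".*(Chi|Oho).*"))  # False: case-sensitive
```

Because RLIKE succeeds when any substring matches, the leading and trailing .* are optional: street_name RLIKE 'Chi|Oho' selects the same rows.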
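Is It Possible To Create A Cartesian Join Between 2 Tables Using Hive?
Yes. Hive supports CROSS JOIN (added in Hive 0.10), and a JOIN without a join condition also produces a Cartesian product. Such a join can be executed as a MapReduce job, but it is expensive because every row of one table is paired with every row of the other, so Hive rejects Cartesian products when strict mode is enabled (hive.mapred.mode=strict).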
As Part Of Optimizing The Queries In Hive, What Should Be The Order Of Table Size In A Join Query?
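In a join query, the smallest table should be placed first and the largest table last. In each map/reduce stage of a join, Hive buffers the rows of the earlier tables in memory on the reducers and streams the last table through them, so putting the largest table last keeps memory use low.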
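A toy Python model of a reduce-side inner join illustrates the reasoning: every table except the last is buffered per join key, and the last one is streamed row by row, so the ordering decides how many rows must be held in memory at once. This is a sketch under simplifying assumptions (one reducer, all tables joined on the same key); reduce_side_join is a hypothetical name, not a Hive API.

```python
import itertools
from collections import defaultdict

def reduce_side_join(tables):
    """Inner-join tables given as lists of (key, value) pairs on their key.

    Every table except the last is buffered in memory, grouped by key; the
    last table is streamed one row at a time. Returns the joined rows and
    the peak number of rows buffered for any single key.
    """
    *buffered_tables, streamed_table = tables
    buffers = []
    for table in buffered_tables:
        by_key = defaultdict(list)
        for key, value in table:
            by_key[key].append(value)
        buffers.append(by_key)

    # Memory needed for one key = rows of all buffered tables with that key.
    all_keys = set().union(*(buf.keys() for buf in buffers))
    peak = max(
        (sum(len(buf.get(k, [])) for buf in buffers) for k in all_keys),
        default=0,
    )

    joined = []
    for key, value in streamed_table:
        groups = [buf.get(key, []) for buf in buffers]
        for combo in itertools.product(*groups):
            joined.append((key, *combo, value))
    return joined, peak

small = [(1, "s1"), (2, "s2")]
large = [(1, "l1"), (1, "l2"), (1, "l3"), (2, "l4")]

rows_good, peak_good = reduce_side_join([small, large])  # largest table last
rows_bad, peak_bad = reduce_side_join([large, small])    # largest table first
print(len(rows_good), len(rows_bad))  # 4 4
print(peak_good, peak_bad)            # 1 3
```

Both orderings produce the same rows; only the amount of buffered data changes, which is why the order matters for performance rather than correctness.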
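What Is The Usefulness Of The DISTRIBUTE BY Clause In Hive?
It controls how the map output is divided among the reducers: all rows with the same DISTRIBUTE BY value go to the same reducer. This is useful when streaming data through a custom script with TRANSFORM, and it is often combined with SORT BY so that the rows on each reducer are grouped and sorted.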
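The Python sketch below models how DISTRIBUTE BY assigns rows to reducers: hash the key and take it modulo the number of reducers, so equal keys always land on the same reducer. It assumes Java's String.hashCode as the hash function, reproduced here so the result is deterministic; Hive's actual hashing depends on column types and version. The function names are hypothetical.

```python
def java_string_hash(s):
    """Java's String.hashCode for BMP strings, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def distribute_by(rows, key, num_reducers):
    """Send each row to reducer (hash(key) & 0x7FFFFFFF) % num_reducers."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        h = java_string_hash(str(row[key]))
        reducers[(h & 0x7FFFFFFF) % num_reducers].append(row)
    return reducers

rows = [{"state": "CA"}, {"state": "NY"}, {"state": "CA"}, {"state": "TX"}]
for i, part in enumerate(distribute_by(rows, "state", 2)):
    print(i, [r["state"] for r in part])
# 0 ['CA', 'CA', 'TX']
# 1 ['NY']
```

Both CA rows reach reducer 0, which is what a TRANSFORM script relying on per-key grouping needs; the rows within a reducer are not sorted unless SORT BY is added.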
How Will You Convert The String '51.2' To A Float Value In The Price Column?
SELECT CAST(price AS FLOAT) FROM table_name;
What Will Be The Result When You Do CAST('abc' AS INT)?
Hive will return NULL, because 'abc' cannot be parsed as an integer. CAST does not raise an error on invalid input.
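The NULL-on-failure behavior of CAST can be modeled in Python as below. This is a simplified sketch: cast_string is a hypothetical helper, and Python's float is a 64-bit double whereas Hive's FLOAT is 32-bit, so only the NULL handling is modeled, not the precision.

```python
def cast_string(value, to_type):
    """Model of CAST(<string> AS <type>): None (NULL) instead of an error."""
    if value is None:
        return None  # NULL input gives NULL output
    try:
        return to_type(value)
    except ValueError:
        return None  # unparseable input gives NULL, as in Hive

print(cast_string("51.2", float))  # 51.2
print(cast_string("abc", int))     # None
print(cast_string(None, int))      # None
```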
Can The Name Of A View Be The Same As The Name Of A Hive Table?
No. The name of a view must be unique when compared to all other tables and views present in the same database.
Can We Load Data Into A View?
No. A view cannot be the target of an INSERT or LOAD statement.
What Types Of Costs Are Associated With Creating An Index On Hive Tables?
Indexes occupy storage space, and there is a processing cost in arranging the values of the column on which the index is created. Note that indexing was removed in Hive 3.0, so this applies to earlier releases.
Give The Command To See The Indexes On A Table?
SHOW INDEX ON table_name
This will list all the indexes created on any of the columns in the table table_name.
What Is Bucketing?
The values in a column are hashed into a user-defined number of buckets, declared with CLUSTERED BY (column) INTO n BUCKETS. It is a way to avoid too many partitions or nested partitions while still enabling optimized queries, for example efficient sampling and bucketed map joins.
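As an illustration, the Python sketch below assigns INT values to 4 buckets. It assumes the legacy Hive hashing in which an INT hashes to its own value, with bucket = (hash & Integer.MAX_VALUE) % num_buckets; newer Hive releases may use a different hash function (for example Murmur3-based hashing in Hive 3), so actual bucket numbers can differ. The helper names are hypothetical.

```python
def bucket_for(value, num_buckets):
    """Bucket number for an INT, assuming legacy hashing (hash of an INT is
    the INT itself) and (hash & Integer.MAX_VALUE) % num_buckets."""
    return (value & 0x7FFFFFFF) % num_buckets

def bucketize(values, num_buckets):
    """Group values by bucket, as they would be split across bucket files."""
    buckets = {b: [] for b in range(num_buckets)}
    for v in values:
        buckets[bucket_for(v, num_buckets)].append(v)
    return buckets

user_ids = [3, 8, 42, 17, 100, 5]
print(bucketize(user_ids, 4))  # {0: [8, 100], 1: [17, 5], 2: [42], 3: [3]}
# In principle, a filter such as user_id = 42 only needs the one bucket 42 maps to.
print(bucket_for(42, 4))       # 2
```

The number of buckets stays fixed no matter how many distinct values the column has, which is what keeps bucketing from exploding the way fine-grained partitioning can.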
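What Does /*+ STREAMTABLE(table_name) */ Do?
It is a query hint that tells Hive which table to stream through the reducers during a join instead of buffering it in memory. By default Hive streams the last table in the join, so the hint is useful when the largest table cannot conveniently be placed last. It is a query optimization technique.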
Can A Partition Be Archived? What Are The Advantages And Disadvantages?
Yes. A partition can be archived into a Hadoop Archive (HAR) file. The advantage is that it reduces the number of files the NameNode has to track, and the archived partition can still be queried using Hive. The disadvantages are that queries against it are less efficient and archiving does not save any space, since the files are not compressed.
What Is A Generic UDF In Hive?
It is a UDF written as a Java class to serve a specific need not covered by Hive's built-in functions. It can detect the types of its input arguments programmatically and respond appropriately.
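Real generic UDFs are Java classes that extend org.apache.hadoop.hive.ql.udf.generic.GenericUDF and implement initialize (called with ObjectInspectors describing the argument types, returning one for the result type), evaluate (called per row) and getDisplayString. The Python class below is only a hypothetical analogue of that lifecycle, with type names passed as plain strings, to show how a UDF can check its argument types up front and adapt its behavior.

```python
class ToUpperUDF:
    """Hypothetical Python analogue of a Hive GenericUDF that upper-cases text."""

    def initialize(self, arg_types):
        # Called once before any rows: validate argument types, pick a
        # conversion, and report the return type.
        if len(arg_types) != 1:
            raise ValueError("to_upper expects exactly one argument")
        arg_type = arg_types[0]
        if arg_type == "string":
            self._convert = lambda v: v
        elif arg_type in ("int", "bigint"):
            self._convert = str  # numbers are converted to text first
        else:
            raise TypeError(f"to_upper does not accept {arg_type}")
        return "string"

    def evaluate(self, args):
        # Called once per row; a NULL (None) input yields NULL.
        value = args[0]
        if value is None:
            return None
        return self._convert(value).upper()

udf = ToUpperUDF()
print(udf.initialize(["string"]))  # string
print(udf.evaluate(["chicago"]))   # CHICAGO
udf.initialize(["int"])
print(udf.evaluate([42]))          # 42
```

Doing the type checks once in initialize, rather than on every row, mirrors why the Java API separates the two methods.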
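The Following Statement Failed To Execute. What Can Be The Cause? LOAD DATA LOCAL INPATH '${env:HOME}/country/state/' OVERWRITE INTO TABLE address;
The commonly given answer is that the local inpath should point to a file and not a directory. Note, however, that the Hive documentation allows the path to be a directory, in which case all files in it are loaded, provided it contains no subdirectories, so it is also worth checking whether country/state/ contains nested directories. ${env:HOME} is a valid variable available in the Hive environment, so the variable itself is not the problem.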
How Do You Specify The Table Creator Name When Creating A Table In Hive?
The TBLPROPERTIES clause is used to add the creator name while creating a table, for example: TBLPROPERTIES ('creator'='Joan')