Data Warehousing Interview Questions Part – 3
1.What is Data warehousing?
Data Warehouse is a repository of integrated information,available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated….This makes it much easier and more efficient to run queries over data that originally came from different sources.
Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result,data warehouses are designed differently than traditional relational databases.
2.What is ODS?
ODS means Operational Data Store. A collection of operation or bases data that is extracted from operation databases and standardized,cleansed,consolidated,transformed,and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data,or as the store for base data that is summarized for a data warehouse.
The ODS may also be used to audit the data warehouse to assure summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database,allowing operational systems that are being reengineered to use the ODS as there operation databases.
3.What is a lookup table?
A lookUp table is the one which is used when updating a warehouse. When the lookup is placed on the target table (fact table / warehouse) based upon the primary key of the target,it just updates the table by allowing only new records or updated records based on the lookup condition.
4.What is Data Mining?
Data Mining is the process of analyzing data from different perspectives and summarizing it into useful information.
5.What is the difference between view and materialized view?
A view takes the output of a query and makes it appear like a virtual table and it can be used in place of tables. A materialized view provides indirect access to table data by storing the results of a query in a separate schema object.
6.What is OLTP?
OLTP is abbreviation of On-Line Transaction Processing. This system is an application that modifies data the instance it receives and has a large number of concurrent users.
7.What is OLAP?
OLAP is abbreviation of Online Analytical Processing. This system is an application that collects,manages,processes and presents multidimensional data for analysis and management purposes.
8.What is a dimension table?
A dimensional table is a collection of hierarchies and categories along which the user can drill down and drill up. it contains only the textual attributes.
9.Why should you put your data warehouse on a different system than your OLTP system?
A OLTP system is basically ” data oriented ” (ER model) and not ” Subject oriented “(Dimensional Model) .That is why we design a separate system that will have a subject oriented OLAP system… Moreover if a complex querry is fired on a OLTP system will cause a heavy overhead on the OLTP server that will affect the daytoday business directly.
10.What are Aggregate tables?
Aggregate table contains the summary of existing warehouse data which is grouped to certain levels of dimensions.Retrieving the required data from the actual table,which have millions of records will take more time and also affects the server performance.To avoid this we can aggregate the table to certain required level and can use it.This tables reduces the load in the database server and increases the performance of the query and can retrieve the result very fastly.
11.What is ETL?
ETL is abbreviation of extract,transform,and load. ETL is software that enables businesses to consolidate their disparate data while moving it from place to place,and it doesn’t really matter that that data is in different forms or formats. The data can come from any source.ETL is powerful enough to handle such data disparities.
First,the extract function reads data from a specified source database and extracts a desired subset of data. Next,the transform function works with the acquired data – using rules or lookup tables,or creating combinations with other data – to convert it to the desired state. Finally,the load function is used to write the resulting data to a target database.
12.What is Dimensional Modelling?
Dimensional Modelling is a design concept used by many data warehouse designers to build their data-warehouse. In this design model all the data is stored in two types of tables – Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements i.e,the dimensions on which the facts are calculated.
13.Why is Data Modelling Important?
Data Modelling is probably the most labour intensive and time consuming part of the development process.
14.What is SCD1 ,SCD2 ,SCD3?
SCD Stands for Slowly changing dimensions. SCD1: only maintained updated values. Ex: a customer address modified we update existing record with new address. SCD2: maintaining historical information and current information by using A) Effective Date B) Versions C) Flags or combination of these SCD3: by adding new columns to target table we maintain historical information a nd current information.
15.Why are OLTP database designs not generally a good idea for a Data Warehouse?
Since in OLTP,tables are normalised and hence query response will be slow for end user and OLTP doesnot contain years of data and hence cannot be analysed.
16.What is VLDB?
VLDB is abbreviation of Very Large Database. A one terabyte database would normally be considered to be a VLDB. Typically,these are decision support systems or transaction processing applications serving large numbers of users.
17.If de-normalized is improves data warehouse processes,why fact table is in normal form?
Foreign keys of facts tables are primary keys of Dimension tables. It is clear that fact table contains columns which are primary key to other table that itself make normal form table.
18.What is BUS Schema?
BUS Schema is composed of a master suite of confirmed dimension and standardized definition if facts
19.What is Normalization
Normalization is process for assigning attributes to entities–Reduces data redundancies–Helps eliminate data anomalies–Produces controlled redundancies to link tables . Normalization is the analysis of functional dependency between attributes / data items of userviews.It reduces a complex user view to a set of small and stable subgroups of fields / relations
20.What is First Normal Form?
Repeating groups must be eliminated,Dependencies can be identified,All key attributes defined,No repeating groups in table
21.What is Second Normal Form?
The Table is already in1NF,Includes no partial dependencies–No attribute dependent on a portion of primary key,Still possible to exhibit transitive dependency,Attributes may be functionally dependent on non-key attributes .
22.What is Third Normal Form?
The Table is already in 2NF,Contains no transitive dependencies .
23.What is Fact table?
Fact Table contains the measurements or metrics or facts of business process. If your business process is “Sales” ,then a measurement of this business process such as “monthly sales number” is captured in the Fact table. Fact table also contains the foreign keys for the dimension tables
24.What are conformed dimensions?
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined Ex:Date Dimensions is connected all facts like Sales facts,Inventory facts..etc
25.How do you load the time dimension?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. It is not unusual for 100 years to be represented in a time dimension,with one row per day.
26.Which columns go to the fact table and which columns go the dimension table?
The Primary Key columns of the Tables(Entities) go to the Dimension Tables as Foreign Keys. The Primary Key columns of the Dimension Tables go to the Fact Tables as Foreign Keys.
27.What is a Star Schema?
Star schema is a type of organising the tables such that we can retrieve the result from the database easily and fastly in the warehouse environment.Usually a star schema consists of one or more dimension tables around a fact table which looks like a star,so that it got its name.
28.Differences between star and snowflake schemas?
Star schema – all dimensions will be linked directly with a fat table. Snow schema – dimensions maybe interlinked or may have one-to-many relationship with other tables.
29.What Snow Flake Schema?
Snowflake Schema,each dimension has a primary dimension table,to which one or more additional dimensions can join. The primary dimension table is the only table that can join to the fact table.
30.What is real time data-warehousing?
Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete,there is data about it.
Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it,the completed activity data flows into the data warehouse and becomes available instantly. In other words,real-time data warehousing is a framework for deriving information from data as the data becomes available.