Warehouse Manager

  Home  Data Warehouse  Warehouse Manager


“Data Warehouse Manager Frequently Asked Questions in various Warehouse Manager job interviews by interviewer. The set of questions are here to ensures that you offer a perfect answer posed to you. So get preparation for your new job interview”



113 Warehouse Manager Questions And Answers

41⟩ Explain me what is the function of ETL when used in data warehousing?

ETL is used for extract, transform, and load and it is used to enable the business models to consolidate their data that is moving from place to place and it allows the saving of the data in different forms and different formats. The data can be used from any source and can be included by any source that is defined. It is a powerful method to handle the data disparities and other problems that are related to the data.

There are few steps that are involved in using the function of ETL and it is as follows:

The extract function reads the data from the source that is given and stores the extracted information in the database of the path that is given.

The transformation of the data occurs that is done through some rules keeping the lookup tables to see the combination of the data that is getting formed and the transformed data are arranged and kept in order of the choice that is made while transforming it.

The load function is used load the data to the database or the user location where it needs to be located and then the resulting data is to be loaded to the target location.

 136 views

42⟩ Tell me what is Fact Table?

Fact table contains the measurement of business processes, and it contains foreign keys for the dimension tables.

Example – If the business process is manufacturing of bricks

Average number of bricks produced by one person/machine – measure of the business process

 140 views

43⟩ Tell me what is the function of surrogate key in data warehousing?

Surrogate key is the key that can be substituted with the primary key. It is a unique key that uniquely identifies the each row that is used. This also allows the unique search of the primary key to the table. It is used in the database to uniquely identify an entity in a model or an object that is present in the database. This key is not derived from any of the data entity and not even from the application data. These are represented either in the form of numbers or integers. This kind of keys is generated from the system and it is not at all visible to the outside world. This key can be used as a primary key for a given database and there are few differences between primary key and surrogate key. There remains many to one relationship between primary and surrogate keys.

 129 views

44⟩ Do you know what is OLTP?

OLTP is abbreviated as On-Line Transaction Processing, and it is an application that modifies the data whenever it received and has large number of simultaneous users.

 138 views

46⟩ Explain me what is the use of dimensional modeling in data warehousing?

Dimensional modeling is a set of techniques that is used in designing the overall structure of data warehousing. It doesn't involve relational database but at the logical level uses the physical form of the database. It is used to support user queries and to increase the performance and understanding of a particular database concept. It uses facts and dimensions to support the measures and the context of the database. The facts define the values that can be aggregated and dimensions represent the group of hierarchies and the descriptors that define the facts in return. This type of models is built by business process model and consists inside the process area. This process area consists of the same design and operation details as others.

 121 views

47⟩ Do you know what is the purpose of cluster analysis in Data Warehousing?

Cluster analysis is used to define the object without giving the class label. It analyzes all the data that is present in the data warehouse and compare the cluster with the cluster that is already running. It performs the task of assigning some set of objects into the groups are also known as clusters. It is used to perform the data mining job using the technique like statistical data analysis. It includes all the information and knowledge around many fields like machine learning, pattern recognition, image analysis and bio-informatics. Cluster analysis performs the iterative process of knowledge discovery and includes trials and failures. It is used with the pre-processing and other parameters as a result to achieve the properties that are desired to be used.

 137 views

50⟩ Explain what are additive, semi-additive and non-additive measures?

☛ Non-additive Measures:

Non-additive measures are those which can not be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.). One example of non-additive fact is any kind of ratio or percentage. Example, 5% profit margin, revenue to asset ratio etc. A non-numerical data can also be a non-additive measure when that data is stored in fact tables, e.g. some kind of varchar flags in the fact table.

☛ Semi Additive Measures:

Semi-additive measures are those where only a subset of aggregation function can be applied. Let’s say account balance. A sum() function on balance does not give a useful result but max() or min() balance might be useful. Consider price rate or currency rate. Sum is meaningless on rate; however, average function might be useful.

☛ Additive Measures:

Additive measures can be used with any aggregation function like Sum(), Avg() etc. Example is Sales Quantity etc.

 128 views

52⟩ What is ETL?

ETL is abbreviated as Extract, Transform and Load. ETL is a software which is used to reads the data from the specified data source and extracts a desired subset of data. Next, it transform the data using rules and lookup tables and convert it to a desired state.

 145 views

54⟩ Explain me what are the different types of schemas used in Data warehousing?

Schema defines the type of structure that is used by the database to hold on some data that can be related or can be different.

There are three types of schema that exists in the database as follows:

BUS schema: This is the schema that is composed of the master file that is totally confirmed with the dimension and with standardized definition including the facts.

Star schema: This is the schema that defines the organization type of the tables and it is used to retrieve the result from the database quickly in a controlled environment.

Snow flake schema: This schema is used to show the primary dimension table, that includes one or more dimensions that can be joined. Primary dimension table only allows the joining of the fact table.

 133 views

55⟩ Do you know why is chameleon method used in data warehouse?

Chameleon is a hierarchical clustering algorithm that overcomes the limitations of the existing models and the methods present in the data warehousing. This method operates on the sparse graph having nodes that represent the data items and edges represent the weights of the data items. The representation of it allows large data set to be created and operated on successfully. The method finds the clusters that are used in the data set using the two phase algorithm. The first phase consists of the graph partitioning that allows the clustering of the data items into large number of sub-clusters. Second phases use an agglomerative hierarchical clustering algorithm to search for the clusters that are genuine and can be combined together with the sub-clusters that are produced.

 134 views

56⟩ Explain what is Fact?

A fact is something that is quantifiable (Or measurable). Facts are typically (but not always) numerical values that can be aggregated.

 128 views

58⟩ Explain me what is Dimension Table?

Dimension table is a table which contain attributes of measurements stored in fact tables. This table consists of hierarchies, categories and logic that can be used to traverse in nodes.

 125 views

59⟩ Explain me what are the different types of fact tables used in data warehousing?

There are three types of fact tables that are used and the characterization is given below:

Transactional Table: It is the most basic table of data warehousing. It consists of grain that has the association with the transactional fact table and it is defines to use one row per line in a transactional. It consists of the most detailed data that have many number of dimensions associated with it.

Periodic snapshots: It is used to take the snapshot of the currently working data set. It depends on the period of time when the snapshots are taken. It increases the performance. This table is dependent on the transactional table as it has to be synchronized to keep all the detailed data that is mentioned in transactional table. The snapshot can be looked if any problem take place in the database and it helps in recovering the state that is lost.

Accumulating snapshots: This type of table is used to show the activity of the process that is defined from the beginning till end. For example, order process. The snapshots consist of multiple date columns that represent a timeline of the process. It keeps and entry of the associate date dimension representing the unknown date.

 126 views