101⟩ What is an Execution Plan?
The combination of the steps the optimizer chooses to execute a statement is called an execution plan.
The optimizer can be rule-based or cost-based.
OLAP stands for online analytical processing. OLAP systems hold integrated, historical information to analyze; data from multiple sources needs to be integrated. We can create reports that are multi-dimensional, supported by time-based analysis, and ideal for applications with unpredictable, ad hoc query requirements. OLTP stands for online transaction processing. OLTP databases are fully normalized and are designed to consistently store operational data, one transaction at a time. They perform day-to-day operations and do not support historical data.
Data modeling is the process of designing a database model. In a dimensional data model, data is stored in two types of tables: fact tables and dimension tables.
A fact table contains the transaction data, while a dimension table contains the master data. Data mining is the process of finding hidden trends in data.
Aggregate tables are pre-computed totals stored in a hierarchical, multidimensional structure, whereas a materialized view is a database object that caches a query result in a concrete table and refreshes it from the original database tables from time to time. Aggregate tables are used to speed up query computation, whereas materialized views speed up data retrieval.
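As a minimal sketch in Python (the sales rows are invented), pre-computing an aggregate at a coarser grain lets later queries read a handful of totals instead of scanning every detail row:

```python
from collections import defaultdict

# Hypothetical detail-level sales rows: (year, month, product, amount).
sales = [
    (2023, 1, "A", 100), (2023, 1, "B", 200),
    (2023, 2, "A", 150), (2024, 1, "A", 300),
]

# Pre-compute an aggregate "table" at the (year, product) grain once...
agg = defaultdict(int)
for year, month, product, amount in sales:
    agg[(year, product)] += amount

# ...so later queries read the small pre-computed totals instead of
# re-summing the detail rows every time.
print(agg[(2023, "A")])  # 250
```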
A dimension table holds all the detailed information for its subject area; for example, a customer dimension table will contain all the related information about customers, whereas a fact table contains the main data: the surrogate keys of every dimension, along with other measures.
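A minimal Python sketch of this layout (all table and column names are hypothetical) shows fact rows carrying only surrogate keys and measures, resolved against the dimension at query time:

```python
# Hypothetical star-schema data: a customer dimension keyed by surrogate key,
# and a fact table that stores only the surrogate key (cust_sk) plus measures.
customer_dim = {
    1: {"customer_id": "XYZ", "name": "XYZ Traders", "city": "Bangalore"},
    2: {"customer_id": "ABC", "name": "ABC Stores", "city": "Chennai"},
}

sales_fact = [
    {"cust_sk": 1, "amount": 120000},
    {"cust_sk": 1, "amount": 180000},
    {"cust_sk": 2, "amount": 50000},
]

# Resolve each fact row's surrogate key against the dimension (a star join).
report = [
    (customer_dim[row["cust_sk"]]["city"], row["amount"])
    for row in sales_fact
]
print(report)  # [('Bangalore', 120000), ('Bangalore', 180000), ('Chennai', 50000)]
```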
Data profiling is a way to find out the profile of the information contained in the source. For example, a column in a table may be defined as alphanumeric, yet the majority of the data may be numeric. Profiling tools provide statistical information such as how many records contain purely numeric values versus how many contain alphanumeric data. Before a data migration exercise, these tools provide vital clues about whether the exercise is going to be a success or a failure. This can help in changing the target schema, or in applying cleansing at the source level, so that most of the records make it into the destination database. In data warehousing, these tools are used at the design stage for the same purpose. Some tool vendors who sell this as a product call it the data discovery phase.
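A minimal Python sketch of this kind of check (the sample values are invented): count how many values in a nominally alphanumeric column are actually pure numbers versus genuinely alphanumeric:

```python
# Hypothetical column declared alphanumeric in the source schema; profile
# how many of its values are actually purely numeric.
values = ["1001", "1002", "A-17", "1003", "N/A", "1004"]

numeric = sum(1 for v in values if v.isdigit())
alphanumeric = len(values) - numeric

# A high numeric ratio suggests the target column could be typed numeric,
# with cleansing applied to the few alphanumeric stragglers.
print(numeric, alphanumeric)  # 4 2
```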
A virtual or point-to-point data warehousing strategy means that end users are allowed to access operational databases directly, using whatever tools are enabled on the "data access network".
Method 1 is a system development lifecycle created by Arthur Andersen a while back.
A Mapping Parameter defines a constant value that cannot change throughout the session. A Mapping Variable defines a value that can change throughout the session.
Dependent data marts are those which depend on a data warehouse for their data. Independent data marts are those which get their data directly from the operational data sources in the organization.
Metadata is nothing but information about data. It contains information about the graphs, their related files, Ab Initio commands, server information, and so on: all kinds of project-related information.
Only if you have a one-to-many relationship in the data should you choose the snowflake schema; performance-wise, everyone goes for the star schema. Moreover, if the ETL is concerned with reporting, choose the snowflake schema, because it provides more browsing capability than the star schema.
Many-to-many relationships can be implemented by using a snowflake schema, with a maximum of n dimensions.
A data cube is a multi-dimensional structure. It is a data abstraction that allows one to view aggregated data from a number of perspectives. Conceptually, the cube consists of a core or base cuboid, surrounded by a collection of sub-cubes/cuboids that represent the aggregation of the base cuboid along one or more dimensions. The dimension to be aggregated is referred to as the measure attribute, while the remaining dimensions are known as the feature attributes.
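The idea can be sketched in Python (the dimensions and measures below are hypothetical): aggregating the base cuboid along every subset of its dimensions yields all the sub-cuboids of the cube, down to the apex cuboid (the grand total):

```python
from itertools import combinations
from collections import defaultdict

# Base cuboid: (city, product, year) -> sales, where sales is the measure.
base = {
    ("BLR", "A", 2023): 10,
    ("BLR", "B", 2023): 20,
    ("HYD", "A", 2023): 5,
    ("HYD", "A", 2024): 7,
}
dims = ("city", "product", "year")

# Aggregate the base cuboid along every subset of dimensions, producing
# every sub-cuboid of the cube (2^3 = 8 cuboids for three dimensions).
cube = {}
for keep in range(len(dims), -1, -1):
    for kept in combinations(range(len(dims)), keep):
        cuboid = defaultdict(int)
        for key, measure in base.items():
            cuboid[tuple(key[i] for i in kept)] += measure
        cube[tuple(dims[i] for i in kept)] = dict(cuboid)

print(cube[("city",)][("BLR",)])  # 30  (all BLR sales)
print(cube[()][()])               # 42  (apex cuboid: grand total)
```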
We cannot use SQL queries in a Filter transformation. It will not allow you to override the default SQL query, unlike other transformations (Source Qualifier, Lookup).
The generated report will be sent to the concerned business users through the web or a LAN.
Let us take an example: suppose 'XYZ' is a customer in Bangalore who has resided in the city for the last 5 years, and in that period he has made purchases worth 3 lakhs. Now he moves to 'HYD'. When you update XYZ's city to 'HYD' in your warehouse, all his purchases will show under 'HYD' only. This makes the warehouse inconsistent. Here CITY is the critical column. The solution is to use a surrogate key.
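A minimal Python sketch of the surrogate-key solution (a Type 2 slowly changing dimension; the data and structure names are invented): the move creates a new dimension row under a new surrogate key, so the old facts keep pointing at the Bangalore version of the customer:

```python
# Hypothetical Type-2 slowly changing dimension: a city change creates a
# NEW dimension row with a new surrogate key instead of overwriting.
customer_dim = [
    {"cust_sk": 1, "customer_id": "XYZ", "city": "Bangalore", "current": True},
]
sales_fact = [{"cust_sk": 1, "amount": 300000}]  # 3 lakhs while in Bangalore

def move_customer(dim, customer_id, new_city):
    new_sk = max(r["cust_sk"] for r in dim) + 1
    for row in dim:
        if row["customer_id"] == customer_id and row["current"]:
            row["current"] = False          # expire the old version
    dim.append({"cust_sk": new_sk, "customer_id": customer_id,
                "city": new_city, "current": True})
    return new_sk

new_sk = move_customer(customer_dim, "XYZ", "HYD")
sales_fact.append({"cust_sk": new_sk, "amount": 50000})  # purchase after move

# Historical purchases still roll up under Bangalore:
by_city = {}
for f in sales_fact:
    city = next(r["city"] for r in customer_dim if r["cust_sk"] == f["cust_sk"])
    by_city[city] = by_city.get(city, 0) + f["amount"]
print(by_city)  # {'Bangalore': 300000, 'HYD': 50000}
```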
Data analysis: suppose you are running a business and you store its data in some form, say in a register or on a computer; at the year's end, when you want to know the profit or loss, that is data analysis. Use of data analysis: when you want to know which product sold the most, or, if the business is running at a loss, to find where things went wrong, you perform analysis.
The basic advantage of RAID is to speed up reading data from permanent storage devices (hard disks).