21⟩ List down some of the real time data-warehousing tools?
ETL:
Informatica,
Abinitio,
Datastage etc..,
OLAP:
Business objects
Cognos,
Micro stragetgy,
Hyperion etc..,
DW:
Oracle,
DB2,
Terradata,
Sybase,
Greenplum etc..,
“General Datawarehousing guideline for job interview preparation. Explore list of General Datawarehousing frequently asked questions(FAQs) asked in number of General Datawarehousing interviews. Post your comments as your suggestions, questions and answers on any General Datawarehousing Interview Question or answer. Ask General Datawarehousing Question, your question will be answered by our fellow friends.”
ETL:
Informatica,
Abinitio,
Datastage etc..,
OLAP:
Business objects
Cognos,
Micro stragetgy,
Hyperion etc..,
DW:
Oracle,
DB2,
Terradata,
Sybase,
Greenplum etc..,
The basic purpose of the scheduling tool in a DW Application is to stream line the flow of data from Source To Target at specific time or based on some condition.
Most dimension tables are designed using Normalization principles upto 2NF. In some instances they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension (see more on this in the next section).
Change fact table and DW population routines.
A dimensional table is a collection of hierarchies and categories along which the user can drill down and drill up. it contains only the textual attributes.
Star schema is a type of organising the tables such that we can retrieve the result from the database easily and fastly in the warehouse environment.Usually a star schema consists of one or more dimension tables around a fact table which looks like a star,so that it got its name.
SCD stands for Slowly changing dimensions. Slowly changing dimensions are of three types
SCD1: only maintained updated values.
Ex: a customer address modified we update existing record with new address.
SCD2: maintaining historical information and current information by using
A) Effective Date
B) Versions
C) Flags
or combination of these
scd3: by adding new columns to target table we maintain historical information and current information
There are a number of data modeling tools
Tool Name Company Name
Erwin Computer Associates
Embarcadero Embarcadero Technologies
Rational Rose IBM Corporation
Power Designer Sybase Corporation
Oracle Designer Oracle Corporation
The Primary Key columns of the Tables(Entities) go to the Dimension Tables as Foreign Keys.
The Primary Key columns of the Dimension Tables go to the Fact Tables as Foreign Keys.
Transaction logs write sequentially and don't need to be read at all. The ideal is to have each on RAID 1/0 because it has much better write performance than RAID 5.
RAID 1 is also better for TX logs and costs less than 1/0 to implement. It has a tad less reliability and performance is a little worse generally speaking.
RAID 5 is best for data generally because of cost and the fact it provides great read capability.
Conformed dimentions are dimensions which are common to the cubes.(cubes are the schemas contains facts and dimension tables)
Consider Cube-1 contains F1,D1,D2,D3 and Cube-2 contains F2,D1,D2,D4 are the Facts and Dimensions
here D1,D2 are the Conformed Dimensions one dimension can share with more fact tables through primary key and foreign key relationship.
Since in OLTP,tables are normalised and hence query response will be slow for end user and OLTP doesnot contain years of data and hence cannot be analysed.
Data mining is a process of extracting hidden trends within a datawarehouse. For example an insurance dataware house can be used to mine data for the most high risk people to insure in a certain geographial area.
Granularity
The first step in designing a fact table is to
determine the granularity of the fact table. By
granularity, we mean the lowest level of information
that will be stored in the fact table. This
constitutes two steps:
Determine which dimensions will be included.
Determine where along the hierarchy of each dimension
the information will be kept.
The determining factors usually goes back to the
requirements
Time dimension are used to represent the datas or measures over a certain period of time.The server time dimension is the most widley used one by which we can represent the datas in hierachal manner such as quarter->year->months->week wise representations.
Non-Additive: Non-additive facts are facts that cannot
be summed up for any of the dimensions present in the
fact table.
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined
Ex:Date Dimensions is connected all facts like Sales facts,Inventory facts..etc
Fact Table contains the measurements or metrics or facts of business process. If your business process is "Sales" , then a measurement of this business process such as "monthly sales number" is captured in the Fact table. Fact table also contains the foriegn keys for the dimension tables.
Facts are organized in a table is called Fact table.
A Fact is a numeric values or a Business measure.
Every numeric is not a fact. a numeric which occupied a key performance indicator is called Facts
A Fact table contains a Facts at lower granularity level
Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated....This makes it much easier and more efficient to run queries over data that originally came from different sources.
Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.
A Data Warehousing is defined in 2 ways by 2 authors named "Ralph Kimball" and "W.H.Inman"
According to Ralph Kimball, . A D.W.H is a relational database which is specially design for business analysis but not for running the business.
. An enterprise D.W.H is design to make decision process. Hence it is called Decision Support System.
. A Data Warehouse is design to only read operations required for business analysis but not for transactional process. Hence it is called Read Only Database.
According to W.H.Inman, A Data Warehouse is a,
1) Time variant Database
2) Non-Volatile Database
3) Integrated Database
4) Subject oriented Database
and a Data Warehouse is a historical database
The Entity-Relationship (ER) model was originally proposed by Peter in 1976 [Chen76] as a way to unify the network and relational database views.
Simply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram which is used to visually represents data objects.
Since Chen wrote his paper the model has been extended and today it is commonly used for database design For the database designer, the utility of the ER model is:
it maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
it is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.
In addition, the model can be used as a design plan by the database developer to implement a data model in a specific database management software.
These tools are used for Data/dimension modeling
1. Oracle Designer
2. ERWin (Entity Relationship for windows)
3. Informatica (Cubes/Dimensions)
4. Embarcadero
5. Power Designer Sybase