Warehouse DataStage



“Data Warehouse DataStage interview questions and answers, as frequently asked by interviewers in various Data Warehouse DataStage interviews. Learn Data Warehouse DataStage with the help of this guide, and feel free to post your suggestions, questions, and answers on any interview question using the comment feature available on the page.”



37 Warehouse DataStage Questions And Answers

21⟩ Explain Containers Usage and Types?

A container is a collection of stages grouped together for reusability. There are two types of containers:

a) Local container: specific to a single job.

b) Shared container: can be used in any job within a project.

There are two types of shared container:

1. Server shared container: used in server jobs (and can also be used in parallel jobs).

2. Parallel shared container: used in parallel jobs. You can also include server shared containers in parallel jobs as a way of incorporating server job functionality into a parallel stage (for example, you could use one to make a server plug-in stage available to a parallel job).


22⟩ What is the flow of loading data into fact and dimension tables?

Here is the sequence for loading a data warehouse:

1. The source data is first loaded into the staging area, where data cleansing takes place.

2. The data from the staging area is then loaded into the dimension/lookup tables.

3. Finally, the fact tables are loaded from the corresponding tables in the staging area.
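The three steps above can be sketched in plain Python (this is an illustration of the load order only, not DataStage code; the table and column names are made up):

```python
# Illustrative load order: staging (with cleansing) -> dimensions -> facts.

source_rows = [
    {"cust_name": " Lisa ", "city": "new york", "amount": 100},
    {"cust_name": "Bob", "city": "chicago", "amount": 250},
]

# 1. Load the staging area and cleanse (trim names, normalise city case).
staging = [
    {"cust_name": r["cust_name"].strip(),
     "city": r["city"].title(),
     "amount": r["amount"]}
    for r in source_rows
]

# 2. Load the dimension/lookup table, assigning surrogate keys.
customer_dim = {}
for r in staging:
    if r["cust_name"] not in customer_dim:
        customer_dim[r["cust_name"]] = {"cust_key": len(customer_dim) + 1,
                                        "city": r["city"]}

# 3. Load the fact table, resolving surrogate keys via the dimension lookup.
fact_sales = [
    {"cust_key": customer_dim[r["cust_name"]]["cust_key"], "amount": r["amount"]}
    for r in staging
]

print(fact_sales)  # each fact row carries a surrogate key, not a name
```

Note that the fact load comes last precisely so that every incoming row can look up an already-loaded dimension key.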


30⟩ How to eliminate duplicate rows?

Duplicates can be eliminated by loading the corresponding data into a hash file and specifying the columns on which you want to eliminate duplicates as the keys of the hash file.

Alternatively, you can delete duplicate records at the source itself by reading with a user-defined query in DataStage instead of the table read option, or you can use the Remove Duplicates stage. Using a hash file as a source also de-duplicates based on the hash key.
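The hash-file behaviour can be sketched in plain Python (illustrative only, not DataStage; the rows and key column are made up): rows written with the same key overwrite each other, so only one row per key remains.

```python
# De-duplicating on key columns the way a hash file does: rows that share a
# key overwrite one another, so only the last row per key survives.

rows = [
    {"cust_id": 1, "city": "New York"},
    {"cust_id": 2, "city": "Chicago"},
    {"cust_id": 1, "city": "Florida"},   # duplicate of key 1
]

key_cols = ("cust_id",)  # the columns chosen as the hash keys

deduped = {}
for row in rows:
    key = tuple(row[c] for c in key_cols)
    deduped[key] = row   # later row with the same key overwrites the earlier one

print(list(deduped.values()))  # one row per cust_id; key 1 keeps the last row
```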


31⟩ What is a project? Specify its various components?

You always enter DataStage through a DataStage project. When you start a DataStage client you are prompted to connect to a project. Each project contains:

DataStage jobs.

Built-in components. These are predefined components used in a job.

User-defined components. These are customized components created using the DataStage Manager or DataStage Designer.



34⟩ What is job control? How is it developed? Explain with steps.

Job control means controlling DataStage jobs from other DataStage jobs. For example, consider two jobs, XXX and YYY: job YYY can be executed from job XXX by using DataStage macros in routines.

To execute one job from another, the following steps need to be followed in the routine:

1. Attach the job using the DSAttachJob function.

2. Run the job using the DSRunJob function.

3. Stop the job using the DSStopJob function.
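The steps above can be sketched as a DataStage BASIC job-control routine (the job name YYY comes from the example above; the DSWaitForJob call and the status check are additions so the controlling job waits before deciding whether to stop the run):

```
* Job control code in job XXX that runs job YYY
hJob = DSAttachJob("YYY", DSJ.ERRFATAL)      ;* 1. attach the job
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)      ;* 2. start it running
ErrCode = DSWaitForJob(hJob)                 ;* wait for it to finish
Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
If Status = DSJS.RUNNING Then
   ErrCode = DSStopJob(hJob)                 ;* 3. stop it if still running
End
ErrCode = DSDetachJob(hJob)
```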


35⟩ How do you handle rejected rows in DataStage?

We can handle rejected rows with the help of constraints in a Transformer, in two ways:

1) By checking the Reject Row box where we write our constraints in the properties of the Transformer.

2) By using REJECTED in the expression editor of the constraint.

Create a hash file as temporary storage for rejected rows, create a link and use it as one of the outputs of the Transformer, and apply either of the two steps above on that link. All the rows rejected by the constraints will go to the hash file.


36⟩ What does a config file in Parallel Extender consist of?

The config file consists of the following:

a) Number of processes or nodes.

b) Actual disk storage locations.

The APT configuration file holds the resource disk, node pool, and scratch disk information. The node information specifies how many nodes are given to run the jobs; based on these nodes, DataStage creates processes at the back end while running the jobs. The resource disk is the place where data sets are stored, and the scratch disk provides temporary storage, which is used, for example, when the jobs perform lookups or sorts.
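As a sketch, a minimal one-node APT configuration file looks like this (the hostname and paths are placeholders and must match your own installation):

```
{
   node "node1"
   {
      fastname "etl_host"
      pools ""
      resource disk "/ds/data/disk0" {pools ""}
      resource scratchdisk "/ds/scratch0" {pools ""}
   }
}
```

Adding more `node` blocks increases the degree of parallelism without any change to the job design.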


37⟩ How do you implement a Type 2 slowly changing dimension in DataStage?

A slowly changing dimension is a common problem in data warehousing. For example: there exists a customer called Lisa in a company ABC, and she lives in New York. Later she moves to Florida, so the company must now update her address. In general, there are three ways to solve this problem:

Type 1: The new record replaces the original record, leaving no trace of the old record at all.

Type 2: A new record is added to the customer dimension table, so the customer is essentially treated as two different people.

Type 3: The original record is modified to reflect the change.

In Type 1, the new record overwrites the existing one, which means no history is maintained: the record of where the person stayed previously is lost. It is simple to use.

In Type 2, a new record is added, so both the original and the new record are present, and the new record gets its own primary key. The advantage of Type 2 is that historical information is maintained, but the size of the dimension table grows, so storage and performance can become a concern.

Type 2 should only be used if it is necessary for the data warehouse to track historical changes.

In Type 3, there are two columns: one to indicate the original value and the other to indicate the current value. For example, a new column is added which shows the original address as New York and the current address as Florida. This helps in keeping some part of the history, and the table size is not increased. One problem, however, is that when the customer moves from Florida to Texas, the New York information is lost; so Type 3 should only be used if the changes will occur only a finite number of times.
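Since the question asks about Type 2, here is a minimal sketch of the Type 2 logic in plain Python (illustrative only, not DataStage; the column names and surrogate-key scheme are made up): the current row is closed off and a new row with a fresh surrogate key is inserted.

```python
# Type 2 logic: on a change, the current dimension row is marked no longer
# current and a new row with a new surrogate key is added.

dim_customer = [
    {"cust_key": 1, "cust_id": "LISA", "city": "New York", "current": True},
]

def apply_scd2(dim, cust_id, new_city):
    """End-date the current row for cust_id and insert a new current row."""
    for row in dim:
        if row["cust_id"] == cust_id and row["current"]:
            if row["city"] == new_city:
                return               # nothing changed, nothing to do
            row["current"] = False   # close off the old version
    dim.append({
        "cust_key": max(r["cust_key"] for r in dim) + 1,  # new surrogate key
        "cust_id": cust_id,
        "city": new_city,
        "current": True,
    })

apply_scd2(dim_customer, "LISA", "Florida")
print(dim_customer)  # two rows for LISA: the old one closed, the new one current
```

In a real job the `current` flag is often replaced by effective-from/effective-to dates, but the key idea is the same: never update the attribute in place, always add a new versioned row.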
