What is Full load & Incremental or Refresh load?
Full Load: completely erasing the contents of one or more tables and reloading with fresh data.
Incremental Load: applying ongoing changes to one or more tables based on a predefined schedule.
When extracting from a table, some attribute must indicate whether a record needs to be extracted. Mostly this comes from a time dimension (e.g. date >= 1st of the current month) or a transaction status flag (e.g. Order Invoiced status). A more foolproof approach is to add an archive/processed flag to each record, which gets reset whenever the record changes. A minimal extract sketch is shown below.
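For illustration only, a minimal incremental-extract query based on those two criteria (the ORDERS table and the LAST_UPDATED_DATE and INVOICED_FLAG columns are assumed names, not from the original):
EX:
-- pick up rows changed since the 1st of the current month, or not yet processed
select order_id, customer_id, order_amount, last_updated_date
from   orders
where  last_updated_date >= trunc(sysdate, 'MM')
   or  invoiced_flag = 'N';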
ETL Tool:
It is used to Extract (E) data from multiple source systems (like RDBMS, flat files, mainframes, SAP, XML, etc.), Transform (T) it based on business requirements, and Load (L) it into target locations (like tables, files, etc.).
Need for an ETL Tool:
An ETL tool is typically required when data is scattered across different systems (like RDBMS, flat files, mainframes, SAP, XML, etc.).
Yes, you can use the advanced external procedure transformation. For more detail, refer to the Informatica Transformation Guide section on the advanced external procedure transformation. You can use C++ on UNIX, and C++, VB, or VC++ on a Windows server.
Yes, we can override the native SQL query in the Source Qualifier and Lookup transformations.
In the Lookup transformation, the lookup properties include a "Sql Override" option; by using this option we can do this.
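As a hedged sketch of a lookup SQL override (the CUSTOMERS table and its columns are assumed names, not from the original):
EX:
-- restrict the lookup cache to active customers only
select customer_id, customer_name, credit_limit
from   customers
where  status = 'ACTIVE'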
A parameter file defines the values for the parameters and variables used in a workflow, worklet, or session.
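A minimal sketch of a parameter file entry (the folder, workflow, session, and parameter names are hypothetical):
EX:
[MyFolder.WF:wf_load_orders.ST:s_m_load_orders]
$$LastExtractDate=2024-01-01
$DBConnection_Source=ORA_SRC_DEV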
Yes, we can use mapping variables in Informatica.
The Informatica server saves the value of a mapping variable to the repository at the end of a session run and uses that value the next time the session runs.
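A common illustrative use is incremental extraction; the $$LastExtractDate variable and the ORDERS table below are assumptions, not from the original:
EX:
-- Source Qualifier source filter / SQL override referencing a mapping variable
select * from orders
where last_updated_date > to_date('$$LastExtractDate', 'YYYY-MM-DD')
Inside the mapping, an expression such as SETMAXVARIABLE($$LastExtractDate, LAST_UPDATED_DATE) can advance the variable, so the value saved at the end of one run drives the next extract.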
A mapping represents dataflow from sources to targets.
A mapplet creates or configures a set of transformations.
A workflow is a set of instructions that tell the Informatica server how to execute the tasks.
A worklet is an object that represents a set of tasks.
A session is a set of instructions that describe how and when to move data from sources to targets.
Informatica metadata is data about data, which is stored in the Informatica repository.
Specify the full path of the shell script in the post-session properties of the session/workflow.
Yes, you can use heterogeneous sources and targets in a single mapping. But to join data from heterogeneous sources you have to use a Joiner transformation.
1. Connected lookup
2. Unconnected lookup
A connected lookup receives input from the pipeline, sends output back to the pipeline, and can return any number of values; it does not use a return port.
An unconnected lookup can return only one column; it contains a return port.
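For example, an unconnected lookup is called from an expression using the :LKP reference; the lookup and port names here are hypothetical:
EX:
-- expression for an output port; returns the single value from the lookup's return port
:LKP.lkp_get_customer_name(CUSTOMER_ID)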
An active transformation can change the number of rows that pass through it (decrease or increase rows).
A passive transformation cannot change the number of rows that pass through it.
1. PowerMart Designer
2. Server
3. Server Manager
4. Repository
5. Repository Manager
ETL - The process of extracting data from multiple sources (e.g. flat files, XML, COBOL, SAP, etc.) is simpler with the help of tools.
Manual - Loading data from sources other than flat files and Oracle tables needs more effort.
ETL - High and clear visibility of logic.
Manual - Complex and not so user-friendly visibility of logic.
ETL - Contains metadata, and changes can be done easily.
Manual - No metadata concept, and changes need more effort.
ETL - Error handling, log summary, and load progress make life easier for the developer and maintainer.
Manual - Needs maximum effort from a maintenance point of view.
ETL - Can handle historic data very well.
Manual - As data grows, the processing time degrades.
These are some differences between manual and ETL development.
The ANALYZE statement allows you to validate and compute statistics for an index, table, or cluster. These statistics are used by the cost-based optimizer when it calculates the most efficient plan for retrieval. In addition to its role in statement optimization, ANALYZE also helps in validating object structures and in managing space in your system. You can choose the following operations: COMPUTE, ESTIMATE, and DELETE. Early versions of Oracle7 produced unpredictable results when the ESTIMATE operation was used. It is best to compute your statistics.
EX:
select OWNER,
       sum(decode(nvl(NUM_ROWS, 9999), 9999, 0, 1)) analyzed,
       sum(decode(nvl(NUM_ROWS, 9999), 9999, 1, 0)) not_analyzed,
       count(TABLE_NAME) total
from dba_tables
where OWNER not in ('SYS', 'SYSTEM')
group by OWNER;
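For reference, the ANALYZE operations themselves look like this (the EMP table and EMP_PK index are illustrative names):
EX:
analyze table emp compute statistics;
analyze table emp estimate statistics sample 20 percent;
analyze index emp_pk validate structure;
analyze table emp delete statistics;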
If you use PowerCenter, you can increase the number of partitions in a pipeline to improve session performance. Increasing the number of partitions allows the Informatica Server to create multiple connections to sources and process partitions of source data concurrently.
When you create a session, the Workflow Manager validates each pipeline in the mapping for partitioning. You can specify multiple partitions in a pipeline if the Informatica Server can maintain data consistency when it processes the partitioned data.
When you configure the partitioning information for a pipeline, you must specify a partition type at each partition point in the pipeline.
The partition type determines how the Informatica Server redistributes data across partition points.
The Workflow Manager allows you to specify the following partition types:
Round-robin partitioning. The Informatica Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows. For more information, see Round-Robin Partitioning.
Hash partitioning. The Informatica Server applies a hash function to a partition key to group data among partitions. If you select hash auto-keys, the Informatica Server uses all grouped or sorted ports as the partition key. If you select hash user keys, you specify a number of ports to form the partition key. Use hash partitioning where you want to ensure that the Informatica Server processes groups of rows with the same partition key in the same partition. For more information, see Hash Partitioning.
Key range partitioning. You specify one or more ports to form a compound partition key. The Informatica Server passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range. For more information, see Key Range Partitioning.
Pass-through partitioning. The Informatica Server passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
Snapshots are read-only copies of a master table located on a remote node, which are periodically refreshed to reflect changes made to the master table. Snapshots are mirrors or replicas of tables.
Views are built using the columns from one or more tables. A single-table view can be updated, but a view over multiple tables cannot be updated.
A view can be updated/deleted/inserted into if it has only one base table; if the view is based on columns from more than one table, then insert, update, and delete are not possible.
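To illustrate, using the classic EMP/DEPT sample tables:
EX:
create view emp_v as
  select empno, ename, sal from emp;                       -- single base table
update emp_v set sal = sal * 1.10 where empno = 7369;      -- allowed

create view emp_dept_v as
  select e.empno, e.ename, d.dname
  from   emp e join dept d on d.deptno = e.deptno;         -- multiple base tables
-- insert/update/delete through emp_dept_v is generally not allowed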
Materialized view
A pre-computed table comprising aggregated or joined data from fact and possibly dimension tables. Also known as a summary or aggregate table.
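A hedged sketch of creating such a summary table as a materialized view (the SALES_FACT and TIME_DIM tables and their columns are assumed names):
EX:
create materialized view mv_sales_by_month
  build immediate
  refresh complete on demand
as
select t.month_name, sum(f.sale_amount) as total_sales
from   sales_fact f
join   time_dim   t on t.time_key = f.time_key
group by t.month_name;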