Informatica is a data integration tool mainly used for ETL purposes.
ETL stands for Extract, Transform and Load.
Extract – Extracting the data from one or many sources.
A source can be relational (Ex: Oracle, Microsoft SQL Server etc.) or a file (fixed-width or delimited flat file, XML file etc.).
Transform – Transforming the data according to the business needs, typically by cleansing it.
Ex: Converting the date format coming from the source into the format required in the target, name cleansing etc.
Load – Loading the data into the target (either a table or a flat file).
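As a rough SQL sketch of the Transform step (the SRC_CUSTOMER table and SIGNUP_DT column here are only assumptions for illustration), a date stored as a DD-MM-YYYY string in the source can be reshaped into the YYYY-MM-DD format expected by the target:
SELECT TO_CHAR(TO_DATE(SIGNUP_DT, 'DD-MM-YYYY'), 'YYYY-MM-DD') AS SIGNUP_DT_FMT
FROM SRC_CUSTOMER;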
Commonly used client tools are as follows:
1) R- Repository Manager
2) D- Designer
3) W- Workflow Manager
4) M – Workflow Monitor
The repository is the place where the ETL operations take place; it stores the metadata.
It is mainly used by the administrator for creating folders and users, and for granting privileges (access, authentication) etc.
Steps to add a repository:
1) Open the Repository Manager window.
2) Go to the Repository menu and click on Add Repository.
The below screen will be displayed:
Once the repository name and the username are entered, click OK.
3) The repository gets added to the Repository Manager.
4) Right-click on the newly added repository name in the left panel; the below screen will be displayed:
5) Enter the password and click on the More button; the following screen will be displayed:
6) Click on the Add button to add the domain information such as Domain Name, Gateway Host and Gateway Port, as shown below:
7) Once the domain is added, click on “OK” and then on “Connect”.
By doing so, you will be connected to the configured repository.
The Designer includes the following tools:
i) Source Analyzer:
Source Tables or flat files that will be used in the mapping are imported here.
ii) Target Designer:
Target tables or flat files that will be used in the mapping are imported here.
iii) Transformation Developer:
Used to Create the Transformations that can be reused in multiple mappings.
iv) Mapping Designer:
Used to create mappings that connect sources, transformations and targets.
A transformation created in a mapping cannot be reused in other mappings unless it is configured to be reusable.
v) Mapplet Designer:
Used to create and configure mapplets – sets of transformations that can be reused in multiple mappings.
3) TRANSFORMATION:
The Designer provides a set of transformations, each performing a specific function.
Transformation types can be classified based on the following:
a) Whether the number of rows coming from the source is affected
b) Whether the transformation is connected or unconnected
Based on whether the number of rows coming from the source is affected or not, transformations are classified as follows:
In an Active Transformation, the total number of output rows is not necessarily the same as the total number of input rows.
If “m” is the number of rows coming from the source, the row count in the target can be m+n (e.g., Joiner Transformation) or m-n (e.g., Filter Transformation).
For example, in the Filter Transformation, if we need to filter records based on a certain condition, the number of records in the output will be less than in the input.
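As a rough SQL analogy (the CUSTOMER table is the one used later in Example 1, and the CUSTOMER_CODE restriction is only an assumption for illustration), an active transformation behaves like a query that lets fewer rows out than came in:
SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM CUSTOMER
WHERE CUSTOMER_CODE = 'A'; -- fewer rows out than are in the table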
In a Passive Transformation, the total number of input rows is exactly equal to the total number of output rows.
Example: Expression Transformation
In Expression Transformation we just perform certain calculations that do not change the number of rows.
Based on whether the transformation is Connected or Unconnected, the transformations are classified as follows:
A Connected Transformation is one where the input/output ports are connected to another transformation or to the source/target.
Example: Aggregator Transformation.
An Unconnected Transformation is not connected to any other transformation in the mapping and is called from within another transformation.
Example: Lookup Transformation
Note: The Lookup Transformation can be used as either a connected or an unconnected transformation.
Commonly Used Transformations:
Source Qualifier
Aggregator
Look Up
Filter
Router
Expression
Joiner
Transformations are created using the Designer tools.
SOURCE: Only the columns that are necessary for us need be taken to the Source Qualifier instead of taking all the columns, because this improves performance.
Whenever a source is dragged into the Mapping Designer, a Source Qualifier comes along with the source. We can either use or delete this transformation.
SOURCE QUALIFIER TRANSFORMATION (SQ):
1) Filter conditions can be given if we need only particular records from the source.
2) The same Source Qualifier can be used if we need to join two or more source tables, which can be from another schema as well but must be homogeneous in nature.
Note: If we need to join tables from different databases or file systems, or basically heterogeneous sources, the Source Qualifier transformation cannot be used; instead, a Joiner or Lookup Transformation can be used.
Important Points to be noted while using SQ:
i) The number of columns in the SELECT statement should match the number of ports projected from the SQ (the order should also be the same). If the number of columns or their order does not match, the following error is displayed:
“Query should exactly match “n” columns projected from the source”
ii) To check whether the query given in the SQ is valid or not, we should first establish ODBC connectivity:
1) Right-click the SQ and go to Properties.
2) At the bottom of the window there is a “connect to database” option; if a connection already exists, we just need to select the ODBC entry from the drop-down and give the username and password.
Username: schema name used to connect to Database.
Password: Password used to connect to Database
Otherwise, the following steps are to be followed:
3) Against the ODBC data source field, click on the LOV box.
The following screen will be displayed:
4) Click on Add button. Then the following screen will be displayed:
5) Click on ODBC for Oracle if Oracle is used as the database.
Then the below screen will be displayed:
Data Source Name: a meaningful name
Description: any meaningful description
Username: user name with which we connect to Informatica (Ex: Trainee1 from the above example)
Server: database node or connection string (Ex: Training is the server in the example below)
After all the above steps, click “Validate”; we will get the message “No Errors Detected” if there are no errors.
The following are sample queries that can be given in the SQ:
Example 1: There are two tables, CUSTOMER and PRODUCT:
SELECT
CUSTOMER.CUSTOMER_ID,
CUSTOMER.CUSTOMER_NAME,
CUSTOMER.CUSTOMER_ADDRESS,
CUSTOMER.CUSTOMER_CODE,
PRODUCT.PRODUCT_ID,
PRODUCT.PRODUCT_NAME,
PRODUCT.PRODUCT_CODE
FROM
CUSTOMER, PRODUCT
WHERE CUSTOMER.CUSTOMER_ID = '100'
AND PRODUCT.PRODUCT_ID between 1 and 500
AND CUSTOMER.CUSTOMER_CODE= PRODUCT.PRODUCT_CODE
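The same query can also be written with explicit ANSI join syntax (a sketch using the same tables and columns as Example 1):
SELECT
CUSTOMER.CUSTOMER_ID,
CUSTOMER.CUSTOMER_NAME,
CUSTOMER.CUSTOMER_ADDRESS,
CUSTOMER.CUSTOMER_CODE,
PRODUCT.PRODUCT_ID,
PRODUCT.PRODUCT_NAME,
PRODUCT.PRODUCT_CODE
FROM CUSTOMER
INNER JOIN PRODUCT
ON CUSTOMER.CUSTOMER_CODE = PRODUCT.PRODUCT_CODE
WHERE CUSTOMER.CUSTOMER_ID = '100'
AND PRODUCT.PRODUCT_ID BETWEEN 1 AND 500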
Example 2:
There is a table LOCATION that contains the column START_DATE. If we need the start date and also the minimum of START_DATE, it can be done in the SQ:
From the below query we can get the minimum START_DATE, and the START_DATE (all dates) can be taken directly from the table.
SELECT LOCATION.START_DATE, B.MIN_START_DATE
FROM LOCATION,
(SELECT MIN(START_DATE) AS MIN_START_DATE FROM LOCATION) B
Note: If we need to select the minimum date column from the same table, the query in Example 2 can be used; also, a MIN_START_DATE column should be added to the PORTS of the SQ, with the same data type as START_DATE.
AGGREGATOR TRANSFORMATION:
The Aggregator transformation is used to perform GROUP BY operations on particular columns.
For example, out of 5 columns in the select clause, if we need distinct values for only 2 columns, an Aggregator can be placed. It is also used for getting SUM(), AVG(), MIN(), MAX() of columns grouped on something (like AVG(Salary) per department).
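In SQL terms, the Aggregator behaves roughly like a GROUP BY query (a sketch using the DEPT_ID and SAL columns of the source table “S” described below):
SELECT DEPT_ID,
SUM(SAL) AS TOTAL_SAL,
AVG(SAL) AS AVG_SAL
FROM S
GROUP BY DEPT_ID;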
Important Points:
i) In the Aggregator, the number of input and output ports should be equal, else the mapping will be invalid.
ii) The order of the columns coming out of the Aggregator should be the same as in the next transformation. Else the following error will be displayed:
“Column names/data type mismatch”
The following example clearly explains the logic and gives a few useful tips:
Consider Source Table “S”:
EMP_ID
EMP_NAME
DEPT_ID
SAL
Consider Target “T1”
TGT_EMP_ID
TGT_EMP_NAME
TGT_DEPT_ID
TGT_SAL
Consider Target “T2”
TGT_EMP_ID
TGT_DEPT_ID
TGT_SAL
Both T1 and T2 must have SUM(SAL) grouped on TGT_DEPT_ID.
Before T1, an Aggregator has to be placed to obtain SUM(SAL) grouped on TGT_DEPT_ID.
But the same Aggregator cannot be used for T2, since the number of columns in T1 and T2 is different. So a separate Aggregator must be placed before T2 to obtain SUM(SAL) grouped on TGT_DEPT_ID.
LOOKUP TRANSFORMATION:
A lookup can be made on either the source or the target.
There are two types of lookups:
1) Unconnected Lookup:
Unconnected lookups are used when there are many inputs and many lookup columns, but only one output column is required from the lookup.
“I” stands for Input Port, “O” stands for Output Port, “L” stands for Lookup Port and “R” stands for Return Port
1) For input ports, only “I” should be checked.
2) “L” should be checked for those ports on which we are performing the lookup.
3) The return port is the output column that is the result of the lookup transformation, so for this column “O”, “L” and “R” will be checked.
4) For unconnected lookups, we should create an output port in the Expression transformation and pass the input parameters.
5) The following syntax should be given in the output port:
:LKP.<lookup_name> (PORT1, PORT2, PORT3)
Where PORT1, PORT2, PORT3 are the input ports in the lookup.
Note: The order of the ports mentioned in the lookup call should be the same as in the lookup transformation.
For example:
:LKP.LKP_LOCATION (LOCATION_ID, LOCATION_CD, SYSDATE)
Note: The ports given within the brackets must follow the same order as they are defined in the lookup transformation, just as in the above example.
6) Also the condition for the lookup must be specified in the “Condition” tab.
Here in this example, the following are the lookup conditions:
LOCATION_ID = i_LOCATION_ID,
LOCATION_CD = i_LOCATION_CD,
START_DT = i_DATE
7) In the “Properties” tab of the lookup transformation we have options for handling multiple matches, i.e., when multiple rows satisfy the lookup condition. The following options are available:
>> Return the first matching value
>> Return the last matching value
>> Return any matching value
>> Report Error
Using the above options we can determine what happens when the Lookup transformation finds multiple rows that match the lookup condition.
We can select “Return First / Last matching value” from the cache or lookup source, or “Report Error”. Or, we can allow the Lookup transformation to use any value. “Return the first matching value” and “Report Error” are preferred for better performance.
When we configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition. It creates an index based on the key ports rather than all Lookup transformation ports.
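As a rough relational analogy, for each input row the unconnected lookup in the example above behaves like a single-value query against the lookup table (a sketch; LOCATION_NAME stands in for whichever column is marked as the return port, and the :i_ bind values are the input ports):
SELECT LOCATION_NAME
FROM LOCATION
WHERE LOCATION_ID = :i_LOCATION_ID
AND LOCATION_CD = :i_LOCATION_CD
AND START_DT = :i_DATE;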
2) Connected Lookup:
In case we need to take out more than one output port by doing a lookup on a table, we use a connected lookup.
“I” stands for Input Port, “O” stands for Output Port, “L” stands for Lookup Port and “R” stands for Return Port
1) For input ports, only “I” should be checked.
2) “L” should be checked for those ports on which we are performing lookup.
3) For Output columns that are the result of the lookup transformation, “O” and “L” will be checked.
Note: Here the return port will be unchecked.
1. It is always preferable to place conditions with the equality sign (=) first in the list of conditions, because this improves performance.
2. If we have the privilege to modify the database, adding an index on the lookup columns improves the performance of the lookup.
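For example, if the lookup from the earlier example is performed on LOCATION_ID, LOCATION_CD and START_DT, an index along the following lines could be added (a sketch; the index name is only an assumption):
CREATE INDEX IDX_LOCATION_LKP
ON LOCATION (LOCATION_ID, LOCATION_CD, START_DT);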
FILTER TRANSFORMATION:
It is used to filter records in case we need only particular records in the target, which may be either a table or a flat file.
The following example shows a case where the target table / flat file should contain only records with CUSTOMER_NAME ‘JOY’ and only two weeks of data from the starting date.
The condition should be given as:
CUSTOMER_NAME = 'JOY'
AND LOAD_DATE <= ADD_TO_DATE(MIN_START_DATE, 'DD', 14)
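If the source is relational, the same restriction can instead be pushed into the Source Qualifier so that fewer rows are read in the first place (a sketch; it assumes the rows come from the CUSTOMER table with a LOAD_DATE column, and borrows the minimum start date from the LOCATION table of Example 2):
SELECT *
FROM CUSTOMER
WHERE CUSTOMER_NAME = 'JOY'
AND LOAD_DATE <= (SELECT MIN(START_DATE) + 14 FROM LOCATION);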
ROUTER TRANSFORMATION:
Whenever there is a need to direct values from various sources to different targets, where the targets are to be loaded based on some conditions, the Router transformation is used.
Example 1: There is a Customer Details table with:
Customer_Id,
Customer_Name,
Customer_Location,
Customer_Order_Qty
And there are multiple targets with the same definition, i.e., Customer_Name and Customer_Order_Qty, but one target table for each Customer_Location, say one for Chicago, one for Plano and one for Texas.
Then we need to use a Router transformation to route values based on the location names.
There are two types of output groups:
User-defined groups: Here in this example:
Group 1: Customer_Location = ‘Chicago’
Group 2: Customer_Location = ‘Plano’
Group 3: Customer_Location = ‘Texas’
Default group: The Designer creates the default group after you create one new user-defined group. The Designer does not allow you to edit or delete the default group. This group does not have a group filter condition associated with it. If all of the conditions evaluate to FALSE, the Integration Service passes the row to the default group. Here in this example, all customer details apart from ‘Chicago’, ‘Plano’ and ‘Texas’ will fall under the default group.
Example 2:
If we have to route records with Customer_Id null or 0 to a reject file and records with Customer_Id > 0 to some other target file:
i) We need to create a group in the router and place the below condition under one group.
(ISNULL (CUSTOMER_ID)) or (CUSTOMER_ID = 0)
ii) Also create another group and place the below condition for GOOD records:
CUSTOMER_ID>0
iii) Finally, connect the ports from each group to whichever target is needed.
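In SQL terms, the two groups behave like two complementary extracts of the same source (a sketch; the CUSTOMER_DETAILS table name is only an assumption based on the example above):
-- group for rejected records (Customer_Id null or 0)
SELECT * FROM CUSTOMER_DETAILS
WHERE CUSTOMER_ID IS NULL OR CUSTOMER_ID = 0;
-- group for good records
SELECT * FROM CUSTOMER_DETAILS
WHERE CUSTOMER_ID > 0;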
EXPRESSION TRANSFORMATION:
The Expression transformation can be used for mathematical calculations, and also, when we use unconnected lookups in our mappings, to call the lookup and fetch the return value from it.
The Expression transformation is a passive, connected transformation and is one of the most commonly used and effective transformations.
The following are a few syntax examples for reference:
i) The Expression transformation can be used for comparisons by specifying conditions:
IIF(v_PRODUCT_ID > 1 AND v_PRODUCT_ID < 500, 7.5, ROUND(v_PRODUCT))
ii) It can be used for converting a number to a character or vice versa.
Conversion: TO_CHAR(variable name or column name)
iii) It can be used for concatenation of two or more columns.
Column1||Column2|| Column 3|| Column4
iv) If there are null values in a few columns and these need to be assigned default values, the Expression transformation can be used (column1 and column2 below stand for the variable or column names being checked):
IIF(ISNULL(column1) OR column1 = 0,
IIF(ISNULL(column2) OR column2 = 0, 0, column2),
column1)
v) An expression containing an unconnected lookup call can be used as the value of an output port; the data type of the output port needs to match the return value of the expression:
IIF(column_name = some_value, :LKP.LOOKUPNAME(PORT1, PORT2, PORT3), 0)
vi) Date function: ADD_TO_DATE(column_name, 'DD', number_of_days)
JOINER TRANSFORMATION:
The Joiner transformation is an active and connected transformation.
It can be used to join source data from two related heterogeneous sources residing in different locations or file systems, whereas the Source Qualifier is used to join homogeneous sources. It can also be used to join data from the same source.
It joins based on certain conditions, with at least one matching column.
In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation is similar to a SQL join, except that data can originate from different types of sources.
We can define the join type on the Properties tab of the transformation. The Joiner transformation supports the following types of joins: Normal Join, Master Outer Join, Detail Outer Join and Full Outer Join.
If a result set includes fields that do not contain data from either of the sources, the Joiner transformation populates the empty fields with null values. If we know that a field will return NULL and we do not want to insert NULLs into the target, we can set a default value on the Ports tab for the corresponding port.
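In SQL terms this is the behaviour of an outer join, where non-matching rows carry NULLs in the columns coming from the other source (a sketch reusing the CUSTOMER and PRODUCT tables from Example 1):
SELECT CUSTOMER.CUSTOMER_NAME,
PRODUCT.PRODUCT_NAME
FROM CUSTOMER
FULL OUTER JOIN PRODUCT
ON CUSTOMER.CUSTOMER_CODE = PRODUCT.PRODUCT_CODE;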
Rules:
1) All the ports that need to be joined should be brought into the Joiner.
2) There are two pipelines, namely the master and the detail pipeline. We should take the ports from the detail pipeline through to the target; the master pipeline normally ends at the Joiner, but in the case of an equi-join we can take the ports from either pipeline.
3) The Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.
4) It cannot be used when a Sequence Generator transformation is connected directly before the Joiner.
Important Points to be noted:
i) Either a Joiner or a connected Lookup can be used; both perform the same function, depending on the requirements.
ii) Performance-wise, the Joiner takes more time than the Lookup, so the Lookup is preferred when we have to retrieve data from two tables based on some join conditions.
iii) We can improve session performance by configuring the Joiner transformation to use sorted input. The improvement is visible when we are working with a large set of data.
iv) Performing a join in the database is faster than performing a join in the session, so providing the join condition in the Source Qualifier improves performance.
To configure a mapping to use sorted data, we should establish and maintain a sort order in the mapping so that it holds when the Integration Service processes the Joiner transformation. We need to perform the following tasks to configure the mapping:
Configure the sort order. Configure the sort order of the data we want to join. We can join sorted flat files, or we can sort relational data using a Source Qualifier transformation. We can also use a Sorter transformation before the Joiner transformation.
Configure the Joiner transformation. Configure the Joiner transformation to use sorted data and configure the join condition to use the sort origin ports. The sort origin represents the source of the sorted data.
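For relational sources, sorting in the Source Qualifier simply means adding an ORDER BY on the ports used in the join condition (a sketch; the CUSTOMER table and the CUSTOMER_CODE join key are taken from Example 1):
SELECT CUSTOMER_ID,
CUSTOMER_NAME,
CUSTOMER_CODE
FROM CUSTOMER
ORDER BY CUSTOMER_CODE;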
WORKFLOW MANAGER:
The Workflow Manager is used for creating sessions and associating the sessions with a workflow.
In the Properties tab:
Session Log File Directory: <Directory name (UNIX)>\SessLogs
If the session is performing an INSERT operation, then set “Treat source rows as” to Insert.
In the Config Object tab:
It is always preferred to set “Stop on errors” to 1.
1) Click on Sources, then under Connections give the connection name if it has already been created.
2) Click on the particular connection and check whether it is the same one used for connecting to the database.
3) Click on the target and set the connection for it.
4) Go to the properties section, which is at the bottom:
5) Target load type should be “Normal”.
6) The Truncate table option should be checked if the target has to be truncated, and to avoid errors (violating a unique key constraint) while executing.
7) The reject file directory should be given as Directoryname (UNIX)\BadFiles.
8) For the remaining transformations, the cache directory should be given as Directoryname (UNIX)\Cache.
Note: If a new connection has to be created for the workflow:
Go to Workflow Manager -> Connections -> Relational, then give the appropriate information (user, password, server).
Go to Workflows -> Create and mention the name of the workflow.
Below that, the Integration Service should be selected as mentioned by the client.
Next, in the workflow properties, Directoryname (UNIX)\WorkflowLogs\ should be given as the workflow log file directory.
Fig: Snap-shot of “Edit properties” of session
If the source / target used in the mapping is a relational table, then the following session configurations must be done:
Select the particular source / target from the left panel. If the selected source / target is a relational table, then “Relational Reader” (for a source) or “Relational Writer” (for a target) needs to be selected in the top-most drop-down in the snapshot shown above (i.e., in the “Mapping” tab of the session’s “Edit Task”). Once this is done, under the “Connections” heading of the snapshot, the database connection needs to be selected.
Similarly, for a flat-file source / target, “File Reader” / “File Writer” needs to be selected in the top-most drop-down in the snapshot shown above.
In the above snapshot, “File Writer” is selected in the top-most drop-down. That signifies that the target selected (highlighted) in the left panel is a flat file. For a flat-file source / target, additional session properties must be mentioned, such as:
Source / Output File Directory – this can be a file path on the UNIX box,
Source / Output Filename.
In case the target is a flat file, we need to go to the UNIX box and check whether it has loaded properly.
Steps:
Run -> telnet <server name> (as mentioned by the client)
Login: as mentioned by the client
Password: as mentioned by the client
Go to the specified directory by giving cd <Directoryname>, then list the files with ls.
After this, give cat <filename> to list the records, or open the file with vi <filename>.
WORKFLOW MONITOR:
The Workflow Monitor is used to check the status of the workflow / mapping being executed.
Right-click on the session to get the session log, which is very useful in case the session has failed. Also click on the run properties to know the number of records loaded into the target and the number of records fetched from the source.