Informatica ETL process implementation
Parameters and variables in Informatica PowerCenter
Purpose
A mapping can use parameters and variables to store information during execution. Each parameter and variable is defined with a specific data type, and their main purpose is to provide increased development flexibility.
Parameters differ from variables in that a parameter keeps the same value for the entire session run, while the value of a variable can change during session execution and its final value is saved to the repository.
Both parameters and variables can be accessed from any component in the mapping which supports it.
To create a parameter or variable, go to Mappings -> Parameters and Variables from within the Mapping Designer in the Designer client.
They are referenced using the format $$ParameterName or $$VariableName.
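As an illustration, a mapping parameter can be referenced in a Source Qualifier filter condition; the parameter is expanded textually at run time (the parameter name $$LastRunDate and the column below are hypothetical, and the date format assumes an Oracle source):

UPDATE_DATE > TO_DATE('$$LastRunDate', 'YYYY-MM-DD HH24:MI:SS')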
Changing values of Variables
To change the value of a variable, one of the following functions can be used within an expression:
- SETVARIABLE($$Variable, value) - sets the variable to the specified value
- SETMAXVARIABLE($$Variable, value) - sets the variable to the higher of its current value and the specified value
- SETMINVARIABLE($$Variable, value) - sets the variable to the lower of its current value and the specified value
- SETCOUNTVARIABLE($$Variable) - increments the variable by one for each inserted row and decrements it by one for each deleted row
At the end of a successful session, the values of variables are saved to the repository. The SetVariable function writes the final value of a variable to the repository based on the Aggregation Type selected when the variable was defined.
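For example, a common incremental-load pattern (the variable and port names here are illustrative) defines a mapping variable $$LastRunTime with aggregation type Max and sets it in an Expression transformation on every row:

out_LAST_RUN = SETVARIABLE($$LastRunTime, SESSSTARTTIME)

At the end of a successful session the final value is saved to the repository, where it can seed the source filter of the next run.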
Parameter files
A parameter file is an ASCII file used to set values of mapping parameters and variables. Parameters can be set at the workflow, worklet or session level. The physical location of a parameter file is set in Workflow Manager in Workflows -> Edit. It can also be specified using the pmcmd command when starting a session task.
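For example, a parameter file can be passed when starting a workflow with pmcmd (the service, domain, user, folder, path and workflow names below are placeholders):

pmcmd startworkflow -sv IntegrationSvc_01 -d Domain_Dev -u pcuser -p password -f DWH_PROJECT -paramfile /opt/infa/params/daily_loading.par wkf_daily_loading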
Parameter file structure
Parameters can be grouped into the following sections:
- [Global]
- [Service: service name]
- [folder name.WF:workflow name]
- [folder name.WF:workflow name.WT:worklet name]
- [folder name.WF:workflow name.WT:worklet name.WT:worklet name...]
- [folder name.WF:workflow name.ST:session name]
- [folder name.session name]
- [session name]
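The section/assignment layout above can be illustrated with a short parser sketch. This is plain Python, not part of PowerCenter; it handles only section headers and name=value lines, ignoring anything else:

```python
def parse_param_file(text):
    """Parse PowerCenter-style parameter file text into {section: {name: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Section header, e.g. [folder name.WF:workflow name]
            current = line[1:-1]
            sections[current] = {}
        elif "=" in line and current is not None:
            # Parameter or variable assignment, e.g. $$platform=hpux
            name, _, value = line.partition("=")
            sections[current][name.strip()] = value.strip()
    return sections

sample = """\
[Service:IntegrationSvc_01]
$$SuccessEmail=dwhadmin@etl-tools.info
[DWH_PROJECT.WF:wkf_daily_loading]
$$platform=hpux
"""
parsed = parse_param_file(sample)
```

After parsing the sample, `parsed["DWH_PROJECT.WF:wkf_daily_loading"]["$$platform"]` holds `"hpux"`.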
Examples / useful tips
- The value is initialized by the declaration that defines it; however, it can be overridden with a different value in the parameter file specified for the session task
- Initialization priority of Parameters: Parameter file, Declared initial value, Default value
- Initialization priority of Variables: Parameter file, Repository value, Declared initial value, Default value
- Parameters and variables can only be used inside the object in which they are created.
- Parameters and variables can be used in pre and post-SQL
- Sample parameter file:
[Service:IntegrationSvc_01]
$$SuccessEmail=dwhadmin@etl-tools.info
$$FailureEmail=helpdesk@etl-tools.info
[DWH_PROJECT.WF:wkf_daily_loading]
$$platform=hpux
$$DBC_ORA=oracle_dwh
[DWH_PROJECT.WF:wkf_daily_loading.ST:s_src_sa_sapbw]
$$DBC_SAP=sapbw.etl-tools.info
$$DBC_ORA=oracle_sap_staging
Workflow tasks
Purpose
Tasks are designed in Workflow Manager and represent an executable set of actions, functions or commands which define the runtime behavior of the whole ETL process.
Some of the most often used tasks:
- Session task - needed to run a mapping
- Assignment task - can establish the value of a Workflow Variable whose value can be used at a later point in the workflow, as testing criteria to determine if (or when) other workflow tasks/pipelines should be run.
- Decision task - enables the workflow designer to set criteria by which the workflow will or will not proceed to the next set of tasks, depending on whether the criteria evaluate to true or false. It tests for a condition during the workflow and sets a flag based on the condition
- Email task - sends an email from within a workflow. Email addresses, a subject line and the email message text can be defined. When called from within a Session task, the message text can contain variable session-related metadata.
- Event wait task - pauses processing of the pipeline until a specified event occurs. Events can be Pre-defined (file watch) or User-defined (created by an Event Raise task elsewhere in the workflow)
- Command task - specifies one or more UNIX command or shell script, DOS command or batch file for Windows servers to run during a workflow. Command task status (success or failure) is held in the task-specific variable: $command_task_name.STATUS.
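For example, a link condition leaving a Command task can test this variable so that downstream tasks run only on success (the task name cmd_archive_files is hypothetical):

$cmd_archive_files.STATUS = SUCCEEDED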
UNIX and DOS commands can also be run pre- or post-session in a Session task.
Examples / useful tips
- Session, Email and Command tasks can be reusable.
- Reusable tasks are created within the Task Developer
Workflow
Purpose
A workflow, designed in Workflow Manager, is a collection of tasks that describe runtime ETL processes. In IBM InfoSphere DataStage terms, workflows correspond to Job Sequences; in Ab Initio they are Flows, and in Pentaho Data Integration they are Jobs.
Examples / useful tips
- Use a parameter file to define the values for parameters and variables used in a workflow, worklet, mapping, or session. A parameter file can be created with any text editor such as Notepad or Vi.
- When developing a sequential workflow, it is a good idea to use the Workflow Wizard to create Sessions in sequence. Dependencies between the sessions can be created.
- Session parameters must be defined in a parameter file. Since session parameters do not have default values, the session fails to initialize when the Integration Service cannot locate the value of a session parameter in the parameter file.
- On under-utilized hardware systems, it may be possible to improve performance by processing partitioned data sets in parallel in multiple threads of the same session instance running on the Integration Service node. However, parallel execution may impair performance on over-utilized systems or systems with smaller I/O capacity
- Incremental aggregation is useful for applying captured changes in the source to aggregate calculations in a session.
- From the Workflow Manager Tools menu, select Options and select the option to 'Show full names of tasks'. This displays the entire name of every task in the workflow.
Design and development best practices
Mapping design tips
- Standards - sticking to consistent standards is beneficial in the long run. This includes naming conventions, descriptions, environment settings, parameter files, documentation, etc.
- Reusability - in order to be able to react quickly to potential changes, use where possible such Informatica components as mapplets, worklets, reusable transformations.
- Scalability - when designing and developing mappings, it is a good practice to keep volumes in mind. This includes caching, queries, partitioning, and initial vs. incremental loads.
- Simplicity - it is recommended to create multiple simple mappings instead of a few complex ones. Use a Staging Area and try to keep the processing logic as clear and simple as possible.
- Modularity - use modular design technique (common error handling, reprocessing).
Mapping development best practices
- Source Qualifier - use shortcuts, extract only the necessary data, limit the read of columns and rows on the source. Try to use the default query options (User Defined Join, Filter) instead of a SQL Query override, which may impact database resources and prevent the use of partitioning and pushdown optimization.
- Expressions - use local variables to limit the amount of redundant calculations, avoid datatype conversions, reduce invoking external scripts (coding outside of Informatica), provide comments, use operators (||, +, /) instead of functions. Keep in mind that numeric operations are generally faster than string operations.
- Filter - use the Filter transformation as close to the source as possible. If multiple filters need to be applied, it is usually more efficient to replace them with a single Router.
- Aggregator - use sorted input, also use as early (close to the source) as possible and filter the data before aggregating.
- Joiner - try to join the data in Source Qualifier if possible, avoid outer joins. It is a good practice to use a source with fewer rows as a Master source.
- Lookup - relational lookup should only return ports that meet the condition. Call Unconnected Lookup in expression (IIF). Replace large lookup tables with joins when possible. Review the database objects and add indexes to database columns when possible. Use Cache Calculator in session to eliminate paging in lookup cache.
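Two of the tips above sketched as Informatica expressions (all port, column and lookup names are illustrative). First, a variable port that removes redundant trimming, reused by two output ports:

v_FULL_NAME (variable port): LTRIM(RTRIM(FIRST_NAME)) || ' ' || LTRIM(RTRIM(LAST_NAME))
out_GREETING (output port): 'Dear ' || v_FULL_NAME
out_MAIL_TAG (output port): v_FULL_NAME || ' <' || EMAIL || '>'

Second, an unconnected lookup invoked inside IIF, so the lookup runs only for rows where the name is missing:

out_CUST_NAME: IIF(ISNULL(CUST_NAME), :LKP.lkp_customer_name(CUST_ID), CUST_NAME)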