ETL Tools

Extract, transform & load tools

 

Two types (i) off-the-shelf (ii) customized

Most organization prefers off-the-shelf and then customize rather than in-house development

 

INFORMATICA PowerCenter   ETL tool roughly around 2-3 lacs

 

Extraction complexities    extract from various departmental databases in different format e.g., flat files, relational, ISAM etc,  bring to a consistent format often through Data Staging, a temporary database for conversion

 

Transformation 

 

The transform stage applies a series of rules or functions to the extracted data to derive the data to be loaded. Some data sources will require very little manipulation of data. In other cases, one or more of the following transformations types may be required:

 

 

 

 

 

 

 

 

 

Src: Wikipedia

Load

The load phase loads the data into the data warehouse (DW).

 Depending on the requirements of the organization, this process ranges widely. Some data warehouses might weekly overwrite existing information with cumulative, updated data, while other DW (or even other parts of the same DW) might add new data hourly.

The timing and scope to replace or append are strategic design choices dependent on the time available and the business needs. More complex systems can maintain a history and audit trail of all changes to the data.

Challenges

ETL processes can be quite complex, and significant operational problems can occur with improperly designed ETL systems.

The range of data values or data quality in an operational system may be outside the expectations of designers at the time validation and transformation rules are specified. Data profiling of a source during data analysis is recommended to identify the data conditions that will need to be managed by transform rules specifications.

The scalability of an ETL system across the lifetime of its usage needs to be established during analysis. This includes understanding the volumes of data that will have to be processed within Service Level Agreements, (SLAs). The time available to extract from source systems may change, which may mean the same amount of data may have to be processed in less time. Some ETL systems have to scale to process terabytes of data to update data warehouses with tens of terabytes of data. Increasing volumes of data may require designs that can scale from daily batch to intra-day micro-batch to integration with message queues for continuous transformation and update.

Src: Wikipedia

 

Hosted by www.Geocities.ws

1