ETL Tools
Extract, transform & load
tools
Two types (i) off-the-shelf
(ii) customized
Most organization prefers
off-the-shelf and then customize rather than in-house development
INFORMATICA PowerCenter ETL tool roughly around 2-3 lacs
Extraction complexities extract from
various departmental databases in different format e.g., flat files,
relational, ISAM etc, bring to a
consistent format often through Data Staging, a temporary database for
conversion
Transformation
The transform
stage applies a series of rules or functions to the extracted data to derive
the data to be loaded. Some data sources will require very little manipulation
of data. In other cases, one or more of the following transformations types may
be required:
Src:
Wikipedia
The load phase
loads the data into the data warehouse (DW).
Depending on the requirements of the
organization, this process ranges widely. Some data warehouses might weekly
overwrite existing information with cumulative, updated data, while other DW
(or even other parts of the same DW) might add new data hourly.
The timing and
scope to replace or append are strategic design choices dependent on the time
available and the business needs. More complex systems can maintain a history
and audit trail of all changes to the data.
ETL
processes can be quite complex, and significant operational problems can occur
with improperly designed ETL systems.
The
range of data values or data quality in an operational system may be outside
the expectations of designers at the time validation and transformation rules
are specified. Data profiling of a source during data analysis
is recommended to identify the data conditions that will need to be managed by
transform rules specifications.
The scalability
of an ETL system across the lifetime of its usage needs to be established
during analysis. This includes understanding the volumes of data that will have
to be processed within Service Level Agreements, (SLAs). The time
available to extract from source systems may change, which may mean the same
amount of data may have to be processed in less time. Some ETL systems have to
scale to process terabytes of data to update data warehouses with tens of
terabytes of data. Increasing volumes of data may require designs that can
scale from daily batch to intra-day micro-batch to integration with
message
queues for continuous transformation and update.
Src:
Wikipedia