Data warehousing is a valuable, proven approach to providing business users at all levels of an organization with the information they need to make high-impact decisions - only if the right technologies, architecture, and techniques are used (Simon, Alan). In essence, a data warehouse is managed data beyond the every day operational system structure. It consists of business related data obtained from various sources and then formatted to meet business decision-making. The data warehouse technology facilitates informed business decisions by providing a means for the collection, consolidation and organization of data for reports and analysis.
The key of data warehousing is that data stored for business purposes or for analysis can be more efficiently used if stored outside the operational system. In the early stages of the technology, systems archived data on magnetic tape, as it became old or inactive. Accessing �old� or archived information involved a manual process of loading the tapes. However, factors such as relational database management systems, development of faster processors and greater storage capacities, as well as enhancements in end-user applications has made it possible for data warehousing to exist.
Logical Transformation of Operational Data
There are some logical transformation procedures involved in moving data from an operational system to a data warehouse. The model of the data warehouse model outlines the logical and physical structure of the data warehouse (Gupta, Vivek). The model needs to be independent of the relational data model that exists for the operational system. Data in the data warehouse is, functionally, de-normalized, and may have redundant reference data. If data obtained from different systems have varying information about the same product or process (according to the business), the data warehouse must consolidate all information provided for that product that is relevant to the business process.
The de-normalization of data reduces the needs for joins in an SQL query. Normalization is used in relational modeling and involves breaking the database into various tables for flexibility purposes, however, this can make the data model very complex. A data warehouse model incorporates a de-normalized structure mainly for "performance and simplicity" (Gupta, Vivek); see Figure 2.
Data Architecture
The type of architecture deemed most efficient in creating a data warehouse is considered the three-level architecture. The conventional architecture, or two-level architecture, see Figure 3, shows all data being combined into one centralized unit, which can then be accessed by users. The three-level architecture allows data stored in a data warehouse to be customized using data mart technology that provides single-subject data to a small group of people that need that specific data. Decision support, in essence is customized for various groups, and only the specific data needed is given to a specific group.
The Dimensional Model - Star Schema
The dimension model must be created taking into account the business needs and detailed needs of the users. The model should provide ease of use and address the user requirements. The dimensional model uses a star design that relates to business needs and supports simple queries. Other schemas exist such as the Snowflake schema, however the Star schema design seems to be well suited for a data warehouse technology.
The Star schema is a data modeling technique used to map multidimensional decision support data in a relational database. The Star schema was developed because the existing relational modeling techniques was not structured to facilitate advanced data analysis requirements. Star schemas allow an easily constructed model for multidimensional data analysis while still preserving the relational structures on which the operational database is built. The basic Star schema has four components including facts, dimensions, attributes, and attribute hierarchies. The reason it the technology is called a Star schema is because all of the dimension tables have a many to one relationship with the fact table. The fact table contains facts that are linked through their dimensions. Some facts that are commonly used in business data analysis are units, costs, prices and revenues. Dimensions are qualifying characteristics that provide additional perspectives to a given fact.
For example, a sale from a certain item may be compared by region, and by time period. Attributes are sometimes used to search, or for the classification of facts. Dimensions provide descriptions about facts by the attributes. The data warehouse must be designed to define common business attributes that will be used by in a narrow search by a data analyst. Finally, attribute hierarchies provide a top-down data organization that may be used for two main functions: aggregation and drill-down/roll-up data analysis