DW Toolkit

Notes from the Data Warehouse Toolkit

by Ralph Kimball

This is a set of notes I made while reading the book "The Data Warehouse Toolkit" by Ralph Kimball.

Goals of the Data Warehouse

The data warehouse provides access to corporate or organizational data.
The data must be easily accessible, and reports can be run and rerun easily and results returned quickly.
The data in a data warehouse is consistent.
Two different users who make the same query should get identical results even if they make requests at different times. Also if the data load is incomplete for the request the user is informed about this.
The data in the data warehouse can be sliced and diced along every dimension i.e. by every possible measure in the business.
The data warehouse is not just data, but also a set of tools to query, analyze and present information.
The DW encompasses both back-end database infrastructure, as well as front-end reporting tools.
The data warehouse is where we publish used data.
Data is assembled from a variety of sources, cleaned up, quality assured and released only when ready for use.
The quality of data in the data warehouse is a driver of business re-engineering.

I. Two Different Worlds

OLTP is profoundly different from dimensional data warehousing DDW.

Consistency
OLTP consistency is microscopic. In OLTP consistency, ALL the transactions presented to the system are accounted for.
In a data warehouse, consistency is global. We care that the load of new data is a full and consistent load of data. In the case of OLTP we have a microscopic perspective. In the case of DDW, we have a quality assurance perspective.
Does the data load pass the consistency test? This is what we ask in DW
An OLTP system processes thousands or millions of transactions per day. A DW system processes a single transaction each day called the production data load.
Users and managers of an OLTP system
i. usually deal w/ one account at a time
ii. are making use of the system continuously throughout the day to make updates, queries, lookups of SINGLE records from the data. (That is why it is called on-line).
iii. are concerned about performance and reliability for every transaction.
iv. make transactions for small (almost atomic) sets of data.
v. OLTP users perform the same query multiple times.
Users and managers of a DW system
i. usually deal with more than one account at a time, but need information on a series of account across the organization.
ii. make use of the system periodically, e.g. morning reports from the previous night's load.
iii. are more concerned with the performance of single-table queries, which are expected to be instantaneous. Join queries can be allowed to run for seconds or minutes.
iv. usually make queries that summarize a large set of data for reporting.
Time
OLTP databases are twinkling databases, i.e. they are in flux and their state is constantly changing, . Thus there is temporal inconsistency wrt state. OLTP databases lack explicit support for historical data, e.g. a financial securities database system which contains a snapshot of trading positions as they change throughout the day. This is also temporal inconsistency.
DW systems contain time-series data. The data is generally not in flux, except at load time, and even then inserts, rather than updates are generally being done. Hence individual records, once added, are static rather than dynamic. By storing static snapshots of the data at periodic time intervals, we are able to make historical queries of the data. The snapshot is called the production data extract.
Data model
OLTP systems use an entity-relationship data model. This model consists of a set of tables of normalized data which are connected to each other in many possible ways. Queries to an entity relationship model usually involve joins between pairs of tables. Entity relationship data models are inappropriate for data warehouses since they cannot be understood by users or navigated adequately by DW software - a query involving joins between many large tables would perform terribly.

Kimball outlines a problem in data warehouse projects with overly complex schemas: $3 million hardware & software doing 50 queries per day, an IT dept. that has to write queries in hand-coded SQL,a marketing dept. that is unhappy because they can't use the system directly and can't ask in which areas of the business they are profitable, etc..

DW systems use a dimensional data model/star join schema. This model is asymmetrical, with a central fact table surrounded by multiple lookup/dimension tables.

Fact tables
Each item in the fact table represents the grain of the fact table.
The fact table is where the numerical measurements/metrics of the business are stored. The most useful facts are numeric, continuously valued and additive. The data should be continuously valued since it will in general be impossible to predict the set of values it can take on.

Dimension tables
Dimension tables are where the description of the data is stored. The best attributes are textual, discrete and are the source of the constraints and row headers in the result set. It is often unclear whether a numeric data field extracted from a production data source is a fact or a a dimensional attribute. We can tell the difference by asking the question "is the data field variable and continuously valued, or is it a discrete value of something that stays fairly constant.

Standard Template Query
select d.rowheader,sum(f.metric) : select list
from fact f,dimension d,time t : from clause
where f.dimensionkey=d.dimensionkey : join constraint
and f.timekey=t.timekey : join constraint
and t.date='11/11/97' : application constraint
group by p.rowheader : group by clause
order by p.rowheader : order by clause
The select list defines the columns that will appear in the result set. There is a row header and an aggregated fact (sum) in the select list.
The from clause is a list of the tables used in the query.
The join constraints capture the relationship between the fact and dimension tables. They link the tables by means of primary keys and foreign keys. The primary key forms the index of the dimension table. The foreign key is used to lookup into the fact table. Every foreign key from the fact table must have its counterpart in the dimension table in order to preserve referential integrity.
The fact table has a composite primary key, meaning that it is a combination of multiple foreign keys. Every fact table has a composite key, and conversely, every table that has a composite key is a fact table. The joins in a dimensional data warehouse capture a fundamental relationship between entities in the business model.
The application constraint serves to limit the result to the subset of the facts desired from the fact table.
The group by clause shows how to summarize the results in the result set, i.e. by row headers.
The order by clause expresses the sort order of the result set for display to the user.
Attributes
The database is only as good as the dimension tables as the attributes in the dimension tables provide the constraints that limit the result set to a useful and meaningful subset of the universe of facts.

OLAP - On-line Analytical Processing

II. The Grocery Store

Steps in the Design Process

Choose a business process to model.
Choose the grain of the business process.
Choose the dimensions that will apply to each fact table record.
Choose the metrics to be stored in the fact table.

Identifying the Processes to Model

The first step in the design process is to decide what business processes to model, by combining an understanding of the business with an understanding of what data is available.
The second step in the design is to decide on the grain of the fact table in each business process.
A data warehouse almost always demands data expressed at the lowest possible rain of each dimension, because queries need to cut through the database in very precise ways.
A careful grain statement determines the primary dimensionality of the fact table. If it is recognized that an additional desired dimension violates the grain by causing additional records to be generates, e.g. the fact table records items per day, but we wish to add an hour attribute to the Time dimension table.

Normalization

The fact table in a dimensional schema is naturally highly normalized
The dimension tables must not be normalized but should remain as flat tables. Normalized dimension tables destroy the ability to browse.

Time Dimension

Most data warehouses need an explicit time dimension even though the primary time key may be an SQL date-valued object. The explicit time dimension is needed to describe fiscal periods, seasons, holidays, weekends and other calendar calculations that are difficult to get from the SQL date machinery.

Product Dimension

The product dimension is one of the two or three primary dimensions in nearly every data warehouse.
Drilling down in a data warehouse is nothing more than adding new headers from the dimension tables. Drilling up is subtracting row headers.

Grocery Store/Page Turn facts

Just as customer counts are semi-additive across the sales fact table due to the fact that we summarized up from the individual transactions to the daily item movement totals (e.g. if we have 25 sales of cornflakes and 30 of rice krispies we cannot tell how many customers bought either cornflakes or rice krispies) so are visitor counts. This is why the Visit dimension is needed to keep track of individual visitors.

Visitor/Customer counts are semi-additive when they occur in time snapshot fact tables because they double count activity across products during the visit/customer event.

III. The Warehouse

Degenerate dimensions are dimension keys with no corresponding dimension table.

IV. Shipments

Ship-To Dimension

Any dimension whose records define a point in space automatically is capable of supporting multiple independent geographic hierarchies.
It is natural and common, for a dimension to simultaneously support multiple independent hierarchies. Drilling up and down within each of these hierarchies must be supported in a data warehouse.
Two loosely correlated attributes that have a many-to-many relationship can be modeled either as a single compound dimension, or they can be modeled as separate dimensions, at the designer's discretion.
If attributes are highly correlated (i.e. one-to-many) one should use a single dimension, since the two attributes can then be browsed against each other more efficiently than if they intersect only in the fact tabs, and the total no. of keys in the single dimension will only be a little larger than the cardinality of the more numerous attribute. If the attributes are very loosely correlated or have zero correlation (many-to-many), then it is best to model them as separate dimensions, because otherwise the no. of records in a combined dimension will probably be the product of the no.s of the separate attributes.

VI. Big Dimensions

Drill-down
Drill-down simply means "show more information"/add a new row header to the report.

Resist the tendency to snowflake
Reasons NOT to snowflake:

The space savings resulting from snowflaking are minute compared to the size of the overall fact table and the database as a whole in most cases.
If you snowflake your dimensions, be prepared to live with poor browsing performance.

MiniDimensions

The best approach for tracking changes in really huge dimensions is to break off one or more minidimensions from the dimension table, each consisting of small clumps of attributes that have been administered to have a limited no. of values.
The minidimension key can exist as a foreign key in both the fact table and the dimension table from which it was broken off.

Slowly Changing Dimensions

How do we track changes over time in dimension tables, e.g. product names change, addresses change etc.? This emphasizes the fact that primary dimensions are not completely time independent.

We have the following 3 fundamental choices to make when we encounter a slowly changing dimension:

Overwrite old values with new ones and hence lose the ability to provide a history over time.
This is very rarely of value, except when the data was incorrect in the first place.
Create a new dimension record at the time of the change with the new attribute values, thereby segmenting history very accurately between the old description and the new description.
This requires that a new dimension key be generated for this new attribute value.
- The use of the Type 2 slowly changing dimension requires that the dimension key be generalized. It may be sufficient to take the underlying production key and add two or three version digits to the end of the key to simplify the key generation process.
- The Type 2 slowly changing dimension automatically partitions history and an application must not be required to place any time constraints on effective dates in the dimension. The reason for this is that we are always looking up the attribute from a row in the fact table which is already dated.
- Since the Type 2 slowly changing dimension partitions history, we will not be able to use the new value of a changed attribute on old history or vice versa. E.g. if we constrain on Label=Active from Oct 15, 1997 we will to be able to see that label before this date. If we wish to see it "if it had been this way all the time, i.e. Active" we have to resort to Type 3.
Creating new "current" fields within the original dimension to record the new attribute values, while keeping the original attribute values as well, thereby being able to describe history both forward and backward from the change either in terms of the original attribute values or in terms of the current attribute values.
The Type 3 slowly changing dimensions is equipped to handle only the original and current values of the changed attribute. Intermediate values are lost. Thus we cannot partition history as in the Type 2 model.

Slowly Changing MiniDimensions

The attributes isolated in a minidimension are very often those attributes that change over time and whose distinct values need to be tracked. In those cases where all possible combinations of the attribute values have already been created, updating a description in the deriving minidimension (customer) means placing a different key in the fact table record from a certain time forward and overwriting the derived minidimension key (demographics key) in the deriving table record (customer record) whenever it changes.

VII. Financial Services

In data warehouses where a dimension must describe a large no. of heterogeneous items, the recommended technique is to create a core fact table and a core dimension table in order to allow queries to cross the disparate types, and to create a custom fact table and a custom dimension table for querying each individual type in depth.
All the keys are duplicated exactly once in the custom dimension tables.
The primary core facts should be duplicated in the custom fact tables. This virtually eliminates the need to process two fact tables in a single query in a heterogeneous product schema

X. Factless Fact Tables

Factless fact tables are used to track events for which there are no measured facts. There are two major types: event tracking tables and coverage tables.

Event Tracking Tables

These are used for recording events which occur as the culmination of a number of dimensional entities, e.g. daily attendance at a college. There are no facts per se. The "facts" only occur when each of the dimensional entities occur/exist at a point in time. The purpose of the fact table is to register the event of the occurrence of the entities at that point in time.

Events are often modeled by a fact table containing a number of keys, each representing a participating dimension in the event. Such event tables often have no obvious numerical facts associated with them, and hence are called factless fact tables.

Coverage Tables

Coverage tables are often tables of events that didn't happen. Coverage tables are usually factless in the same way as event tracking tables.

XII. Building a Dimensional Data Warehouse

The 9 decision points to be decided when designing a dimensional data warehouse are:

The processes, and hence the identity of the fact tables.
The grain of each fact table.
The dimensions of each fact table.
The facts, incl. pre-calculated facts.
The dimension attributes with complete descriptions and proper terminology.
How to track slowly changing dimensions.
The aggregations, heterogeneous dimensions, minidimensions, query modes and other physical storage decisions.
The historical duration of the database.
The urgency with which data is extracted and loaded into the data warehouse.

These decisions should be made in the order given.

Interviewing the End User

Interviewing the end user is the most important first step in designing a data warehouse. The interview achieves the following:

It gives the designers an idea as to the needs and expectations of the users
It serves a PR purpose: i.e. it allows the designer to raise the awareness level of the forthcoming data warehouse among end users, and to adjust and correct some of the user's expectations. Hence we are managing user's expectations.

The interviews should be as thorough and encompass as many of the business units of the company as much as possible.

XIII. Aggregates

The use of pre-stored aggregates (summaries) is the single most effective tool the data warehouse designer has to control performance.
The dba should not be wasting time trying to get the DBMS optimizer to work "properly", rather the dba should be evaluating the need for building more aggregates based upon the behavior of the users. Another benefit of aggregates is that they can be guaranteed to be correct.

An aggregate is a fact table record representing a summarization of base-level fact table records. An aggregate fact table record is always associated with one or more aggregate dimension table records.
Question: where do we put the new fact table records and dimension table records and how are they to be administered? There are 2 main approaches:

New Fact Tables
New Level Fields

New Fact Tables

Aggregate fact tables are derivative fact tables since they are derived from the main base-level fact table. Each derivative fact table must be joined to one nor more derivative dimension tables. This derivative dimension table is a small subset of the larger dimension table where the subset only consists of those attributes relevant to the derived dimension. Entirely new keys my be created for this derived dimension. These keys did not exist in the base-level dimension table.

The creation of aggregate fact table records always requires the creation of artificial keys in each of the dimensions being aggregated.

Any dimension attribute that survives in the aggregate dimension table can be used more efficiently in the aggregate schema than in the base-level schema because it is guaranteed to make sense at the aggregated level.

New Level Fields

Aggregates can also be stored by using Level fields in the affected dimension tables, thus leaving the aggregate fact records in the original fact table. The only difference in this approach is that all the new aggregate records are kept in the original dimension and fact tables. In the Level field model the dimension table is used with all its fields, with the addition of a Level field. The Level field describes the aggregation level of every record in the dimension table. The new aggregate records need to have keys that are compatible with and do not conflict with the original base-level keys in the dimension table.

If aggregates are represented in the original dimension and fact tables by means of the Level field construct, then every query ever presented to that schema must constrain the Level field to a single value of double counting will occur. Hence the constraint "and LEVEL=Category" must be present in the query.
Each type (grain) of aggregate should occupy its own fact table, and should be supported by the proper set of dimension tables containing only those dimensional attributes that are defined for that grain of aggregate.
The single most effective way to control an aggregation explosion, but still benefit from the value of aggregates is to make sure that each aggregate summarizes at least 10 and preferably 20 or more lower-level items.

Indexing Issues

A loose constraint, or no constraint in a base-level fact table is actually a tight constraint in an aggregate fact table.
Only one sort order on the master composite index on the fact table needs to be built, since other sort orders based on missing dimensional constraints should be handled by separate aggregation tables instead.

Application Issues

A problem with building aggregates for dimensional tables is that applications had to know of the presence of aggregates and specifically request for them in the SQL generated. viz: select ... from base_sales_fact... vs select ... from department_sales fact. This is called an applications discontinuity.

Aggregate Navigation

A new layer of software, the aggregate navigatorhas been inserted between the application and the DBMS in order to address the applications discontinuity problem. The role of the aggregate navigator is to intercept the end user's SQL and transform it so as to use the best available aggregate. The end user's query tools, report writers and applications all speak nothing but base-level SQL. Even the applications developer does not need to worry about the identity of aggregate tables.
All of the end user's clients make requests to the aggregate navigator. The end-user apps. only speak base-level SQL. The function of the aggregate navigator is to transform the user's base-level SQL into aggregate-aware SQL. The aggregate navigator then submits this SQL to the DBMS. The real DBMS returns the result set to the aggregate navigator which subsequently passes it to the end user.
The phenomenon of query tool vendors offering aggregate navigation is the wrong architecture.

An aggregate navigator is an essential component of a data warehouse because it insulates end user applications from the changing portfolio of aggregations, and allows the DBA to dynamically adjust the aggregations without having to roll over the applications base.

XIV. The Back End

Browse Queries
Browse queries usually touch a single attribute in a dimension table through an index.

Multitable Join QueriesMultitable join queries involve the fact table and dimension tables. The joining of multiple fact tabes is not recommended. In a multitable join query the dimensional tables are evaluated and a list of primary key values from each of the tables is generated. The separate dimensions combine to generate a set of composite keys in sorted order that is presented to the fact table's composite key index. The facts returned from the fact table are as a result of the matching between these two ordered sets of composite keys in the fastest possible time.

Loading Phase
Make good use of mirroring to ensure that the DBMS is up for almost 24-7.

Production Data Extraction
This should in general involve the following 11 steps:

Primary Extraction
Identifying the changed records
Generalizing keys for changing dimensions
Transforming into load record images
Migration from the legacy system to DDW system
Sorting and building aggregates
Generalizing keys for aggregates
Loading
Processing exceptions
Quality assurance
Publishing

XIV. The Front End

Client-Side Query Tools
A client side query tool performs the following 2 functions:

It sends SQL requests to the DBMS
It receives answers back from the DBMS.

The query tool architecture should include the following:

The use of SQL to make requests to the DBMS.
Use of a database independent connectivity interface, e.g.. ODBC.
Use of a network based aggregate navigator.
Reliance on a bottleneck style of communication with the DBMS, with relatively small SQL messages being sent to the DBMS and relatively small answer sets being received from the DBMS.

Single table browse queries account for around 80% of end user queries while the remaining 20% are multitable joins. The multitable joins account for a majority of the resources used by the user. The use of a larger number of simpler multitable joins rather than a smaller number of more complex multitable joins is recommended.

There are 3 kinds of activities users do when data is returned to the screen.:

Comparison
Presentation
Asking Why

In the trade-off between power and ease of use, ease of use must always come first. End users will not use a tool that seems too complicated. If the tool is perceived as complicated, either IS will have to use the tool on the end user's behalf, or the tool will not be used.
Simplicity is mostly a function of whether the user can get to the desired result in "one button click", where that button click is obvious from the user interface. If the user believes that the query tool is simple by this definition, then the query tool will probably be spontaneous and repeatedly used.
For the rank and file user, the data warehouse should consist of a library of template applications that run immediately on the user's desktop. These applications should have a limited set of user-selectable alternatives for setting new constraints and for picking new measures. These template applications are pre-canned, parameterized reports.

The data warehouse team should have a software shop with responsibility to create the pre-canned parameterized reports. This shop should:

Develop an overall plan for building applications from a set of basic templates which can later be built upon.
Build the template applications with the aim of having the applications work immediately on the user's desktop and can be changed with the minimum number of button clicks.
Build the template apps. with dimension browsing capability and customized Help screen capability. A glossary should be provided in every template.
Provide a bug reporting hotline which is highly responsive to help requests from users.

It is essential for a query tools operating against a dimensional data warehouse to be able to perform comparisons flexibly and immediately. A single row of an answer set should simultaneously be able to show comparisons over multiple time periods of different grains and comparisons over other dimensions, and compound comparisons across two or more dimensions. A query tool should have these comparison alternatives available in a pull down menu or its equivalent. The end user should never see the SQL that implements the comparisons.
Presentation is a separate activity from query and comparing, and the data warehouse owner should choose tools that allow answer sets to be transferred easily by the user into multiple presentation environments for different purposes.

XVI. Front End Applications

There is a fundamental mismatch between the power and depth of a individual SQL answer set and the power and depth of a business report. In most cases, a business report must be assembled from several answer sets. Any query tool that deals with answers to business questions that can be described as business reports must deal with the issue of multiple answer sets. Conversely a raw answer set is not a sufficient deliverable to the user.

There are 4 possible techniques for producing comparison columns in business reports:

Self join
Correlated subquery
Case statement
Separate query streams (Multipass queries)

Of these, 4. Multipass queries seems to be the most simplest to generate and most convenient. For more info. on multipass queries, see the article by Kimball on Features for Query Tools in DBMS magazine, Feb. 1997.

Technique	Advantage/Disadvantage.
Self-join	Disadvantage: Complicated SQL; not produced by commercial query tool; produces large no. of tables which trouble DBMS optimizer
Correlated subquery	Same as above
Case statement	Advantage: Optimizer does not get confused by the existence of false additional tabes, and SQL is more compact than previous cases. Disadvantage: Division by 0 is not well handled. The SQL is not compact/intentional and no query tool will generate such SQL. The CASE statement requires the main SQL query constraints be made loose which hampers performance on complex queries and makes it difficult to use aggregates effectively.
Multipass Queries	Advantage: Small, compact SQL which allows tight control on DBMS performance. Division by zero problem disappears. Separate queries are amenable to use of full aggregate navigation. Dynamic edits to the overall report can be handled by fetching only the changed column, rather than the entire report. Post-processing of the answer set. Disadvantage: Query tool must manage the separate query streams.

Comparisons are a key element of nearly every business report. Comparisons should be implemented as separate query streams, managed by the query tool.
DBMSs constructed on the fly to hold the results of separate query streams are not recommended. The separate query streams should be sent to the client query tool for all further processing.
A good user interface is based on recognizing and pointing, not remembering and typing. A good interface minimizes the no. of button clicks and context switches.
A report-writing query tool should communicate the context of the report at a glance, including esp. the identities of the dimension tables and fact tables, the constraints in effect on the dimensions, and the current state of the report. The report itself should be visible in a single button click.
If users can see something, like a column of a report, they should be able to edit it directly.
Edits performed on report columns should leave as much of the report still valid as possible. Requerying after an edit should at most fetch the data needed to rebuild the edited column.
The query must have an immediate STOP command, and the query tool must not preempt the user's machine while it is waiting for data from the DBMS.

Executive, Analyst and Developer Interfaces

The executive interfaceis concerned with alerting the executive to look at something unusual in a pre-canned report. It does not need to allow the construction or modification of an individual reporting screen.
The analyst interface is concerned withe running and modifying pre-canned, parameterized reports. The analyst user should be able to change among pre-canned constraints on dimensions etc..
The developer interface mainly concerned with building pre-canned, parameterized reports.

Query Tool Features

These are desirable features for a query tool:

Browsing
A browser allows the user to display distinct lists of values for each dimension attribute and then to set constraints progressively on one or more of the attributes.
Outer Join
A query tool fetching multiple answer sets and combining them must always combine these answer sets using outer joins. The outer joins are almost all 2-way (symmetric) outer joins.
Interface to Help
Pre-canned Comparisons
Distributed Calculations: Nonadditive Measures
- Nonadditive quantities derived from separate queries (e.g. ratios) can be usefully presented at different grains and in break rows only if the additive components are fetched from the DBMS and the computation is performed in the query tool as the last step before presentation.
Drilling Down- produce more row headers.
- Drilling down does not mean descending a predetermined hierarchy. It means being able to quickly ask for additional row headers from any of the dimensions joined to the fact table. It also means being able to remove row headers and drill down in a different direction.
Advanced exception handling
- Highlighting exceptional items
- Limiting the display to or throwing out exceptional items
- Marking as exception if in top./bottom N,N% of report. Agents/triggers that occur if an event occurs in the DW environment.
Context-Specific Drill-Down Targets
Drill down to a different report depending on the context, e.g. if a particular exceptional no. is selected for drill-down, an entirely different report is generated.
Interaction with Aggregates
The need for an aggregate navigator has been previously mentioned.
Drilling Across
Drilling Across involves combining 2 or more fact tables that share dimensions into a single report. This will produce 2 or more metric columns in the final report, depending upon the no. of fact tables used. The drill-across model works as long as the row headers chosen for a particular report have exactly the same meaning in both fact tables and their resp. dimensions.
- Drilling across 2 or more fact tables is well defined as long as the row headers chosen for the particular report have exactly the same meaning in all the fact and dimension tables involved. Constraints may be applied to nonshared dimensions in one or more fact tables, but it would be advisable to warn the user of this condition either at runtime or on the report itself.
Extending SQL's Aggregation Operations
This would be extended to include measures like PERIODAVG, PERIODCOUNT.
Break rows
Break rows amount to sectional sub-total/semi-additive rows in the report.
Behavioral Constraints.
This involves building a special table containing keys we wish to constrain on and constraining on them in the SQL that gets generated.
Pivoting.
Pivoting allows the row and column headers to be scrambled in arbitrary combinations e.g. rows become columns and vice versa.
Handing off the answer set.
The query tool hands off the answer set to adjacent tools.
Printing
Batch operation
This is useful for
1. Time-scheduled report generation
2. Event-triggered report generation
3. Stress Testing

Administrative Responsibilities

Tool and Data Training
Pre-joins and Query Tool Metadata
Information Dictionary
Browsing support
Public and Private Constraint Groups
Public and Private Behavioral Groups
Aggregate Navigation Metatables and Statistics
Extract Tool Metatables

XVII. The Future

Points for the future:

Software is the key
- Optimization of star join query execution
- Indexing of multi-million-row dimension tables for browsing and constraining
- Accessing and indexing of composite key of large fact tables.
- Completing SQL so it can process business questions
- Support of low-level data compression
- Support of parallel processing
- Dimensional database design tools.
- Dimensional database extract, admin and QA tools.
- End user query tools

Hosted by www.Geocities.ws