Sunday, February 19, 2006

DW Load Methods

A data warehouse is a warehouse that contains data. Sounds funny doesn't it? Imagine a warehouse as a storage structure that may be physical or logical. In the DW it's both...logical as in the data model represenatation and physical as to the hard disks and other computer equipment that support the data warehouse.

The first question should be "How does the data get there?" Excellent question and I'll cover some different load methods in my column today.

Here are three key areas to keep in mind....

Availability - will someone create a report with transactions completed 20 minutes ago or will it be with yesterday's data?

ETL Process - does our ETL process handle constant loads or is it a once a day process?

Hardware limitations - do we have the hardware resources to allow people to access data and create reports while we are inserting/updating data?

Types of ETL Processing:

1. Batch load - load all data once a day. At the company I work with we do a batch load process that begins at midnight and hopefully a majority is complete by 7 am. (pretty common)

2. Trickle feed - rows are loaded in the EDW as they are received in the collector (the hardest)

3. Microbatch - small batches of data are processed on either a time threshold (every 2 hours) or a row threshold (20,000 records in the collector).

There you have it.....a brief overview of common DW load methods. In a future column we'll further explore each of these methods.