Thursday, February 16, 2006

ETL Methodology

I want to take a little time to discuss something that is extremely important in data warehousing: ETL methodology.

ETL methodology can best be defined as the way ETL is done in an environment.

Everything regarding ETL has a process. Here's some pieces:
- Table naming conventions
- Job flow (source, transform, load)
- Job parameters
- Job options (Resurrection, pause, kill)
- General ETL job layout (sourcing on top, transform in middle, loading on bottom)
- Use of Staging areas, temporary files, and temporary tables

I'm developing a complete methodology for SSIS (SQL Server Integration Systems). When complete my methodology will be able to be applied into any environment at any company with only minor modifications.

Once I get a couple of things done around the house I'll start working on my ETL Methodology document. If anyone would like to review it let me know; I could use a couple reviewers.

I'd also like to give a shoutout to Ken Bland. Ken is a very talented ETL architect and developer who has jointly developed a very popular methodology for the IBM WebSphere DataStage(TM) product. Over 90% of my EDW knowledge was a direct result of working under Ken's guidance. His website is listed below if you'd like to take a look at his booming consulting business. Best of luck Ken.

KBA Associates

No comments: