Wednesday, October 28, 2009

Designing Quality ETL Processes

ETL is the heart and soul of the data warehouse, so why does ETL job design and performance always seem to get little attention when schedules are tight? If it runs, it's fine, right? No! Follow the 'Durable Impact Top 5 Rules of ETL Design' below to develop quality ETL jobs and minimize your headaches with SQL Server Integration Services:
1. Always build jobs as modularized, stand alone components.
2. Segregate jobs based on the importance of the data. Most needed data gets loaded first, etc.
3. Build file extracts within a job and make every job restartable without having to restart the ETL process from the beginning. See #1 above.
4. Parallelize only when necessary. With SSIS 2008, don't use manual parallelization, the backend engine is much improved removing the headaches of slow performance on large data sets.
5. Use the tool best fit for the job. Example: When reading from databases, consider using the SQL ORDER BY command to presort data that must be sorted.
By following the 'Durable Impact Top 5 Rules for ETL Design' you can improve your performance and cut down on your support time.

No comments: