One of the clients I was working with asked me to take a look at their database performance. They felt it was poor, and they were right. One of the things I noticed immediately was over-analysis of tables: they were running Oracle, and an ETL job analyzed the tables as soon as each load finished. The instance received only a small volume of daily changes, yet the statistics appeared to go stale every day. When I dug into the load job, I found that it did not differentiate between true and false updates. A true update is one where a data value actually changes; a false update flows down the update path even though nothing has changed from its prior state.
In a well-designed ETL architecture, the architect adds an escape clause that loads only the true updates, using a before-and-after comparison of each row. This prevents the load process from doing nothing more than rewriting the update-timestamp column, which is exactly what was happening in this instance.
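That before-and-after comparison can be sketched in a few lines of Python. This is a minimal illustration, not the client's actual code: the column names (`id`, `name`, `update_ts`) and the hash-based comparison are assumptions, and the key idea is simply that audit columns are excluded from the comparison so that only business-column changes count as true updates.

```python
import hashlib
import json

def row_hash(row, ignore=("update_ts",)):
    """Hash the business columns of a row, ignoring audit columns
    such as the update timestamp (column names are illustrative)."""
    payload = {k: v for k, v in sorted(row.items()) if k not in ignore}
    return hashlib.sha256(json.dumps(payload, default=str).encode()).hexdigest()

def true_updates(incoming, target_by_key, key="id"):
    """Yield only rows that are new or genuinely changed (true updates)."""
    for row in incoming:
        current = target_by_key.get(row[key])
        if current is None or row_hash(current) != row_hash(row):
            yield row  # true update, or a brand-new row
        # otherwise: false update -- skip it, leave the target row untouched

# Usage sketch: one false update (only the timestamp moved) and one new row.
target = {1: {"id": 1, "name": "Acme", "update_ts": "2023-01-01"}}
batch = [
    {"id": 1, "name": "Acme", "update_ts": "2023-06-01"},    # false update
    {"id": 2, "name": "Widgets", "update_ts": "2023-06-01"}, # new row
]
loaded = list(true_updates(batch, target))
# only the id=2 row survives the filter
```

In a real Oracle pipeline the same idea would typically live in the MERGE or staging logic rather than in application code, but the principle is identical: compare before and after on the business columns, and let false updates fall through without touching the target.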
We changed the job to load only true updates into the target table. That solved the stale-statistics problem and also made the load process roughly 10x faster. It also allowed us to disable the analyze-table routine, saving still more processing time. The client was quite happy, as the total savings will run into hours once this architectural change is propagated across all of their subject areas.
Seconds don't sound like much until they begin to aggregate into hours. In my next post, I'll talk about separating DBA maintenance tasks from ETL processing.
Until then, happy data warehousing.