Friday, September 30, 2011

Data Models and Their Importance

Just this week I had a discussion with a client who is looking to hire a data modeler. We spoke about his project, the number of facts and dimensions, and what he was looking for in a data modeler. I felt compelled to add the following analogy:

"A data model is like the foundation of a house. Without a strong foundation, no amount of work or rework on the remainder will be correct, and it will frustrate you to failure, so get the foundation right the first time."

Unfortunately I know this all too well. I have been part of a few projects on the ETL side that had poor data models, either models that weren't completely thought through or models that weren't designed for the end solution. A data modeler needs to work very closely with the business/functional analysts to understand the data that will be stored in the model, and develop an enterprise model that can be easily maintained and expanded.

First, let's look at a typical data model. For this example, I'll use the star methodology. This is a simple fact table that contains surrogate keys that point to attributes in the dimensions. If you need to, review the difference between facts and dimensions here.

Our simple model is a grocery store. It has a date, customer, employee, and product dimension. It has a Sales fact table that lists each transaction.

This is a simple model but it carries a lot of information. The data modeler should be thinking ahead as to which dimensions should be added in the future for a sales fact table. There should also be thought as to which fact measures could be added in the future.

I've worked with clients who would design the exact model above and attempt to capture every possible fact and dimensional attribute the first release. DON'T try this! Just capture the majority of the dimensions which are needed to create a usable report, design the metrics (facts), and implement. Dimensions and facts can be easily added later if the tables are designed right, and this allows the IT department to deliver something quickly and get the business vested in the success of the project.

The synopsis, if you have made it this far, is to focus on building a solid data model from the start, containing relevant dimensions and needed facts to get the dimensional model up and running. Never, never, and I'll repeat never shortchange on your data model to get something out the door quicker.

No comments: