Monday, June 27, 2011

Types of Data

This is a question I get asked quite a bit: "What are the different types of data that I can collect in my data warehouse?" It's always an interesting topic, so I'll start by saying there are three basic types of data: empirical, derived, and anecdotal.

The basic case for data warehousing starts with empirical data. This is data that is collected directly, e.g. "I sold 10 widgets this week". Most data warehouses are built on this type of data, because it's really a "fact" in the everyday sense: a record of something that is true. This is not to be confused with a fact in dimensional modeling, as an address, which would normally be a dimension attribute, is also "empirical" in nature.

A second type of data is derived. This is data that is created from another type of data. An example of derived data is "I sold 10 widgets this week for $1 each, therefore my total sales are $10 for the week". Derivation is how you perform a computation on the collected data to get the bigger picture. Think of aggregations as derived data.
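The widget arithmetic above can be sketched in a few lines of Python. This is only an illustration: the `sales` rows and the `total_sales` function are hypothetical names made up for this example, not part of any particular warehouse tool.

```python
from decimal import Decimal

# Hypothetical empirical rows: (period label, widgets sold, unit price in dollars).
sales = [
    ("this week", 10, Decimal("1.00")),
]

def total_sales(rows):
    """Derive total sales by summing quantity * unit price over the empirical rows."""
    return sum((qty * price for _, qty, price in rows), Decimal("0"))

# The derived value: 10 widgets at $1 each.
weekly_total = total_sales(sales)
print(weekly_total)
```

The empirical rows are stored as collected; the total is never entered by hand, it is always computed from them.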

The third type of data, and one that is less common, is anecdotal. This is data that is observed or believed but without any scientific basis. Anecdotal data often has applications in business. Take the example of a salesman selling widgets to a retailer, which we shall call Mega-lo-mart. The salesman knows through discussion with the Mega-lo-mart manager that they don't intend to buy widgets this year. The anecdotal data would be the salesman's observation that "Mega-lo-mart doesn't intend to buy widgets this year because they aren't selling well". There is no scientific evidence this is true, but think of the business case: without that information, a salesman is wasting time trying to sell to someone who will not buy the widget. Thus, there is a case that anecdotal evidence could be used in a data warehouse application, as long as it's documented as such, to help drive decisions.
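One way to keep anecdotal data "documented as such" is to store the data type alongside every record. The sketch below is an assumption about how that might look; `DataKind`, `Observation`, and `anecdotal_only` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class DataKind(Enum):
    EMPIRICAL = "empirical"
    DERIVED = "derived"
    ANECDOTAL = "anecdotal"

@dataclass(frozen=True)
class Observation:
    subject: str     # what the record is about
    statement: str   # the recorded claim
    kind: DataKind   # how the claim was obtained
    source: str      # where it came from

# Hypothetical records based on the examples in this post.
observations = [
    Observation("widgets", "Sold 10 widgets this week",
                DataKind.EMPIRICAL, "sales records"),
    Observation("Mega-lo-mart", "Doesn't intend to buy widgets this year",
                DataKind.ANECDOTAL, "salesman's discussion with the manager"),
]

def anecdotal_only(rows):
    """Return only the records flagged as anecdotal."""
    return [r for r in rows if r.kind is DataKind.ANECDOTAL]

for obs in anecdotal_only(observations):
    print(f"[{obs.kind.value}] {obs.subject}: {obs.statement} (source: {obs.source})")
```

With the type carried on each record, a report can show the salesman's observation next to the sales figures while making it plain which one is hearsay.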

I find these data types fascinating, especially the anecdotal kind. Sometimes it's difficult to determine which type a particular piece of data is, based on the way it was collected. That's our job as architects: typically it is the data modeler who analyzes the data, works with business users to determine the applicability of the data, and builds a dimensional model that contains all three types of data to present as a business intelligence application.
