Tuesday, July 29, 2008

Continued: How many packages do I need?

The other day we looked at a scenario that described the difference between one and many packages. I also put a ghoulish picture of a nasty looking package in my post to scare you away from doing a large package when organized, smaller packages are the better choice.

Let's look at the following case scenario. Great Plains Foods, Inc has hired us to create a database load process for them. Their requirements are that the data is loaded from a flat file every day by 10 AM and any failures need to generate an email to their production support control. We are to load 5 data files into 5 tables. The information below shows us filenames, expected daily row count, and the destination tables.

File Name Expected Count Target Table
Orders A 5500 DAILY_ORDERS
Orders B 1200 DAILY_ORDERS
Production 7500 DAILY_PRODUCTION

From observing the data, we see that the orders files A & B load into the same table. A question for the client should pop into your head at this point, which is "What is the difference between Orders A and Orders B files?" Our business associate at Great Plains tells us that the files are the same but A comes from computer orders and B comes from telephone orders. Now, from an architecture standpoint, how many packages would you create for this job?

I would create 4: Orders, Production, Shipments, and Employees. The Orders job will read from two files while the remainder will read from the single source file. I have seen single packages load all these tables in a single job, but that kind of architecture does not leave any room to manually load a single file or to handle data related issues and failures.

What would you do in this scenario?

1 comment:

The one they call the Rosa said...

I would create 4 packages and additionally flag the orders into two sales organizations (Telephone and Online). This way when your users want to drill down in a report, they can break up the numbers as how they were entered into the transitional system.