The other day we looked at a scenario that described the difference between one and many packages. I also put a ghoulish picture of a nasty looking package in my post to scare you away from doing a large package when organized, smaller packages are the better choice.
Let's look at the following case scenario. Great Plains Foods, Inc has hired us to create a database load process for them. Their requirements are that the data is loaded from a flat file every day by 10 AM and any failures need to generate an email to their production support control. We are to load 5 data files into 5 tables. The information below shows us filenames, expected daily row count, and the destination tables.
File Name Expected Count Target Table
Orders A 5500 DAILY_ORDERS
Orders B 1200 DAILY_ORDERS
Production 7500 DAILY_PRODUCTION
Employees 350 DAILY_EMPLOYEES
Shipments 900 DAILY_SHIPMENTS
From observing the data, we see that the orders files A & B load into the same table. A question for the client should pop into your head at this point, which is "What is the difference between Orders A and Orders B files?" Our business associate at Great Plains tells us that the files are the same but A comes from computer orders and B comes from telephone orders. Now, from an architecture standpoint, how many packages would you create for this job?
I would create 4: Orders, Production, Shipments, and Employees. The Orders job will read from two files while the remainder will read from the single source file. I have seen single packages load all these tables in a single job, but that kind of architecture does not leave any room to manually load a single file or to handle data related issues and failures.
What would you do in this scenario?