Wednesday, December 17, 2008

Degree Complete, Now What?

Those who have followed me know that I've been completing a Master's degree in Economics for the past couple of years. I had to study at a slower pace due to job demands, and I'm pleased to announce that I received my diploma this past weekend, so your's truly for now is done with schoolwork.

A common question I receieve is "What are you doing to do with a Master's in Economics?" It's a good question as I primarily work with computers. I considered getting a M.S. in MIS, but thought this would be too redundant. I wanted to go the econ route because it's always interested me, I can shift in that direction with my career one day, and studying econ gives a very good understanding of the vagaries of the economy which affect every single one of us. Working with data, it's also given me a huge leg up with regards to efficiency and understanding business practices of the clients I work with. I was told that my degree had been shopped to a client considering a large data warehouse project because they wanted someone with both datawarehousing and analytical experience, and the degree became the differentiator that put me in a strong competitive position.

In short, it's like having a quiver full of arrows and I just added an additional arrow. Short term benefits are limited to the extra money I can make with the education, but long term benefits are unlimited. I'm glad I did it.

Tuesday, December 16, 2008

Extracting SQL in Data Warehouses

In working with potential clients, I often get the following question in tech interviews, and the question is "what is the best way to extract data from a data source?". To use the name of Andy Warren's blog, the answer is "It Depends".

I was asked this specific question today and gave the response above. Knowing that's an incomplete answer, I elaborated. The response was something like "It depends on the machines that are running both the database and the ETL process."

It goes something like this. Let's say our source system is an order entry system that is a mainframe DB2 system, and our data warehouse is a Windows-based SSIS implementation running 16 dual-core processors with 32 GB of RAM. We can run the extraction a couple of different ways, including:
- Perform a simple select statement and pull all rows out of the database on the source
- Perform a simple select statement and pull rows with a constraint
- Run a compplex query on the database engine to pull the minimum amount of data.

The key differentiator in this dilemna is how much data do we need? I've worked on ETL processes that pulled 20 million records in a single ETL jobstream only to use a majority for lookups and then updated only about 1 million records. I've also worked on ETL jobs that looked like the third case above, running a complex query with where clauses out the wazoo and returning just a couple rows to the ETL process. I generally try to avoid the first choice above unless we need all those rows. It's also possible and common to constrain the query and have indexes and other tweaks on the database side to speed query performance.

The way I would do it is dependent on the performance of the two machines. I always try to minimize long running network pulls, but if we run a significant query on the source system we may negatively impact their performance. Thus, it's such a localized question that is dependent on the environment.

Some good benchmark rules for extracting data are:
- Don't run queries on source systems that have negative performance impacts
- Don't have long running network connections to pull data
- Do utilize the machine with the most power for complex tasks
- Use all processor available to complete tasks faster
- Use specialized database connections (vs plain ODBC) to improve performance

Finally, always remember the correct answer is "It depends".

Monday, December 01, 2008

Computer Update

Hello again, as if you're not tired of hearing my long running discussion about work computers, here's another one and probably the last for a while.  

After my latest reformat of my Dell computer, it's been running like a champ.  Running so well that it seems to be brand new.  I'm very pleased with XP Pro and every day I'm reminded why I don't like Vista.  I was visiting one of my clients the other day and a mass comm email went out to the group saying that in 2009 there will be no upgrading to Vista, and this is a company that has thousands of PCs.  I have yet to work with any clients running it, and it seems that Microsoft acknowledges the issues with Vista and is pushing Windows 7 out for an early 2010 release.  From the specs I read a few weeks ago, Windows 7 is going to be a great system and use significantly less resources than Vista.

My biggest concern with laptops is the screen.  I prefer to use 17" lappies, and the screens vary from good to poor depending on the model.   When I bought the Dell in 2005, I didn't get the HD screen since it had a glossy finish, and there is not a single Windows-based consumer machine on the market today that doesn't have a glossy finish.    I purchased a cheaper HP laptop this year to see if the gloss would be as bad as I thought, and I'm sorry to say that it is.  The colors are very vivid, but that computer is worthless is a majority of the settings in which I need to do work, typically corporate offices or outside on a beach.   Dell offers a business 17" model (Precision line) which is very expensive and the same form factor as my current comp, which looks dated.    My searching led me to the Apple MacBook Pro 17, and with the current new line of MacBooks having went only to glossy screens, it was now or never for the 17, lest Apple only offer glossy on that one.

I'm typing this post from my new Apple MacBook Pro 17 with a HD screen (1900x1200) with matte finish.  I must say this screen is the most beautiful computer I have ever worked on.  It's powered by LED so it's bright and easy to read, and offers a lot of real-estate for doing remote desktops and development apps.  It has a 2.53 Ghz C2D processor, 4 GB ram, and 320 GB hard drive.  I installed VMWare Fusion 2.0 and run Windows XP Pro on it.  It took me less than 2 hours to have this computer set up exactly like I wanted, and I even have a W2K3 virtual machine with all my servers on it.  In short, I finally own my dream machine.

I'm sure I'll catch hell from the guys for using an Apple, but the screen sealed the deal.  I would only be happier with this purchase if it was free.  

For now, the laptop search is over and I'll try to get back with some consulting stories.  Remember it's all about the Damn data.