Sunday, March 29, 2009

Orlando Code Camp Review

Yesterday I gave a presentation at the Orlando Code Camp, and it was the best one I have done so far. I had a great room and good audience participation, I felt awake and alive, and everything seemed to flow quite smoothly. I want to thank the Orlando organizers for hosting another great event. I look forward to my visits there; between the fall SQL Saturday and the spring Code Camp, they are among the highlights of my speaking year.

Thursday, March 19, 2009

Sprint Phone Comparison

I thought that today I could write a bit about cell phones. I prefer PDA phones because they make it easiest to keep up with the business, especially email. And, to disappoint a few of you: no, I do not use an iPhone, nor do I have any plans to get one.

I was using a Sprint 6700 until August of last year, when it failed me on a business trip. I don't enjoy dealing with refurbished models, so I bought a brand-new Sprint Mogul off eBay. The Mogul was a good phone and I really liked it, but it had issues with memory management. It would also shut off at random times, which meant missed and interrupted calls. I downloaded all the software updates but still ended up doing a hard reset daily. As a last-ditch effort, I found a custom WM6.5 ROM on the internet and loaded it on the phone. WM6.5 is a great software update and I can't wait for the official release, but it was too much for the Mogul to handle: the phone needed a hard reset twice a day to clear the memory.

Sprint offered to upgrade me to a Touch Pro, the newest PDA phone in their lineup. Because the price was right, I decided to try it. It was an exchange, though, so I had to turn in my Mogul. My first impression of the Touch Pro is that the screen is beautiful, with lots of resolution, but I miss the wide stance of the 6700 and the Mogul. The Touch Pro is a little thinner, closer to the size of a traditional cell phone. I had grown used to the wider phones, so adjusting has taken some time. The functionality is flawless: the phone works, email works, and it has plenty of memory, which is much appreciated. The phone does seem fragile, though; I can't help but think that one drop and it's toast.

For now, I'm pleased to have a phone that works, but I find myself missing the Mogul. I just found a WM6.5 ROM for the Touch Pro, so I'll do my own software update and give it a month to see if my satisfaction improves. If not, it goes back on eBay and I'll move on to something else.

Tuesday, March 17, 2009

The Future of ETL

Yesterday I wrote about the arduous task of installing IBM WebSphere DataStage. I mentioned that DataStage has been the tool of choice for high-performance ETL, and today I want to elaborate on that topic.

In the talks I give at code camps, I always take a minute or two to discuss ETL tools. ETL is a very important part of data warehousing, perhaps the most important part (aside from the data itself). Too often, the ETL tool is chosen after hearing the sales pitches but before really studying the differences between the tools.

There are three main tools on the market: Informatica, DataStage, and SSIS. I haven't worked directly with Informatica, but I'm told it is similar to DataStage (Server edition) without the hashed files. DataStage has very high performance characteristics when you run Enterprise edition against large data volumes on a partitioned table/database design.

However, Microsoft SSIS is really coming up on the inside performance-wise. The 2005 version was very good and offered bulk loaders, and the 2007 release of native connection packages for different databases was a huge step in this direction. The 2008 version of SSIS is even better: it adds the cached lookup, which is another big leap for performance.

I've written before (in 2006) about some misses in SSIS, such as the lack of an easy way to create surrogate keys. I know it can be done with scripting, or by relying on sequences/identity columns in the database, but there isn't a built-in transform to do it. Still, small misses like these shouldn't be enough to rule out the tool.
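To make the surrogate key gap concrete, here is a minimal sketch of what such a script has to do. It keeps a lookup of natural keys to surrogate keys and mints the next integer for any natural key it hasn't seen yet. This is Python for illustration, not SSIS script code. The function name, the sample customer keys, and the in-memory dictionary are hypothetical. In a real package, the starting map would presumably come from the dimension table, and this simple approach assumes only one load assigns keys at a time.

```python
from typing import Dict, Iterable, List, Tuple


def assign_surrogate_keys(
    natural_keys: Iterable[str],
    existing: Dict[str, int],
) -> Tuple[List[int], Dict[str, int]]:
    """Hypothetical helper: map each natural key to a surrogate key,
    minting new sequential integers for natural keys not seen before."""
    key_map = dict(existing)  # copy so the caller's lookup is not mutated
    next_key = max(key_map.values(), default=0) + 1  # continue after current max
    result: List[int] = []
    for nk in natural_keys:
        if nk not in key_map:  # unseen natural key: assign the next surrogate
            key_map[nk] = next_key
            next_key += 1
        result.append(key_map[nk])
    return result, key_map


# Usage: two customers already exist in the dimension; two new ones arrive.
keys, updated = assign_surrogate_keys(
    ["CUST-B", "CUST-C", "CUST-C", "CUST-D"],
    {"CUST-A": 1, "CUST-B": 2},
)
print(keys)  # [2, 3, 3, 4]
```

The lookup dictionary plays the same role as a cached lookup. Existing members resolve without a database round trip, and only genuinely new rows get a new key.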

I believe SSIS is the future of ETL platforms, because its price point and feature set together make it a powerful choice. As SSIS grows, other tools will become obsolete and SSIS will take a larger and larger share of the ETL product market.

Monday, March 16, 2009

Installing IBM WebSphere DataStage 8.0.1

Howdy folks, I thought today I would write about installing the IBM ETL tool suite on my server.

IBM WebSphere DataStage v8 is the newest version of the IBM ETL tool suite. For years, DataStage has been the leading high-performance ETL tool on the market. It was created by Ardent Software in the mid-1990s. Ardent was later renamed Ascential, and IBM purchased Ascential Software for a tidy sum in 2004. In 2005, DataStage 7.5 was released as an Ascential product, and version 8 (v8) is the first true IBM release of the suite.

IBM has made a bunch of changes in v8 to integrate it more closely with their WebSphere web services product and to position DataStage more as a service than as the traditional batch product. For those of us who just use DataStage for nightly batch runs, this is a negative. For those trying to integrate DataStage into web pages, real-time data integration, and right-time data warehouse loading, it's a positive.

I installed DSv8 for a client on their machine last year, and I've installed DSv7 on both Unix and Windows machines. I was able to trade a couple of hours of database work for a spare server one of my clients had sitting around, and I feel it was a great trade. The server is an HP with two 2 GHz Pentium Itanium processors and 2 GB of RAM. It's slow by today's standards, but it is a 2004 model. The worst part is that it has neither a DVD drive nor USB 2.0, which made putting DS on it "a lot of fun."

First, I had to start with a fresh O/S wipe. I installed Windows Server 2003 Standard R2, which took about 45 minutes. It then took a couple of hours to transfer the DataStage files over from my laptop using an external hard drive.

For those curious, I purchased a software package from IBM that includes a development license of DataStage v8. It wasn't cheap, and it's not easy to find, but it's there. I wish that IBM would make it easier and the license terms less onerous. I don't have an extra $350k sitting in my software budget.

I used my laptop to read the installation manual instead of printing it out (saving the environment), and it was pretty intense. There are a lot of settings, permissions, users, and environment variables to configure. I then ran the installation package for DS, which installs DB2 9.1 for use as the metadata repository. Once DB2 is installed, the actual DS installation commenced. It ran flawlessly to completion, and took about 75 minutes in all.

Once I had the software set up, I had to start the WebSphere server. However, I still wasn't able to log into DS. Puzzled, I began to research, and I discovered that I needed to configure the user accounts in the Administrative console. It wasn't easy, but it wasn't that hard either; there are just a lot of screens and options in that area.

I was then able to log into the Designer and create my first ETL package on my home server. I'm still trying to determine how I will manage my servers, so I'll write more about server management another time.

For now, happy coding.

Wednesday, March 04, 2009

New Website Functional

Hello all, I have a good announcement today.

The relaunched website is up. The address is www.thedamndata.com. For now, this blog will still be hosted at http://thedamndata.blogspot.com.

The main additions to the website are two major components: Forums and News.

The Forums are a discussion area for anything from SQL databases to your best friend's dog fetish. Just kidding on the second part, but there should be a forum for every want and need relating to data. If you feel we're missing something, please let me know and I'll make adjustments.

I'm searching for two moderators for the forums; please email me if you're interested.

The other exciting part is News. The news feed is still being finished, but data-related news will be posted there and continuously updated.

I hope you like and use the new website. Please refer your friends as well.

Thanks for your patience while the website was redone. I'm not a graphic designer, and I'm sure there are a lot of changes in store for it.