Tuesday, May 08, 2007

Upgrading IBM Websphere DataStage

It's official - I have converted my employer's data warehouse to IBM Websphere DataStage 7.5.2; we were previously running DataStage 7.1. We use DataStage (DS) as our ETL tool, processing over 20 million records daily in a multi-terabyte data warehouse.

I'm proud to say that I led the project: I presented status reports to the CIO and vice-presidents, communicated between the teams involved, directed the install, and did the majority of the testing myself.

All said and done, the upgrade ran pretty smoothly. We did run into some issues, though, so I'll list them here in case someone else hits the same ones and somehow finds this post.

First, our current production server runs Sun Solaris 8 (SunOS 5.8). It is a 7-year-old 22-CPU machine, so needless to say, it's pretty slow. Our test environment was our brand-new Sun Solaris 10 machine with eight dual-core 2 GHz CPUs. As a result, our development machine substantially outperforms our production machine - for now. This disparity made testing difficult: DataStage has many tunable parameters, and while I wanted to mimic the production environment, it would have been almost criminal to dumb the development machine down to that level of DS performance.

DS keeps its engine settings in a file called uvconfig. Many variables in this file (NFILES, MFILES, etc.) need to be raised to allow a high level of performance and to use the machine and software to their full potential. First, verify that the operating system's NOFILES parameter (the number of concurrently open files) is set high, at least 2048. Then set the NFILES and TFILES parameters in uvconfig to at least 100 (or 130). Finally, make sure the dsenv file sources correctly and populates the library path variables (such as LD_LIBRARY_PATH) correctly.
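The checks above are easy to forget when moving from test to production, so it can help to script them. Below is a minimal Python sketch, not an IBM utility, built on a few assumptions: that uvconfig holds one `NAME value` pair per line with `#` comments, that the parameter names and minimums are exactly the ones quoted in this post, and that dsenv is a Bourne-shell script that exports LD_LIBRARY_PATH. All function names are hypothetical. Note that the open-file check only sees the limit of the process running the script, so run it as the account DataStage runs under.

```python
import os
import resource
import subprocess

# Minimums quoted in the post. Parameter names are taken from the post as-is;
# confirm them against the uvconfig shipped with your own DataStage release.
MIN_OPEN_FILES = 2048
MIN_UVCONFIG = {"NFILES": 100, "TFILES": 100}


def parse_uvconfig(text):
    """Parse 'NAME value' lines into a dict (assumed layout), skipping '#' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2:
            params[parts[0].upper()] = parts[1]
    return params


def uvconfig_problems(params, minimums=MIN_UVCONFIG):
    """List every parameter that is missing or set below its minimum."""
    problems = []
    for name, minimum in minimums.items():
        value = params.get(name)
        if value is None:
            problems.append(f"{name} is not set")
        elif not value.isdigit() or int(value) < minimum:
            problems.append(f"{name}={value} is below {minimum}")
    return problems


def open_file_limit_ok(minimum=MIN_OPEN_FILES):
    """Check this process's soft open-file limit (what `ulimit -n` shows for it)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= minimum, soft


def dsenv_library_path(dsenv_path, var="LD_LIBRARY_PATH"):
    """Source dsenv in a fresh /bin/sh and return the value it leaves in `var`."""
    # Drop `var` from the inherited environment so only dsenv can set it.
    env = {k: v for k, v in os.environ.items() if k != var}
    script = f'. "$1" >/dev/null && printf "%s" "${{{var}:-}}"'
    result = subprocess.run(
        ["/bin/sh", "-c", script, "sh", os.path.abspath(dsenv_path)],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

To use it, read the uvconfig file from the engine directory (typically $DSHOME) into `parse_uvconfig` and pass the path of dsenv to `dsenv_library_path`; an empty string back means dsenv did not set the variable. Keep in mind that uvconfig edits only take effect after the engine configuration is regenerated and DataStage is restarted, so rerun the checks after that step.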

In the end, our DataStage upgrade went smoothly thanks to good coordination and thorough testing. Documenting every problem we encountered during testing helped us a great deal when we performed the actual install.

Now I'm looking forward to moving to DataStage 8 (Hawk) next year - the 8 release looks to have some great toolsets and improvements that will go a long way toward ensuring data quality and the functionality of ETL processes.

2 comments:

Sachin said...

Hi,

Are there any important prerequisites that have to be done before upgrading from 7 to 7.5.2?

I can tell you that version 8 is sure to make things worse, with IBM tying together a lot of components which, may I say, are not needed by most organisations. Plus, its prerequisites require high versions, so you really have to buy them or use DB2 v9, which again most organisations won't need.

Pratim Chaudhuri said...

Hi Wes,

I am in charge of upgrading my production server from version 7.1 to 7.5.

Could you please share information about the problems you faced? That would really help me anticipate any problems and keep a contingency plan ready.

Thanks in advance. My email address is pratimdc@gmail.com

with regards,
Pratim Chaudhuri