
January 29, 2007

Stranger in a strange land

This time of year our engineering group spends a lot of time in the classroom updating our technical expertise and learning new things. Every time one of our engineers achieves a new level of technical certification or expertise, we celebrate. What few outsiders realize is that the tech industry is run by a secret cabal of old timers (I am not eligible yet) who have a unique way of celebrating a new engineer’s first major technical certification.

About a week after achieving certification the new engineer receives in the mail a box of books. These are “The Great Books”. They are the greats of Science Fiction and Fantasy literature, mandatory for all geeks to read and absorb. Thus the new engineer is indoctrinated into the world of the geek. I like Sci-Fi books and movies, but I draw the line at Science Silliness.

I am referring to a series of commercials I have seen on televised golf matches over the years that make me laugh out loud. I guess they are funny, but it is the underlying technology message that I find so laughable. Messages on commercial TV are for the masses, and in the case of golf, an affluent mass (present company excluded). So, I realize that general statements will be made. A commercial for technology on TV will not be like an ad in a trade publication. Hyperbole rules.

The message from the series of TV commercials is that servers in a data center manufactured by Acme Computer Company (name changed to protect the silly) can be…are you ready for this…SELF HEALING.

Now, anytime a geek CAPITALIZES a WORD in an e-mail or BLOG, you KNOW IT IS SERIOUS BUSINESS. PAY ATTENTION! The idea of SELF HEALING servers is something that could only have been developed by Skynet. In which case it will be destroyed by Sarah Connor’s unborn son. And yes, I also believe in causal loops, artificial intelligence and time travel.

The problem with SELF HEALING servers (aside from their non-existence) is the complexity of today’s data center. When I read the man pages for the SELF HEALING servers I get a very different view of the capabilities. Monitoring tools are pretty good now. We can push patches and new versions automatically. You can get automatic alerts when a component piece is ready to fail, but get real…SELF HEALING? I broke the cup holder off on a server at a customer site once. Believe me, it did not heal itself. I have never seen a new disk drive materialize out of thin air. That would be SELF HEALING and I would be VERY IMPRESSED.

Just for fun, if you have installed SELF HEALING servers in your data center and your boss asks you to configure your monitoring software so you can receive e-mails at 2:30 in the morning…tell your boss that will not be necessary. Cyborgs have perfected the ability to HEAL your server without human intervention. Then get in your jeep, grab your tape recorder and head for Mexico. A storm is coming.

January 20, 2007

The Dreaded Wednesday Lunch

Every time we hire a new engineer, I warn them about the Dreaded Wednesday Lunch. Our engineers go onsite a couple of hundred times a year for short periods of time to various customer sites. We go to military bases, financial service firms, document shipping companies and many others.

On Monday morning our engineer arrives onsite. Our engineer and the customer’s IT staff stake out their respective territories. I have not observed anyone rubbing their whiskers against a server cabinet, cat-like, but you get the idea. Measurements are being taken. Assessments are made. Does this so-called out-of-town expert know what he is doing? Is the customer environment really ready for this project? Will the customer throw the SOW out the window now? What does the latest plane schedule look like?

Once territories are staked out the real work begins. If there are going to be problems they will surface now. The data management software that we install and configure can send massive amounts of data through the network at high speed. Most of the customer’s day-to-day infrastructure traffic (messaging and application) flows at intervals with predictable peaks and valleys. Software that is designed to store data or back up and restore data can be intrusive. I have seen Network Admins used to predictable data flows freak out (we used to say that in the 70s) when they observe the non-stop traffic patterns of the best data management software. We created separate networks like SANs to deal with these data demands. SAN technology is mature, but if not implemented properly it won’t meet expectations.

Because it is more fun to talk about our problems, let’s assume the project is experiencing “technical difficulties”. That is a euphemism for “there is something wrong with your network”. We are trained to deal with these problems. We can often figure them out, but it takes time. This is when a project can fall behind schedule. Now it is Wednesday late morning. Everyone is on a first name basis. You are Bill and Sandy; we are Jim (always Jim). Lunch is suggested. The latest trend in engineering is Sushi or the company cafeteria. At lunch, credentials are exchanged. Guards are let down. Comments are made.

Weeks later I get a phone call from the CIO. “You know, Bill and Sandy had lunch with Jim while he was on the project. They really hit it off”. I am dreading this already. “Jim told Bill and Sandy that we should have bought the XYZ option not the XYZ option we purchased. And he said that he had never actually configured the software in a multi-terabyte environment on Windows Server 2003 SP7 (released while we were onsite) with a Do-Hickey storage array tuned to warp speed and the current patch level. What gives? I thought you said your engineers knew what they were doing?”

“Did they by any chance have lunch on Wednesday?”, I ask. The lesson that I have learned from all of this is that the earlier we are involved in the design process, the better projects go. If we can do the full blown data management design, great. If we can at least review the architecture before the XYZ option is purchased, that helps.

So, over the years during the new-hire orientation with many engineers, I have developed possibly my single best piece of advice. Have lunch with the customer on Thursday.

January 16, 2007

The Generation Gap: People or Data?

Like the freakishly warm temperatures we have had in Washington, DC over the last two months and that ominous feeling it creates deep inside us, many government employees are at or nearing retirement and there is a bit of uneasiness about this phenomenon too. Although not quite comparable to the impact of global warming (disclaimer: I am not a scientist), so many people leaving government at the same time will have a chilling effect.

Filling their places will be either contractors or new government employees. This creates a great opportunity: phase in policies now about where data resides and who controls it. Now’s a great time to make policies that secure data and match the way newer workers view their data and work product: transportable and transferable.

Not everyone wants to keep everything on their C drive anymore. What’s more, it is time we keep them from doing so with sensible policies published to all and then use rule based systems to do implementation and monitoring for us. Frankly, this will become more a matter of survival and relevancy than of, ‘who owns the data?’

Specifically, there are three actions that almost every IT department can take today:

At some agreed-to interval, documents created and files stored across all application systems could be routinely scanned for data types and file types, e.g., Adobe, PowerPoint, Excel, Exchange, Oracle, etc. This metalayer index can then be scanned for unusual blips one way or another. This is not Big Brother; this is good stewardship.

‘Seal and no steal’. This is my name for the process of making it such that C drives, thumb drives, DVDs and other portable media extensions are disabled on laptops and some desktops. Policy has to be worked out here, but think about it: how often does something have to be transferred anymore by physical media vice being sent through the network?

Encrypt what’s at rest. In-line encryption makes it possible for users to experience zero application delay in the work day while all of their work product is encrypted where it is stored, thereby protecting it from unauthorized access or theft by those on the outside and more importantly, those on the inside.
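For the first of those actions, the periodic file-type index, here is a minimal Python sketch of the idea. It is an illustration only: the function names `index_by_extension` and `find_blips`, the use of file extensions as a stand-in for "file type", and the default ratio of 2.0 are all my assumptions, not features of any particular product.

```python
import os
from collections import Counter


def index_by_extension(root: str) -> Counter:
    """Count files under root, grouped by lowercase file extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
    return counts


def find_blips(previous: Counter, current: Counter, ratio: float = 2.0) -> dict:
    """Return extensions whose count grew or shrank by at least `ratio`.

    The ratio threshold is a hypothetical policy knob; tune it to taste.
    """
    blips = {}
    for ext in set(previous) | set(current):
        before, after = previous[ext], current[ext]
        if before == after:
            continue
        low, high = sorted((before, after))
        # A type that appears or vanishes entirely always counts as a blip.
        if low == 0 or high / low >= ratio:
            blips[ext] = (before, after)
    return blips
```

Run `index_by_extension` on the same share at each interval, keep the last result, and hand both snapshots to `find_blips`. A real deployment would look inside files rather than trust extensions, but the shape of the check is the same.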

The leaves change, the seasons change and the global climate changes. Down here on earth, lifecycles and work cycles are changing in the federal government and that presents an opportunity and an obligation to secure our data, easily, through policies and technologies available today.

January 15, 2007

The Boss who was better than a sleeping pill.

Last week I presented the concept of a SAN to a customer. No, I am not going to tell you what those letters stand for. If you don’t know, you should probably go to tmz.com and read the latest misadventures of Britney and Paris. This is a serious technical blog. It assumes that the reader is intelligent and well informed.

As I was presenting the idea of a SAN to the customer I had a Dennis Hopper-like flashback to the mid-nineties. Nobody knew what a SAN was in 1997. We were just discovering the virtues of a NAS environment (no, I won’t tell you what that stands for either). We were playing around with NDMP (stop asking) and figuring out how to manage massive amounts of data. In 1997 a massive amount of data was about 80 Gigs. The hard drive on my laptop today is 80 Gigs. We were so innocent.

I had a boss at that time who unfortunately understood SAN technology. I say unfortunately because he insisted on telling every customer, vendor and 7-Eleven clerk about the virtues of a SAN. One time I took him to meet with the CIO and President of a regional bank. They were planning to upgrade from Solaris to NT. Once I was done explaining that we would never call moving from Solaris to NT an “upgrade” my boss took over. He went to the chalkboard (kids, whiteboards had not been invented then) and proceeded to draw out a SAN diagram. He talked about it non-stop for an hour. The Bank President was furious. The CIO was puzzled. The project was eventually awarded to a competitor, one who probably answered their freekin’ questions!

Another time, I saw a customer fall asleep in the middle of my boss’s SAN presentation. I was contemplating bodily harm when the boss resigned to go into the insurance business. I think he is an actuary now. Go figure. At that time I swore that I would uphold justice, the American Way of Life (AWL) and keep people awake during my presentations.

I am happy to report that no one fell asleep during my SAN presentation last week. Of course, it was via phone and I couldn’t actually see the people. And it did get kind of quiet at the end.

January 08, 2007

Tell them how much it will cost and maybe they will shut up.

I made the above statement to a Deputy CIO who runs the technology division for a significant part of the military industrial complex. OK...I actually said “maybe they will leave you alone” not “shut up”, but the title quote seems much more blog like.

His problem is email. The powers that be insist that their email be available 24x7. The Deputy CIO knows that their ancient infrastructure, which includes 25-year-old technology alongside stuff they bought yesterday, can’t support round-the-clock availability.

5-nines of availability is what the consulting industry would call the request from the Deputy’s management. That translates into about five minutes a year in downtime. That’s how long it took me to write the last two sentences (poor keyboard skills). Given the technical infrastructure the Deputy had to deal with, the two of us could hardly calculate the cost to implement 5-nines. I am looking over my notes from the call; there appear to be at least 10 zeros in the last column.

How about 3-nines of availability? That is about 9 hours of downtime a year. I asked the Deputy if that was realistic. He told me that on average they are down about five hours a month and one time they had been down for 16 hours straight. We talked about a number of strategies to improve availability. I saw a few holes in their clustering setup. There was room for additional redundancy (is that redundant?). And some of the Sys Admins were in need of training.

We calculated the cost for the entire organization to upgrade email to 3-nines of availability. The estimate was $20 million. We both had a good laugh at that one. Finally, we settled on 2-nines of availability. That would give the IT staff about 3 ½ days of downtime to play around with a year. It wasn’t meeting the need for 24x7 availability, but the higher ups weren’t getting that anyway. The price tag was an immodest $7 million. This was steep, but within budget. The Deputy felt that they could do much better than 2-nines with these upgrades, but he liked the idea of setting realistic expectations.
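For anyone who wants to check the downtime math behind the "nines", here is a small Python sketch. The function name is made up for illustration, and it assumes a 365-day year with downtime spread over the whole year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(nines: int) -> float:
    """Allowed downtime per year, in hours, for N nines of availability.

    Two nines means 99% available, three nines 99.9%, and so on.
    """
    unavailability = 10 ** -nines  # e.g. 0.01 for two nines
    return HOURS_PER_YEAR * unavailability


if __name__ == "__main__":
    for nines in (2, 3, 5):
        hours = downtime_hours_per_year(nines)
        print(f"{nines}-nines: {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Two nines works out to 87.6 hours (about 3.65 days), three nines to 8.76 hours, and five nines to a little over five minutes a year.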

He prepared a PowerPoint to logically lay out the case for 2-nines of availability. He knew the bosses would be disappointed that they were not going to deliver 5-nines. The Deputy felt his case was strong and logic unassailable. I caught up with him last week. I asked the Deputy how the presentation had gone. Did they get the piercing logic of his reasoning? Were they humbled by his slide showing a long term pattern of decreased IT investment as inboxes grew exponentially?

“You know what they said when I was done?”, the Deputy CIO asked. “They told me to shut up”.

January 04, 2007

Nature vs. Nurture and the big snowstorm of December 2006

On Wednesday, December 20, 2006 the foothills outside of Denver, Colorado received up to 48” of snow. Hundreds of thousands of people were unable to get in or out of the Mile High City. Vacations were cancelled, gifts were left unopened and there was an almost Revelation-like weeping and gnashing of teeth.

Your intrepid blogger was among those stranded. I was scheduled to fly home from Washington DC to Denver on that fateful Wednesday (dramatic, huh). I finally made it home, broken but unbowed, five nights later on Christmas night. Alas, this blog is only partially about my travails. It is really about one of the oldest arguments…nature vs. nurture.

These hundreds of thousands of people (maybe zillions) were unable to log on to travel industry web sites or reach their airline via phone. The industry will try to write this off as Mother Nature at work. There was nothing they could do. Too much weather, too quick. Is that really the case, or did many in the travel industry fail to nurture the right technical environment?

Our business at DLT is to provide technical disaster recovery and business continuity services. We have helped hundreds of business and government customers figure out how to quickly recover their data and their technical services in the event of, well…48” of snow in one day. We have a set of best practices we follow. For example, if you are in an earthquake zone, don’t put your disaster recovery site down the street.

My spies have informed me that when the big snowstorm hit Denver, at least one airline was unable to get their telephone support personnel to their site…in Denver. There was no noticeable plan to quickly recover and provide critical customer support. Another travel industry company experienced major delays with their web site. For our customers, we test how many concurrent users can access a web site. Did this major travel industry company test for a worst case scenario or was the storm simply too much?

Now, we are talking about an industry that has barely made money since 9/11. I (sort of) feel their pain. Did the travel industry have a viable disaster recovery plan? Will this dislocation lead to the long anticipated industry consolidation? Am I just bitter that my own travel plans fell apart and using this forum to air private grievances? (no, yes, yes)

My head is spinning. That was too many questions. Next year I am going to Grandma’s house for Christmas. She lives down the street. Of course, she does have a septic system. I wonder if she has a good disaster recovery plan? Those things can back up and make a mess.