Saturday, September 24, 2011

DRT Week


Last week was disaster recovery test week, a company-wide exercise that involved all the systems in the company. It was treated like a high-severity event, and all our partners, such as IBM, were fully focused on the exercise. It was fitting that it was conducted during the week of the anniversary of September 11 and handled with extreme seriousness. I had some doubts that our system would survive a rapid recovery drill, but somehow it did. I did not really care, because we had our share of disasters during that week as well. The problems started on Monday and continued all the way to the end of the week. Somehow we were able to solve these production issues and still conduct our disaster recovery testing successfully. What a week! I said to my colleagues that we were in the midst of a real production disaster during the testing week. We did not need a simulated exercise to prove that we could survive a disaster.


It was an intense time in which we had to intervene and help the remote warehouses. It was the same old story, but this time we had an inkling of what could have happened. A patch was installed on the previous Thursday evening, and the warehouse folks reported a change the next day. It all started that Friday and carried over the weekend, when the Nevada team could not work by themselves. So the problems continued into the disaster week, when the rapid recovery exercise was moving into full swing. A near catastrophe occurred on Wednesday, when we were helping the two warehouses at the same time. Usually we had one warehouse working fine, but this time the load came from both warehouses. I was like a madman, shifting attention from one warehouse to the other to get the work done. I made a few mistakes that I thought were serious, but the business shrugged them off when I explained the fiasco. I had lost focus when it seemed that everyone was calling me or coming to my cubicle for a chat or to follow up on one thing or another.


Somehow I managed to survive that Wednesday. The next day we went through some test scenarios, but there was a problem with the test computer, so we did not do much testing. I had missed my gym date on Wednesday night as well as Thursday. Friday came as a shock, as the database disk space was full. This caused problems in both warehouses again. I diagnosed the problem and chatted with the relevant support staff in India. Getting to the right person at the right time is the secret to effectiveness. The problem was resolved in less than an hour, and that got the first warehouse going. But there were just too many meetings that day, and I had to participate remotely. Trying to engage in the morning meeting was difficult because of the support calls, and I had to drop from the call after the Nevada team called me.


I went to lunch with the colleague from the cubicle next to mine, and I had a generous helping of Chinese fried rice. Afterward we went to Wal-Mart to look at fishing rods. We are planning a fishing trip before winter comes.



