Bad CO asked me to shed some light on why I've been taking the site offline here and there, why we had a long, painful series of service issues over the last month, and what we've done and are doing about it.

The site's demands are continually growing. We still have all the text content and most of the pictures going back to 2001, and the rate of growth is always increasing. We have steadily improved the software that runs the site, but that brings associated increases in hardware power, cost and complexity. To keep up with this we had grown a hardware nightmare. Fine for big companies with IT departments, but not so good for us (an IT department of me and a part-time sub-contractor). It was taking up too much time and money. Overall downtime was reduced because of multiple servers, our backup system was really bulletproof (we still have everything back to 2001) and our emails were getting through spam filters better, but faults were creeping in and finding them was increasingly difficult.

As my pretty picture shows, in total the site was running on five servers (i.e. separate big beasty computers) with a sixth spare. By comparison, in 2003 ARRSE was running on one. All of these needed updates applying and routine problems attending to. Add a separate backup service and, at one point, an outsourced email sending service, and it was an expensive nightmare.

A month or so ago we started getting the irritating database errors. I never did get to the bottom of that, although it prompted a major spend on hardware. Then a few weeks ago we started getting massive server loads and the site kept dying. We couldn't work out why. Finally, after ages of hunting in the wrong place, we realised it was the chat system, Flashchat. It was creating lots of connections to our server then sitting there doing nothing. Eventually the server keeled over and died (highlighting along the way an error in the config that should have limited this effect).
Flashchat has now been unsupported for a couple of years, and the integration with vBulletin, i.e. our site, didn't work without hacking by me. I don't know where the error was but, as evening users will have noticed, it has gone (a new system will be in shortly but I'm not sure which).

So... this all prompted a move that we perhaps should have made years ago. A move into 'the cloud'. What this means in effect is that we no longer know what physical hardware we are using, and a lot of the maintenance, upgrading and configuration complexity is done for us. We can also change the size and performance of machines easily, and it is fast and simple to bring new servers online in case of increased demand or hardware failure. Backups are easier.

So, at the weekend, ARRSE moved to Amazon's cloud infrastructure and has been flying along since. We are still in a test period. At the moment we are paying by the hour, at a suitably extreme rate, for a simplified and part-managed cloud version of my whiteboard sketch. Once we're settled and know what we need from Amazon, our rates will come down as we move on to contracts, and I am confident that our demands will be affordable on Amazon, although more expensive than our old hosting bills. We've set the 10th of April as the decision point for whether to stay with Amazon, what contracts to take on or whether to move house again. In the meantime there will be the odd bit of downtime while we test and adjust.

I don't expect that was of much interest to many! But, heh, it got me away from staring at server usage statistics for a while and I did get to show my artistic talent.