Ace's Hardware tackles mainframes
Ace's Hardware has a neat little article on mainframe systems that should give you a little bit of perspective. The history of mainframes is covered, but what's really interesting is a short discussion of just how reliable these massive closets have become:
There is such an extremely high level of redundancy and error checking in these systems that there are very few scenarios, short of a Vogon Constructor fleet flying through your datacenter, which can cause a system outage. Each CPU die contains two complete execution pipelines that execute each instruction simultaneously. If the results of the two pipelines are not identical, the CPU state is regressed, and the instruction retried. If the retry again fails, the original CPU state is saved, and a spare CPU is activated and loaded with the saved state data. This CPU now resumes the work that was being performed by the failed chip. Memory chips, memory busses, I/O channels, power supplies, etc. all are either redundant in design, or have corresponding spares which can be put into use dynamically.
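To make the compare, retry, and failover sequence in that quote concrete, here is a toy software sketch of it. This is only an illustration, not how the hardware is actually built: the names (Cpu, execute_checked, run, make_flaky) are hypothetical, a "pipeline" is modeled as a plain function, and the CPU state is reduced to a single integer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model: a pipeline maps (state, operand) to a new state.
Pipeline = Callable[[int, int], int]


def add(state: int, operand: int) -> int:
    """A healthy pipeline: the 'instruction' just adds the operand."""
    return state + operand


def make_flaky(fail_times: int) -> Pipeline:
    """Build a pipeline that returns a wrong result for its first `fail_times` calls."""
    remaining = [fail_times]

    def pipeline(state: int, operand: int) -> int:
        if remaining[0] > 0:
            remaining[0] -= 1
            return state + operand + 1  # simulated fault
        return state + operand

    return pipeline


@dataclass
class Cpu:
    name: str
    pipelines: Tuple[Pipeline, Pipeline]  # two copies of the execution pipeline
    state: int = 0


def execute_checked(cpu: Cpu, operand: int, max_retries: int = 1) -> bool:
    """Run one instruction on both pipelines and commit only if they agree.

    Returns False if the results still differ after the retries; in that
    case cpu.state is left at the last good (pre-instruction) value.
    """
    checkpoint = cpu.state
    for _ in range(1 + max_retries):
        a = cpu.pipelines[0](checkpoint, operand)
        b = cpu.pipelines[1](checkpoint, operand)
        if a == b:
            cpu.state = a  # results match: commit
            return True
        cpu.state = checkpoint  # mismatch: regress and try again
    return False


def run(program: List[int], active: Cpu, spares: List[Cpu]) -> Cpu:
    """Execute the program, moving to a spare CPU whenever the active one fails."""
    for operand in program:
        while not execute_checked(active, operand):
            if not spares:
                raise RuntimeError("no spare CPUs left")
            spare = spares.pop(0)
            spare.state = active.state  # load the saved state into the spare
            active = spare              # the spare resumes the same instruction
    return active


# A transient fault is absorbed by the retry; a permanent one triggers failover.
transient = run([1, 2, 3], Cpu("cpu0", (add, make_flaky(1))), [])
permanent = run([1, 2, 3], Cpu("cpu0", (add, make_flaky(10**6))), [Cpu("spare0", (add, add))])
```

In this sketch the program never notices either fault: both runs end with the same state, and the only visible difference is which CPU finished the work.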
Sometimes, you've just gotta have uptime.