Not only that, but when no one was looking, I would go stand by its rack and say denigrating things about its scalability, just to see if it would crack.
Generally, the thing managed to hold up pretty well, but then we encountered The Problem. The Problem is a slight little issue with some 3Ware IDE RAID controllers, which sometimes decide to lose all memory of their RAID arrays when confronted by a cold boot. Combine this quirk with a late-night server lock-up induced by too-heavy traffic, and you’ve got a recipe for a very tasty disaster. I don’t know exactly how or why the 3Ware could lose the contents of a RAID 1 array, but it managed to do so for us and, judging by what we read on Usenet, for a number of other lucky folks.
After we first encountered The Problem, we decided to try to protect ourselves from future disk errors by rebuilding the box with a journaling file system, Red Hat Linux’s ext3. Adding journaling to our already overloaded drives’ duties, though, was just too much. The system had particular trouble dealing with large-ish files. Ultimately, I resorted to scripting hourly web server log rolls just to keep the thing going. If we didn’t manage it right, a (warm, please) reboot would soon be in order.
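Our actual script is long gone, but an hourly log roll boils down to something like this sketch. The paths and the graceful-restart trick are my assumptions about a stock Red Hat Apache setup, not the original script:

```bash
#!/bin/sh
# roll-logs.sh -- rotate the Apache access log every hour so no single
# file grows large enough to bog down the (journaled) disk array.
# Run hourly from cron, e.g. in /etc/crontab:
#   0 * * * * root /usr/local/sbin/roll-logs.sh

LOGDIR=/var/log/httpd
STAMP=$(date +%Y%m%d-%H%M)

# Move the live log aside; Apache keeps writing to the old inode...
mv "$LOGDIR/access_log" "$LOGDIR/access_log.$STAMP"

# ...until a graceful restart tells it to reopen its log files
# without dropping in-flight requests.
/usr/sbin/apachectl graceful

# Compress older rolls to save space on those overloaded drives.
find "$LOGDIR" -name 'access_log.*' -mtime +1 ! -name '*.gz' \
    -exec gzip {} \;
```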
Obviously, it was time for a second server, something much faster than the current box. Our experience with the old PIII server taught us a couple of vital lessons about bottlenecks in MySQL/Apache servers. First and foremost, you need lots and lots of RAM. Second, a fast, reliable disk array is a must. (A related insight: cheap IDE RAID controllers aren’t to be trusted.) And more CPU power doesn’t hurt, either.
My plan was to build a box beefy enough to act as a database/back-end server for a whole array of Apache boxes, so we could ramp up The Tech Report’s semi-covert plan for utter, crushing world domination by adding lightweight front-end web servers as needed. Our old dual PIII box would be the first of these front-end, non-database boxes.
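To give you the flavor of that split, here’s a rough sketch of how a back-end MySQL box serves a cluster of front-end Apache boxes. The addresses, database name, and account are made up for illustration; they’re not our actual config:

```bash
# Back-end box: have MySQL listen on the private network only.
# (/etc/my.cnf -- the 192.168.x addresses are hypothetical)
#
#   [mysqld]
#   bind-address = 192.168.1.10
#
# Then grant each front-end web server access to the database:
mysql -u root -p -e \
  "GRANT SELECT, INSERT, UPDATE, DELETE ON techreport.* \
   TO 'web'@'192.168.1.%' IDENTIFIED BY 'secret';"

# On a front-end Apache box, the site code simply connects to the
# back end's address instead of localhost:
mysql -h 192.168.1.10 -u web -psecret techreport -e 'SELECT 1;'
```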
So I set out to build a new system that would handle the strain of a brutal, unmitigated Slashdotting and keep asking for more. The requirements: more RAM than Dodge, more reliability than Honda, and a RAID array potent enough to kill a horse.
I’m not quite sure how a RAID array could kill a horse, but I know I wouldn’t want to see it.
Oh, and it had to fit into our budget, which is: about five dollars.
Securing the parts
I started out my task by poking around a little to see how we might get a deal or two on some components. AMD was kind enough to kick in a pair of Athlon MP 2200+ chips, which were AMD’s fastest multiprocessor chips at the time. We’ve reviewed Athlon MP processors a number of times, and we’ve always been impressed by the performance of the dual Athlon platform. The Athlon itself is a very good processor, of course, and the dual front-side buses and other sophisticated tricks in the 760MPX chipset make for an excellent server platform, and a definite upgrade from our dual 866MHz Pentium IIIs.
Tyan agreed to supply us with one of its killer server mobos, the new Thunder K7X Pro, if we would display the “Powered by Tyan” logo on the site’s front page. Tyan motherboards have actually powered TR for a long time now, so no problem there. The Thunder K7X Pro (ours is a model S2469UGN) is the latest in Tyan’s very successful line of Athlon MP boards. Unlike past boards in the Thunder K7 line, the K7X Pro doesn’t require a proprietary power connector; it will accept the same auxiliary ATX12V connector as any Pentium 4 mobo, or an EPS12V connector like new Xeon mobos. The S2469UGN comes loaded with dual Intel NICs (one of which is a Gigabit Ethernet port), a dual-channel Adaptec Ultra320 SCSI controller, four angled DIMM slots for use in low-profile cases, a pair of 64-bit/66MHz PCI slots, and a pair of sockets for those Athlon MP processors. This Tyan is a true server motherboard, with special features like console redirection to serial ports for better access to remote servers. All in all, exactly the kind of board we needed.
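For the curious, the operating system’s half of that serial console trick looks roughly like this on Red Hat 7.3. The port and baud rate here are assumptions; the specifics depend on the BIOS redirection settings and the cabling:

```bash
# Sketch of pointing a Red Hat 7.3 box's console at a serial port,
# to complement the BIOS-level redirection. ttyS0/9600 are assumptions.

# 1. /etc/lilo.conf: have LILO and the kernel talk to the serial port.
#      serial=0,9600n8
#      append="console=ttyS0,9600 console=tty0"
#    ...then re-run lilo to install the change:
/sbin/lilo

# 2. /etc/inittab: spawn a login prompt on the serial port.
#      S0:2345:respawn:/sbin/agetty -L 9600 ttyS0 vt100

# 3. /etc/securetty: allow root logins from the serial console.
echo ttyS0 >> /etc/securetty
```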
Next, we got a killer deal from the folks at Corsair on three 1GB DIMMs of registered DDR memory. Corsair RAM is top-notch stuff. We use it for most of our testing here in Damage Labs, and it’s exactly the kind of memory we’d want to put into a critical server. Having 3GB of it would address one of the key problems with our old server, as well. I nearly got a fourth 1GB DIMM, but I decided against it, since the 760MPX chipset can really only use about 3.5GB of main memory, all told.
One interesting note: in order to cram 1GB of memory on one DIMM, Corsair has double-stacked the memory chips on the module. Check out the picture below to see what I mean.
The rest of the server’s components I bought online at the best prices I could find. Let’s take a look at the server’s final specs, and then I’ll discuss some of the server’s features in a little more detail.
Below are the key specifications of the finished web server. I’ve put it all into a nice table, so it’s easy to digest.
| Component | Specification |
| --- | --- |
| Processor | 2 x Athlon MP 2200+ 1.8GHz |
| Front-side bus | Dual 266MHz (dual 133MHz DDR) |
| Motherboard | Tyan Thunder K7X Pro S2469UGN |
| North bridge | AMD 762 |
| South bridge | AMD 768 |
| Memory size | 3GB (3 DIMMs) |
| Memory type | Corsair Registered PC2100 DDR SDRAM |
| Graphics | ATI Rage XL (integrated) |
| RAID controller | Intel Server RAID Controller U-31 (SRCU31) w/32MB cache |
| Storage | 5 x Maxtor Atlas 10K III Ultra320 10,000RPM SCSI hard drives (RAID 10 w/1 hot spare) |
| OS | Red Hat Linux 7.3 |
They’re not mentioned above, but I also installed a floppy drive and a CD-ROM drive, both with black front faces, to make OS installation/recovery easier. Don’t recall the brands. Doesn’t really matter.
Anyhow, it’s not a bad setup. If you’re like me, you’re thinking “personal workstation—just throw in an AGP card.”
The Chenbro 2U case we chose to house all of this hardware in, however, is much too loud to use in a personal workstation. Great cooling, though. The Chenbro originally arrived with a 300W power supply, and everything would run on that unit, but we replaced it with a 460W model, just to be safe. Here’s how it all looks together:
You can see that the Chenbro case has six 3.5″ hot-swap drive enclosures. Mounted directly behind them is the enclosure’s SCSI backplane, which supplies power and connectivity to the drives in the hot-swap bays. DIP switches on the backplane control SCSI IDs for the drives. We only had to run a single cable from the SCSI backplane to the RAID controller card.
Server case manufacturers are churning out more exotic cases than the 2U Chenbro unit we chose. It is possible to cram four hot-swappable drive bays and a dual Athlon server into a 1U chassis, like this. Also, some enclosures offer better reliability in the form of redundant, hot-swappable power supplies. However, those things cost money, and we were quickly approaching the limits of our “about five dollars” budget. We settled on the Chenbro as the best combination of features and price.
RAIDing my wallet: the disk subsystem
The most expensive piece of the whole puzzle was the disk subsystem, which was the most troublesome part of our old server and the one I was most determined to upgrade. I snagged an old version of Intel’s SRCU31 SCSI RAID controller on eBay and upgraded it to the latest firmware revision, which essentially transformed the unit into a like-new SRCU31A with an entirely different driver and software architecture. (It took some work to determine that this upgrade was, in fact, possible. It seemed impressive at the time, anyhow.) I also ordered a 128MB DIMM to use with the controller, to give it a little more cache than the 32MB DIMM that came with it. However, the thing didn’t seem to like higher-density memory chips. After seeing the effects of cache memory in our roundup of IDE RAID controllers, I decided not to push the issue; we could live with the stock 32MB DIMM.
For drives, I picked Maxtor’s Atlas 10K III. Five of them. Like so:
I set up these 17GB drives in a RAID 10 array with one drive acting as a dedicated hot spare. The total capacity is 34GB, less than what a RAID 5 array with the same number of drives would offer. However, RAID 10 spares the controller’s i960 processor the parity calculations RAID 5 requires. Total disk capacity is really not a priority in this application. Performance is, and RAID 10 was the best choice for performance.
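For the record, the capacity math shakes out like so:

```bash
# Five 17GB drives, one set aside as a hot spare = four active drives.

# RAID 10: stripe across mirrored pairs -- half the active capacity.
echo $(( (5 - 1) / 2 * 17 ))GB    # => 34GB

# RAID 5: one drive's worth of parity across the active set.
echo $(( ((5 - 1) - 1) * 17 ))GB  # => 51GB, but the parity math taxes the i960
```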
Oddly enough, I almost had to scrap the whole thing when the RAID controller card wouldn’t fit into the case with a SCSI cable attached. The controller is a full-height PCI card, and it was plugged into a PCI riser that allows cards to be inserted parallel to the motherboard (this is how things fit into a 2U enclosure). SCSI cables plug into the SRCU31 at the very top of the card, and the cable’s connector wouldn’t quite clear the side wall of the case. I was able to overcome this problem by switching to a SCSI cable with a connector just a millimeter shorter, which allowed me to cram the card into the case. Crisis averted.
At the end of the day, this hardware setup promised solid reliability, and it delivered something just as important: excellent Linux support. Red Hat 7.3 installed without needing additional drivers, and Intel’s software suite offers full control over the RAID array from inside Linux. Have a look at the real-time statistics the software provides for each physical drive in the array:
Intel’s Storcon utility allows one to configure an array, add or remove drives, check on the status of a degraded array, direct repairs, and the like. On a critical server in a remote location, this kind of capability is priceless.
Or at least fairly expensive.
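Storcon itself is interactive, though, so for unattended alerting, one could pair it with a dumb cron-driven watchdog along these lines. This is a generic sketch of my own, not part of Intel’s software, and the address is hypothetical:

```bash
#!/bin/sh
# raid-watch.sh -- mail the admin if SCSI or RAID complaints show up
# in the kernel log. Generic sketch; run every few minutes from cron:
#   */5 * * * * root /usr/local/sbin/raid-watch.sh

ADMIN=damage@example.com   # hypothetical address
LOG=/var/log/messages

if tail -n 200 "$LOG" | grep -iE 'scsi.*(error|timeout)|raid.*degraded' \
    > /tmp/raid-alert.$$
then
    mail -s "RAID/SCSI trouble on $(hostname)" "$ADMIN" < /tmp/raid-alert.$$
fi
rm -f /tmp/raid-alert.$$
```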
So now you’ve seen an overview of our new server. I wish I’d had time to do several things before putting this system into service, including taking some better pictures and running some common benchmarks on it, either in Linux or in Windows. However, the old server decided to barf on its RAID array once again just days before I was planning to ship the new box to the co-lo facility. We had to scramble to get the new server shipped out and in place as soon as possible, so I had to cut short the benchmarking and photo sessions.
The folks at our new hosting outfit, Defender Hosting, did a wonderful job helping us get our new system shipped, racked, and turned up. (Thanks again to Hooz at 2CPU for recommending them.) Within 24 hours of leaving Damage Labs, the new system was online and running, serving all of TR’s traffic. And barely breaking a sweat.
Of course, no server setup is perfect, and I’m sure we’ll explore the limits of this one in due time. We have yet to feel the joy of a Slashdotting with this box, so when that happens for the first time, all bets are off. This thing could come crashing down like Michael Jackson’s career. However, we have alleviated the most severe bottlenecks we’ve run into in the past, which is a start.
Now, the old server will come out of our old hosting outfit and land in Damage Labs for a brief retrofit, in which the IDE RAID controller will be extracted, and a full exorcism performed. Once the server is free of evil spirits, faster hard drives will be installed, probably along with more memory. With luck, our quest for utter, crushing world domination will be on track, and we’ll bring up the first front-end server to go along with our new back-end box.
A brief epilogue
You have just read an article about how I built our new web server. I wrote it because lots of folks told me they wanted to see such an article. No doubt if you have some semblance of experience with server systems, you have developed some very strong, deeply insightful, and intensely correct views about how servers ought to be built and run. You would like to share them with me now, so I can see how desperately and utterly wrong I am about nearly everything.
That’s nice. Please think before you type, though. I promise, I’m only a partial idiot.