MyriMatch
Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He recently offered to provide us with an intriguing new benchmark based on an application he's developed for use in his research work. The application is called MyriMatch, and it's intended for use in proteomics, or the large-scale study of proteins. I'll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them with tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be matched to peptide sequences drawn from the known genomes of most lab organisms. The first software for this purpose was Sequest. David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching.

In this test, 1503 tandem mass spectra from a Thermo LCQ mass spectrometer are matched to peptides generated from the 6714 proteins of S. cerevisiae (baker's yeast).

The multithreaded stage of MyriMatch comes during generation of peptides from the database and comparison of those peptides with the experimental spectra. MyriMatch detects the number of CPUs/cores available on the system and spawns one worker thread for each. Each worker thread then "takes a number" from a list of "worker numbers" and iterates through the protein database in steps equal to the length of that list. The list is built so that each worker thread finishes its current number and then comes back for another. For example, on a machine with one dual-core processor, two threads will be spawned, and the "worker numbers" list might be any multiple of the number of worker threads in length, such as (1, 2, 3, 4, 5, 6, 7, 8). The first thread works on proteins 1, 9, 17, 25, etc. The second thread works on proteins 2, 10, 18, 26, etc. Whenever a thread finishes, it takes the next number in the list and iterates through the database again, using the new number as its starting point. This technique is intended to minimize synchronization overhead between threads, minimize idle CPU time, and minimize the effect of an unfortunate ordering in the protein database that could leave one thread searching long proteins while another searches short ones.
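The "take a number" scheme can be sketched in a few lines of Python. This is a hypothetical illustration, not MyriMatch's actual code: the function name `strided_search`, the `numbers_per_thread` parameter, and the lock-protected counter used to hand out numbers are our assumptions, and the inner loop merely records which proteins each worker number covers instead of generating and scoring peptides. Worker numbers are 1-based, as in the example above, and the stride equals the length of the list.

```python
import threading


def strided_search(num_proteins, num_threads, numbers_per_thread=4):
    """Hypothetical sketch of "take a number" work distribution.

    Returns a dict mapping each worker number to the (1-based) protein
    indices searched under that number.
    """
    # The list length is a multiple of the thread count and doubles as the stride.
    worker_numbers = list(range(1, num_threads * numbers_per_thread + 1))
    stride = len(worker_numbers)

    # Pre-create every result list so threads never resize the shared dict.
    covered = {n: [] for n in worker_numbers}

    lock = threading.Lock()  # the only synchronization: handing out numbers
    next_slot = 0

    def take_number():
        nonlocal next_slot
        with lock:
            if next_slot == len(worker_numbers):
                return None  # list exhausted; the worker can exit
            number = worker_numbers[next_slot]
            next_slot += 1
            return number

    def worker():
        while (start := take_number()) is not None:
            # Visit proteins start, start + stride, start + 2 * stride, ...
            for protein in range(start, num_proteins + 1, stride):
                covered[start].append(protein)  # stand-in for the real search work

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return covered
```

With `strided_search(6714, 2)`, the list is (1, 2, ..., 8), so worker number 1 covers proteins 1, 9, 17, 25, ..., number 2 covers 2, 10, 18, 26, ..., and every protein is visited exactly once. Threads contend for the lock only when fetching a new number, which keeps synchronization light, and because each number's proteins are spread across the whole database, a cluster of long proteins gets shared out rather than landing on a single thread.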

David and his colleagues will be publishing a paper on the MyriMatch algorithms, and I understand they hope to make MyriMatch available as open-source software, as well. The most important news for us is that MyriMatch is a real-world application, widely multithreaded, that we can use with a relevant data set. MyriMatch also offers control over the number of threads used, so we've tested with one to eight threads.

Here's an interesting look at how these systems scale up with a multithreaded application. Performance rises in textbook fashion on the two quad-core systems as we add threads, with each performing best at four threads and then holding more or less steady beyond that. The dual Xeon 5355 system with eight cores, though, runs into some scaling issues. Completion times drop until we get to six threads, and then they turn progressively upward at seven and eight threads, with eight threads being as slow as three. The curve from three threads to eight is almost parabolic. The system and the application are locked in some kind of dysfunctional dance at higher thread counts, perhaps due to limited hardware resources (such as memory or bus bandwidth) or perhaps because of a more esoteric software issue. For what it's worth, we've seen this same application fail to scale well with four threads on the single-socket desktop version of the Xeon 5355, the Core 2 Extreme QX6700, though it does scale well on other four-way systems.
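Scaling results like these are commonly summarized as speedup, S(n) = T(1)/T(n), and parallel efficiency, E(n) = S(n)/n, where T(n) is the completion time with n threads. Under those standard definitions, the Xeon 5355 result is easy to put in perspective: if eight threads are about as slow as three, then S(8) ≈ S(3), and assuming no superlinear speedup (S(3) ≤ 3), efficiency at eight threads is at most roughly 3/8, or 37.5%. The helpers below are a small sketch for computing these figures from measured times; the function names and the input format (a dict of thread count to seconds) are our own choices, and we don't reproduce the chart's numbers here.

```python
def scaling_summary(completion_times):
    """Hypothetical helper: compute speedup and efficiency per thread count.

    completion_times maps thread count -> completion time in seconds and must
    include a one-thread measurement to serve as the baseline.
    """
    baseline = completion_times[1]
    summary = {}
    for threads, seconds in sorted(completion_times.items()):
        speedup = baseline / seconds                      # S(n) = T(1) / T(n)
        summary[threads] = (speedup, speedup / threads)   # E(n) = S(n) / n
    return summary


def best_thread_count(completion_times):
    """Return the thread count with the shortest completion time."""
    return min(completion_times, key=completion_times.get)
```

A curve that bends back upward, as on the eight-core system, shows up as speedup falling and efficiency collapsing at the higher thread counts, while `best_thread_count` picks out the sweet spot.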

STARS Euler3D computational fluid dynamics
Our next benchmark is also a new one for us. Charles O'Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us recently to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results. (I believe the score of almost 3Hz listed there comes from our eight-core Clovertown test system.)

In this test, the application is essentially analyzing airflow over an aircraft wing. I'll step out of the way and let Charles explain the rest:

The benchmark test case is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45°. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.

The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.

So the higher the score, the faster the computer. I understand the STARS Euler3D routines are both very floating-point intensive and oftentimes limited by memory bandwidth. Here's how our contenders handled it.

This is just total dominance from the Xeons. The eight-way Clovertown system more than doubles the execution speed of the dual Opteron 2218.
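Because the score is a frequency, it converts directly into time per CFD cycle, and the ratio of two scores is the ratio of the two systems' speeds. A minimal sketch, assuming the score is simply completed cycles divided by elapsed wall-clock seconds (the benchmark's exact timing method isn't described here); the function names are hypothetical:

```python
def cfd_score_hz(cycles_completed, elapsed_seconds):
    """Score as a cycle frequency: CFD cycles completed per second (Hz)."""
    return cycles_completed / elapsed_seconds


def seconds_per_cycle(score_hz):
    """Invert the frequency to get the time one CFD cycle takes."""
    return 1.0 / score_hz


def speed_ratio(score_a_hz, score_b_hz):
    """How many times faster system A runs than system B."""
    return score_a_hz / score_b_hz
```

At a score of almost 3Hz, for example, each cycle takes a bit more than a third of a second, and one system "more than doubling" another's execution speed corresponds to a `speed_ratio` above 2.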
Copyright ©1999-2009 The Tech Report. All rights reserved.