MyriMatch
Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He recently offered to provide us with an intriguing new benchmark based on an application he's developed for use in his research work. The application is called MyriMatch, and it's intended for use in proteomics, or the large-scale study of proteins. I'll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest. David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching.

In this test, 1503 tandem mass spectra from a Thermo LCQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker's yeast).

The multithreaded stage of MyriMatch comes during generation of peptides from the database and comparison of those peptides with the experimental spectra. MyriMatch detects the number of CPUs/cores available on the system and spawns a worker thread for each. Worker threads then "take a number" out of a list of "worker numbers" and iterate through the protein database in steps equal to the length of the "worker numbers" list. The list is sized so that each worker thread can finish its current number and then come back for another. For example, on a machine with one dual-core processor, 2 threads will be spawned, and the length of the "worker numbers" list might be any multiple of the number of worker threads, like: (1, 2, 3, 4, 5, 6, 7, 8). The first thread works on proteins 1, 9, 17, 25, etc. The second thread works on proteins 2, 10, 18, 26, etc. Whenever a thread finishes, it takes the next number in the list and iterates through the database again using the new number as the starting point. This technique is intended to minimize synchronization overhead between threads, minimize idle CPU time, and minimize the effect of some unfortunate ordering in the protein database causing one thread to search long proteins while another thread searches short proteins.
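For readers who want to see the "take a number" scheme in motion, here is a minimal Python sketch of the distribution logic as described above. It is not MyriMatch's actual code: the names `proteins_for` and `distribute`, the `numbers_per_thread` parameter, and the choice to simply record protein numbers instead of generating and scoring peptides are all assumptions made for illustration.

```python
import threading
from collections import deque


def proteins_for(start, num_proteins, stride):
    """Protein numbers (1-based) visited by one pass that begins at `start`."""
    return list(range(start, num_proteins + 1, stride))


def distribute(num_proteins, num_threads, numbers_per_thread=4):
    """Simulate the scheme; return {thread_id: [protein numbers it handled]}.

    numbers_per_thread is a hypothetical knob: the list of worker numbers
    holds num_threads * numbers_per_thread entries, i.e. a multiple of the
    thread count.
    """
    stride = num_threads * numbers_per_thread
    worker_numbers = deque(range(1, stride + 1))
    lock = threading.Lock()  # the only shared state is the number list
    handled = {t: [] for t in range(num_threads)}

    def worker(thread_id):
        while True:
            with lock:
                if not worker_numbers:
                    return  # no numbers left, so this thread is done
                start = worker_numbers.popleft()
            # Stand-in for the real work (peptide generation and scoring).
            handled[thread_id].extend(proteins_for(start, num_proteins, stride))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


if __name__ == "__main__":
    result = distribute(num_proteins=6714, num_threads=2)
    for thread_id, proteins in result.items():
        print(thread_id, len(proteins))
```

Threads touch the lock only when they pick up a new number, which is how a scheme like this keeps synchronization cheap. Which thread ends up with which number depends on timing, so the per-thread counts can vary from run to run, but every protein is visited exactly once.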

David and his colleagues will be publishing a paper on the MyriMatch algorithms, and I understand they hope to make MyriMatch available as open-source software, as well. The most important news for us is that MyriMatch is a real-world application, widely multithreaded, that we can use with a relevant data set. MyriMatch also offers control over the number of threads used, so we've tested with one to four threads.

These results give us a new spin on the question of scaling. The Core 2 Extreme QX6700 is easily faster than the FX-74 with one and two threads, and it would appear to be on its way to outright victory. However, the QX6700's performance doesn't scale well when moving to three and four threads, while the FX-74's does. The QX6700 might be running into a bus or memory bandwidth limitation. Whatever the case, the Quad FX system turns in the quickest overall processing time with four threads, albeit by a narrow margin. The moral of the story? If you're matching peptides to spectra at home, the FX-74 will probably serve you best.

STARS Euler3D computational fluid dynamics
Our next benchmark is also a new one for us. Charles O'Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us recently to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here. (I believe the score you see there at almost 3Hz comes from our eight-core Clovertown test system.)

In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45°. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.

The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.
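To make the scoring concrete, a cycle frequency in Hertz is just completed solver cycles divided by elapsed wall-clock seconds. The short Python sketch below shows that arithmetic. The function name `cfd_score_hz` and the sample numbers are made up for illustration and are not taken from the benchmark itself.

```python
def cfd_score_hz(cycles_completed, elapsed_seconds):
    """Cycle frequency in Hz: solver cycles finished per second of wall time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cycles_completed / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical run: 30 cycles in 10 seconds works out to 3 Hz.
    print(cfd_score_hz(30, 10.0))
```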

So the higher the score, the faster the computer. I understand the STARS Euler3D routines are both very floating-point intensive and oftentimes limited by memory bandwidth. Here's how our contenders handled it.

Well, the Core 2 processors pretty much embarrass the Athlon 64s here. Even the dual-core X6800 runs faster than the Quad FX.
Copyright ©1999-2009 The Tech Report. All rights reserved.