The Tech Report’s XML benchmark

Server benchmarking is an odd domain.  There are plenty of well-developed, non-proprietary benchmarks available, but there are also teams of engineers with huge resources at companies like Intel, AMD, HP, and Sun working to make their products look as good as possible in those tests.  Stuck in the middle are system engineers and IT managers without any good source of objective, practically usable performance metrics on which to base design and purchasing decisions.

We try to cut through that nonsense wherever possible and provide transparent, useful comparisons of new hardware, but that’s especially difficult for server products.  Real-world server workloads are complex, and synthetic benchmarks are often not indicative of practical performance.  Server CPUs are especially hard: modern processors are so fast that it takes hundreds of concurrent connections to find a CPU’s performance ceiling, and you’re likely to run into network, memory, or other I/O bottlenecks before you do.  Since one thing TR doesn’t have is a 200-client test lab, we have to make do with more synthetic means of comparison.

One aspect of server workloads, especially prevalent in web services, is dealing with XML documents.  XML has become the data lingua franca spoken between programs running across different hardware platforms, operating systems, and programming languages.  Parsing, generating, and transforming XML text are, by themselves, CPU-bound tasks that touch nothing beyond main memory, so they’re good candidates for a synthetic benchmark.
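
As a rough illustration of what those work units involve (this is not code from the benchmark itself, just the stock System.Xml classes operating on strings held entirely in memory):

    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Xsl;

    class XmlWorkUnitSketch
    {
        static void Main()
        {
            // Parse: build a DOM from a string held in memory (no disk or network I/O).
            string xml = "<orders><order id=\"1\"><total>42.50</total></order></orders>";
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);

            // Transform: apply a tiny stylesheet to the in-memory document.
            string xslt =
                "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
                "<xsl:template match=\"/\"><totals><xsl:value-of select=\"sum(//total)\"/></totals></xsl:template>" +
                "</xsl:stylesheet>";
            XslCompiledTransform transform = new XslCompiledTransform();
            transform.Load(XmlReader.Create(new StringReader(xslt)));

            StringWriter output = new StringWriter();
            transform.Transform(doc, null, output);

            // Generate: write the transformed result back out as text.
            Console.WriteLine(output.ToString());
        }
    }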

I had originally hoped to find a public XML benchmark that Scott could add to server reviews as-is, but my search didn’t turn up anything usable.  The open-source XML Benchmark came closest, but it wasn’t something we could readily reduce to a meaningful comparison.  Resigned to writing one mostly from scratch, I decided to fill another gap in our lineup: the lack of any benchmark testing the performance of code running inside Microsoft’s .NET framework.  Microsoft has been making huge strides with ASP.NET and winning many converts from Java/J2EE-based development (if you struggled through that Wikipedia link, maybe you can start to understand why).

I took four of the basic units of work in XML Benchmark and ported them to C#.  After some back-and-forth with Scott, we arrived at a framework that runs a variable number of iterations of each work unit (or a mix of them) across a variable number of threads.  The program reports how long it took for all threads to finish, as well as more detailed statistics: the CPU time spent across all threads and the average runtime of the work units, aggregated and broken out by type.  We only reported the total start-to-finish time in the Shanghai review because we couldn’t figure out a good way to present the more granular results.
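
To make that description concrete, here is a heavily simplified sketch of such a runner.  It is not the actual benchrunner code; a parameterless delegate stands in for a work unit, and the thread and iteration counts are arbitrary:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class MiniRunnerSketch
    {
        // A work unit is just a parameterless action here, e.g. "parse test file X".
        static void RunIterations(Action workUnit, int iterations)
        {
            for (int i = 0; i < iterations; i++)
                workUnit();
        }

        static void Main()
        {
            const int threadCount = 4;     // worker threads
            const int iterations = 1000;   // work-unit iterations per thread

            Action workUnit = delegate { /* parse / generate / transform an XML document */ };

            Thread[] workers = new Thread[threadCount];
            Stopwatch wallClock = Stopwatch.StartNew();

            for (int t = 0; t < threadCount; t++)
            {
                workers[t] = new Thread(delegate() { RunIterations(workUnit, iterations); });
                workers[t].Start();
            }
            foreach (Thread worker in workers)
                worker.Join();

            wallClock.Stop();

            // Start-to-finish wall-clock time, plus total CPU time for the whole process.
            Console.WriteLine("Elapsed:  {0:F1} ms", wallClock.Elapsed.TotalMilliseconds);
            Console.WriteLine("CPU time: {0:F1} ms",
                Process.GetCurrentProcess().TotalProcessorTime.TotalMilliseconds);
        }
    }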

Because we want our results to be independently reproducible and verifiable, I’m publishing the source code for the XML benchmark program.  We’re planning on refining it over time, as well, so please discuss possible improvements in the comments, or email me directly with feedback or patches. I developed it in Visual Studio 2008, but it should also compile in the excellent, and free, Visual C# 2008 Express Edition.  The program relies on some test XML files; to maintain comparable results I’ve included the ones we used for the article with the source code.

I haven’t included a license statement, because I’m not a lawyer and I honestly have no idea what license this should fall under.  If you want to redistribute it, let me know and we can almost certainly put it under the GPL in an official way.

xmlbench_v1.zip

 

Comments closed
    • eitje
    • 11 years ago

    I recommend you move the benchrunner class into a separate library project, and just use the xmlbench project to provide the UI.

      • eitje
      • 11 years ago

      *I* like your changes. 🙂

      • sroylance
      • 11 years ago

      Thanks for looking at it. Benchmarking is better served by the performance counters: the system tick count doesn’t have granularity better than 1 millisecond (and isn’t reliable below 10 ms or so, according to some sources), while the runtime of individual work units is sometimes only a fraction of a millisecond.

      I’ll work on integrating your other improvements for our next iteration.

        • eitje
        • 11 years ago

        I think his goal of moving to the system tick was to get mono support.

        • niofis
        • 11 years ago

        As eitje said, I was trying to improve Mono compatibility; otherwise you would have to run the test with Wine installed too. I think *nix compatibility is important since many servers run non-Microsoft OSes, although Novell’s implementation of the .NET framework is a bit slower than Microsoft’s in some cases. (Maybe you should try porting it to another language?)

        You’re right on the resolution of TickCount vs. the performance counter, since the former has a 10-16 ms minimum interval (or so I remember), but I don’t find that a big problem if you’re timing extensive processes or a batch of smaller ones.

        On the other hand, there are many improvements you could make, like having your individual tests be more independent and building a more general mechanism to call them and have them report results; that way you’ll be able to add and/or remove other tests easily.

        Oh, and a “Save Test Results” option would be nice, maybe in CSV format; that way you’ll have an easier time importing the numbers into Excel, where I think you generate the performance graphs for your articles. (I would have programmed it myself, but I just got the idea.)
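
        A minimal sketch of that suggestion, using hypothetical IWorkUnit and WorkUnitResult names (nothing here comes from the actual benchmark source): each test implements a common interface, and results are flattened into CSV rows that Excel can import directly.

            using System.IO;

            // Hypothetical contract: each test implements IWorkUnit, and the runner only
            // knows this interface, so tests can be added or removed without touching it.
            interface IWorkUnit
            {
                string Name { get; }
                void Execute();                  // run one iteration of the work
            }

            // Hypothetical per-test result, flattened to a CSV row.
            class WorkUnitResult
            {
                public string Name;
                public int Iterations;
                public double TotalMilliseconds;

                public string ToCsvLine()
                {
                    return string.Format("{0},{1},{2:F3}", Name, Iterations, TotalMilliseconds);
                }
            }

            static class ResultWriter
            {
                public static void Save(string path, WorkUnitResult[] results)
                {
                    using (StreamWriter writer = new StreamWriter(path))
                    {
                        writer.WriteLine("test,iterations,total_ms");   // header row
                        foreach (WorkUnitResult r in results)
                            writer.WriteLine(r.ToCsvLine());
                    }
                }
            }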

          • Flying Fox
          • 11 years ago

          I would say we make it use the hi-res timers on Windows and the system tick on other platforms, just for accuracy’s sake. We have to remember that this tool should serve TR’s purposes first.
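
          For reference, a minimal sketch of the trade-off being discussed, assuming Environment.TickCount as the system tick and System.Diagnostics.Stopwatch as the high-resolution timer (Stopwatch wraps the performance counter on Windows and falls back to a lower-resolution clock where that isn’t available):

              using System;
              using System.Diagnostics;
              using System.Threading;

              class TimerResolutionDemo
              {
                  static void Main()
                  {
                      int tickStart = Environment.TickCount;   // ~10-16 ms resolution
                      Stopwatch sw = Stopwatch.StartNew();     // high-resolution counter

                      Thread.Sleep(5);                         // stand-in for a short work unit

                      sw.Stop();
                      int tickElapsed = Environment.TickCount - tickStart;

                      // TickCount will often report 0 or ~16 ms here;
                      // Stopwatch reports fractional milliseconds.
                      Console.WriteLine("TickCount: {0} ms", tickElapsed);
                      Console.WriteLine("Stopwatch: {0:F3} ms (high-res = {1})",
                          sw.Elapsed.TotalMilliseconds, Stopwatch.IsHighResolution);
                  }
              }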

    • Usacomp2k3
    • 11 years ago

    Cool. Thanks for sharing.

    • Flying Fox
    • 11 years ago

    If you don’t put a licence on it immediately, people may start redistributing it anyway. I would suggest, for now, offering it on a per-request “private” basis.

    As for which licence to choose, the GPL looks like a good fit, the main reason being that XML Benchmark is already GPL’ed, not because the GPL is popular. If you want to allow commercial reuse and redistribution without requiring users to contribute back, then you can pick a more “liberal” Apache, BSD, or MIT licence.

    A more complicated approach that I have seen is to LGPL the library/core code and GPL the GUI, so people can write their own GUIs/skins while those who are really interested in the core code still contribute back. For such a small program, though, I don’t think that’s necessary.

    BTW, what sort of granularity issues are you looking at?

    PS. At least fix the title caption of your main window before you release the source! 😛

      • eitje
      • 11 years ago

      Agreed. If you sourced your work from a GPL base, then it makes sense to continue distributing it under the GPL.
