Its oldest benchmark still in use, and probably its best known, is the SPEC CPU2006, which, as its name implies, gauges CPUs and was published in 2006. ("Retired" versions of SPEC go back to 1992.)
The SPEC CPU2006 is actually a suite of applications that test integer and floating point performance in terms of both speed (the completion of single tasks) and throughput (how quickly multiple tasks running at once are finished, also called "rate" by the benchmark). The resulting scores are ratios: the reference machine's time-to-completion divided by the tested machine's, so a higher score means a faster machine. In this case the reference was a 1997 Sun Ultra Enterprise 2 with a 296 MHz UltraSPARC II processor. It originally took the reference machine 12 days to complete the entire benchmark, according to SPEC literature.
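The scoring arithmetic can be sketched in a few lines of Python. This is an illustration, not SPEC's tooling: the function names and the timings are hypothetical, and combining the per-benchmark ratios with a geometric mean is an assumption drawn from SPEC's published run rules rather than from anything stated in this article.

```python
from math import prod


def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Reference time divided by measured time; a higher ratio means a faster machine."""
    if reference_seconds <= 0 or measured_seconds <= 0:
        raise ValueError("times must be positive")
    return reference_seconds / measured_seconds


def overall_score(ratios) -> float:
    """Geometric mean of the per-benchmark ratios (assumed aggregation method)."""
    ratios = list(ratios)
    if not ratios:
        raise ValueError("at least one ratio is required")
    return prod(ratios) ** (1.0 / len(ratios))


# Hypothetical timings in seconds: (reference machine, tested machine).
# These are made-up numbers, not real SPEC results.
timings = {
    "bench_a": (9000.0, 450.0),
    "bench_b": (8000.0, 250.0),
}

ratios = {name: spec_ratio(ref, run) for name, (ref, run) in timings.items()}
score = overall_score(ratios.values())

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the reference machine")
print(f"overall: {score:.1f}")
```

A geometric mean keeps one unusually fast component from dominating the overall figure the way it would in a simple average.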
At this writing the highest CPU2006 score (among more than 5,000 posted) was 31,400, for integer throughput on a 1,024-core Fujitsu SPARC M10-4S machine, tested in March 2014. In other words, it was 31,400 times faster than the reference machine. At the other extreme, a single-core Lenovo Thinkpad T43, tested in December 2007, scored 11.4.
Results are submitted to SPEC and reviewed by the organization before posting, explains Bob Cramblitt, SPEC communications director. "The results are very detailed so we can see if there are any anomalies. Occasionally results are rejected, mostly for failure to fill out the forms properly," he notes.
"Anyone can come up with a benchmark," says Steve Realmuto, SPEC's director. "Ours have credibility, as they were produced by a consortium of competing vendors, and all interests have been represented. There's full disclosure, the results must be submitted in enough detail to be reproducible and before being published they must be reviewed by us."
The major trend is toward more diversity in what is being measured, he notes. SPEC has been measuring power consumption against performance since 2008, has more recently produced a server efficiency-rating tool, and is now working on benchmarks for cloud services, he adds.
"We don't see a lot of benchmarks for the desktop," Realmuto adds. "Traditional desktop workloads are single-threaded, while we focus on the server space. The challenge is creating benchmarks that take advantage of multiple cores, and we have succeeded."
FLOPS remains the main thing measured by the Linpack benchmark, which is the basis for the Top500 listing posted every six months since 1993. The list is managed by a trio of computer scientists: Jack Dongarra, director of the Innovative Computing Laboratory at the University of Tennessee; Erich Strohmaier, head of the Future Technologies Group at the Lawrence Berkeley National Laboratory; and Horst Simon, deputy director of Lawrence Berkeley National Laboratory.