Running SPEC Benchmarks on Linux
Chuck Lever, Netscape Communications Corp.
[email protected]
$Id: spec.html,v 1.7 1999/11/12 20:12:54 cel Exp $
Abstract
We describe the modifications required to run SPEC's SDET, KENBUS, and
SPECweb96 benchmarks on Linux.
This document is Copyright © 1999
Netscape Communications Corp.,
all rights reserved.
Trademarked material referenced in this document is copyright
by its respective owner.
Introduction
Linux is an open-source POSIX-compliant operating system that runs
on commodity Intel PC hardware.
The Linux kernel is developed using an innovative distributed process
that has resulted in a high-performance ultra-stable OS platform that
is often preferred over commercial OS distributions.
However, this process isn't perfect. Sometimes old problems are
re-introduced to the kernel, and performance can vary from release
to release. Very often, performance changes are described in informal
terms, and no scientific analysis is provided.
No history of performance improvements is maintained.
Clearly, a database of regression test results can help keep
performance and API compliance from slipping as development progresses.
In order to regression-test Linux scientifically, benchmark standards
must be agreed upon.
In this paper, we describe the modifications required to run SPEC's SDET,
KENBUS, and SPECweb96 benchmarks on Linux.
We show how to run the standard SPEC benchmarks on Linux,
and make a case for using these as part of a regression suite.
SPEC SDM 1.1
The latest release of SPEC's SDM benchmark suite is 1.1, which dates
from the early 1990s. Very little work has been done on this suite since then.
However, it is still a useful benchmark because of how thoroughly it
checks its output.
SPEC SDM is distributed by the
SPEC Organization for a fee.
In the modification instructions below, we assume that you already have some
familiarity with the benchmarks.
Common changes
This section describes changes you'll need to make to Linux to
get both S-DET and KENBUS working correctly.
-
In order to run tests with a large number of processes, you will
need to re-compile the Linux kernel with NR_TASKS set to its
maximum value. The NR_TASKS constant is contained in
/usr/src/linux/include/linux/tasks.h.
Its maximum value is 4090 on Intel processors with the kernel
APM extensions enabled.
It is also recommended that MAX_TASKS_PER_USER be increased
to a number almost as large, say, 3500.
-
Be sure that any programs that can alter the system clock, such as
xntpd, have been disabled.
The system clock is used to measure the elapsed time for a benchmark
run. If the clock changes during a run, the results will be wrong:
sometimes they still look reasonable, and sometimes they are plainly
outlandish (negative throughput, for example).
-
Disable any regularly scheduled jobs, such as sendmail or
a cron job that might start a news process.
These jobs will compete with the benchmark for I/O bandwidth and
physical memory, and cause unpredictable variation in the results.
-
Create an executable /bin/time script with the contents:
#!/bin/sh
exec /usr/bin/time --portability "$@"
and be sure that /bin comes before /usr/bin in the default
PATH used during the benchmarks.
-
The Linux "top" command may need to be rebuilt if you have
modified the NR_TASKS macro as described above, and would
like to use "top" to watch the system as you run the benchmark.
Otherwise, "top" will run out of process table space, and will
stop prematurely during benchmark runs with a large number of
scripts.
-
If you have installed an alternate C compiler (that is, a C compiler
that exists somewhere other than /usr/bin/gcc) you will need
to change the /usr/bin/cc link to point to it if you want to use
the alternate compiler during the benchmark.
-
If you want to run with many scripts, you will need to increase
the system-wide file descriptor maximum. You can do that by
echoing the new maximum into /proc/sys/fs/file-max:
su
echo 32768 >/proc/sys/fs/file-max
-
Be careful about how your benchmark file system is mounted.
If it is mounted with the "sync" option, the benchmark will
run much more slowly, and will be very disk-intensive.
You may also choose to set the "noatime" mount option to
lessen disk activity even further.
In particular, both benchmark suites use /usr/tmp, which is
linked to /var/tmp.
If /var is mounted with the "sync" option, this will cause
significant slow-downs.
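Several of the runtime settings in the list above can be checked before starting a run. The following is only a minimal sketch and is not part of the SPEC distribution: the function names first_in_path, file_max_at_least, and mount_has_option are hypothetical, and the mount check assumes the whitespace-separated layout used by /proc/mounts (device, mount point, type, options).

```shell
# Print the first executable named $1 found in the colon-separated
# directory list $2 (defaults to the current PATH).
first_in_path() {
    name=$1
    dirs=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $dirs; do
        [ -n "$dir" ] || dir=.          # empty entry means current directory
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            IFS=$old_ifs
            echo "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Succeed if the limit stored in file $2 is at least $1.
# $2 defaults to the /proc entry named in the text.
file_max_at_least() {
    wanted=$1
    limit_file=${2:-/proc/sys/fs/file-max}
    current=$(cat "$limit_file") || return 2
    echo "file-max is $current (want at least $wanted)"
    [ "$current" -ge "$wanted" ]
}

# Succeed if mount point $1 lists option $2 in the mount table $3
# (defaults to /proc/mounts). Options are compared as whole words,
# so "async" does not match "sync".
mount_has_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "${3:-/proc/mounts}"
}
```

With the wrapper script in place, "first_in_path time" should print /bin/time. "file_max_at_least 32768" reports the current descriptor limit and fails if it is too low, and "mount_has_option /var sync" should fail on a system mounted as recommended.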
Description of S-DET
The S-DET portion of the benchmark is based on a script of typical
programs run by an imaginary software developer. The script contains
commands such as nroff, cc, and spell.
The script, and therefore the system load offered by the script, remains
the same over all invocations of the benchmark.
Offered load is varied by concurrently invoking several copies of this
script.
A throughput result is obtained by dividing the number of running scripts by
the elapsed time required for their completion.
This benchmark exercises multiprocessing, filesystem, and virtual memory
facilities. Even on modern hardware, this benchmark is able to create
significant loads. Because the output of every script is checked against
a standard output log, misbehavior caused by system overload can be
spotted by the benchmark automatically.
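As a worked example of the throughput calculation described above, the sketch below divides a script count by an elapsed time. The function name throughput is hypothetical, and scaling the result to scripts per hour is an assumption made for readability; the benchmark's own reporting tools remain the authority on units.

```shell
# Print throughput in scripts per hour for $1 concurrent scripts that
# finished in $2 elapsed seconds. The per-hour scaling is an assumption.
throughput() {
    awk -v n="$1" -v secs="$2" 'BEGIN {
        if (secs == 0) { print "elapsed time is zero" > "/dev/stderr"; exit 1 }
        printf "%.1f\n", n * 3600 / secs
    }'
}

throughput 20 300     # 20 scripts in 5 minutes
throughput 20 -300    # clock stepped backwards during the run
```

The first call prints 240.0. The second prints -240.0, which is the kind of outlandish result produced when the system clock is changed during a run.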
S-DET modifications
Linux distributions don't have standard "time" or "spell" programs, so
some minor adjustments must be made to compensate.
Step-by-step:
-
Create sdm1.1/benchspec/057.sdet/M.linux.22
- copy from M.sun
- delete extra LD flags
- change compiler optimization, MACHID, and LABEL
- salt to taste
-
Edit sdm1.1/benchspec/057.sdet/tools/excommon.h
- line 30 should read "} dummy;"
-
Edit /usr/bin/spell
- change the ispell invocation to "cat $* | ispell -l"
-
Edit sdm1.1/benchspec/057.sdet/output/generic
- go to the "starting text" section, replace it with:
*** starting text
real
user
sys
pre
Pre
SPECmark
SPECthruput
spiff
pre
pre
pre
POSIX
*** starting bprogs
At this time, there is a bug/feature in GNU "make" that prevents the
"wrapper" feature of the runsdm script from working.
We are still investigating this problem.
Description of KENBUS
The KENBUS portion of the SDM benchmark suite is
similar to S-DET in that it is based on a fixed script of programs.
However, the KENBUS script is more typical of a time-shared
word processing environment.
The script driver simulates keystrokes at a rate controlled by
the benchmark user.
The KENBUS script is meant to be tailored by the benchmark user,
so the offered load can vary significantly.
Overall offered load is varied by concurrently invoking several copies
of the KENBUS script.
As with S-DET, a throughput result is obtained by dividing the number
of scripts by the elapsed time required for their completion.
This benchmark, like its cousin S-DET, can also tax a modern system
significantly, revealing operating system problems that don't arise under
everyday loads.
Because the output of every script is checked against
a standard output log, misbehavior caused by system overload can be
spotted by the benchmark automatically.
KENBUS modifications
The GNU versions of "make" and "time" need to have special options in order
to operate in a way the KENBUS scripts expect.
Step-by-step:
-
Create sdm1.1/benchspec/061.kenbus1/M.linux.22
- copy from M.sun
- delete extra LD flags
- change compiler optimization, MACHID, and LABEL
- salt to taste
-
Before invoking runsdm, set and export the following environment variable:
- MAKEFLAGS=--no-print-directory
-
Edit the master script in
sdm1.1/benchspec/061.kenbus1/Workload/script.master
- change /bin/sh to /bin/ash
- add "-s" to the piped invocations of "ed"
-
Edit sdm1.1/benchspec/061.kenbus1/check.sed
- replace PS1= assignment with static 'PS1="#"'
-
Edit benchspec/061.kenbus1/time.awk
- on line 64, change 'print' to 'print " "'
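The environment setup from the steps above can be sketched as follows. The --no-print-directory option stops GNU make from printing its "Entering directory" and "Leaving directory" messages; presumably those extra lines would otherwise appear in output that KENBUS compares against its standard log. The two-statement form is used so that the snippet also works in older Bourne-style shells.

```shell
# Suppress GNU make's "Entering directory" / "Leaving directory"
# messages in every make started from this shell.
MAKEFLAGS=--no-print-directory
export MAKEFLAGS
```

Run these lines in the same shell session that will invoke runsdm, so that the benchmark's child processes inherit the variable.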
Due to the small system-wide maximum number of processes on Intel
Linux (4090), the KENBUS benchmark can't drive modern hardware
very hard.
Please stay tuned to this space for progress.
This document was written as part of the Linux Scalability Project.
For more information, see
our home page.
If you have comments or suggestions, email
[email protected]