Pound Mailing List / Archive / 2004-07

Pound performance under RH8, RH9, Debian Sid
Joel Dice <dicej(at)mailsnare.net>
2004-07-09 20:14:07
Hi all.

We've recently been doing some informal load testing with Pound (-current)
running on various machines, including Solaris 8, RedHat 8.0, RedHat
9.0, and Debian Sid boxes.  The tests basically involved simulating 40
concurrent users making approx. 250-byte POSTs at rates ranging from one
every 10 seconds to about 10 per second, with responses ranging from 100
bytes to about 100 KB.  We ran these tests both with and without SSL.
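A minimal sketch of a load generator along these lines, in Python. The message doesn't say which tool was actually used, so `simulate_users`, its parameters, and any target URL are hypothetical. It also doesn't say whether the quoted rates were per user or aggregate; here `interval` is treated as a per-user pause between POSTs.

```python
import threading
import time
import urllib.request


def simulate_users(url, users=40, interval=1.0, payload_size=250, duration=5.0):
    """Run `users` threads, each POSTing `payload_size` bytes to `url` and then
    sleeping `interval` seconds, until `duration` seconds have elapsed.
    Returns (requests_sent, response_bytes_received)."""
    payload = b"x" * payload_size
    lock = threading.Lock()
    totals = {"requests": 0, "bytes": 0}
    deadline = time.monotonic() + duration

    def user():
        while time.monotonic() < deadline:
            req = urllib.request.Request(url, data=payload, method="POST")
            with urllib.request.urlopen(req, timeout=10) as resp:
                body = resp.read()
            # Counters are shared by all simulated users, so guard them.
            with lock:
                totals["requests"] += 1
                totals["bytes"] += len(body)
            time.sleep(interval)

    threads = [threading.Thread(target=user) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals["requests"], totals["bytes"]
```

If the rates were per user, `interval=10.0` and `interval=0.1` would bracket the range described above; the pause is a fixed sleep after each response, so the real request rate is somewhat lower than `1/interval` when responses are slow.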

We started by testing the Solaris and RedHat boxes (dual 450 MHz
UltraSPARC and dual 2.4 GHz Xeons, respectively) and were surprised to see
the Pound CPU usage run into the double digits, regularly spiking into the
90s when requests came in at high density.  This was the case even when
not using SSL.  In addition, interactive performance for real
(non-simulated) users was noticeably slower than when using an
Apache/mod_jk load balancer under the same conditions.

Then I tried running the same tests using my Debian Sid (kernel 2.6.5,
800 MHz Pentium III) development system, and the performance was
dramatically improved.  CPU usage never broke 1% as far as I could see
(and rarely displayed higher than 0.0%), and interactive performance was
seamless - at least as good as the Apache/mod_jk solution.

Except in the case of the Solaris box (which is off-site), the backend
servers are the same in all cases, and the configurations are likewise
identical.  I used the same version of Pound in each case and did not
modify the compilation flags.

Since the performance problems on the non-Debian machines seem to be
SSL-independent, I'm led to believe that the libcs on those machines
may be the culprits - particularly their regular expression support.
Still, it's odd that three different OSes would have this behavior.  Any
ideas?
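One way to test the regular-expression hypothesis directly is to time the C library's own regexec() on each box, outside Pound. This sketch assumes glibc's `<regex.h>` flag values and assumes Pound's patterns go through regcomp()/regexec() as POSIX extended regexes; `time_libc_regex` and the example pattern and URL are hypothetical.

```python
import ctypes
import ctypes.util
import time

# Flag values from glibc's <regex.h> (assumption: other C libraries may differ).
REG_EXTENDED = 1
REG_NOSUB = 8

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.regcomp.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_int]
libc.regcomp.restype = ctypes.c_int
libc.regexec.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_size_t,
                         ctypes.c_void_p, ctypes.c_int]
libc.regexec.restype = ctypes.c_int
libc.regfree.argtypes = [ctypes.c_void_p]
libc.regfree.restype = None


def time_libc_regex(pattern, subject, iterations=100_000):
    """Compile `pattern` with the C library's regcomp() and time `iterations`
    regexec() calls against `subject`. Returns (matched, seconds)."""
    # Opaque storage for regex_t; 256 bytes is assumed to exceed sizeof(regex_t).
    preg = ctypes.create_string_buffer(256)
    if libc.regcomp(preg, pattern.encode(), REG_EXTENDED | REG_NOSUB) != 0:
        raise ValueError(f"regcomp failed for {pattern!r}")
    try:
        data = subject.encode()
        # regexec() returns 0 on a match and REG_NOMATCH otherwise.
        matched = libc.regexec(preg, data, 0, None, 0) == 0
        start = time.perf_counter()
        for _ in range(iterations):
            libc.regexec(preg, data, 0, None, 0)
        elapsed = time.perf_counter() - start
    finally:
        libc.regfree(preg)
    return matched, elapsed
```

For example, `time_libc_regex(r"^/app/.*\.jsp", "/app/orders/list.jsp?id=42")` on each machine. The Python call overhead is included in the timing, so the numbers are only useful for comparing machines against each other, not as absolute regexec() costs.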

 - Joel

Re: Pound performance under RH8, RH9, Debian Sid
Robert Segall <roseg(at)apsis.ch>
2004-07-12 14:10:49
On Friday 09 July 2004 20.14, Joel Dice wrote:[...]

I rather doubt it is an issue with pattern matching, especially as the code
is identical for RH and Debian (it's the same GNU C library). It's much more
likely that this has to do with the threading model: on Debian we have the
"old" Linux model, which uses clone() and thus runs a process per thread. On
Solaris it uses LWPs, so all threads are accounted as a single process (and
RH uses NPTL).

The result is that on Solaris (like on *BSD, though for different reasons) all
the threads are added together for accounting purposes: if you have 45
threads, each using 1% CPU, you'll see Pound using almost half the CPU.
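The arithmetic above (45 threads at 1% each add up to 45%) is just a sum over per-thread counters. On a Linux 2.6 kernel those counters can be read directly from `/proc/<pid>/task/<tid>/stat`, which avoids depending on how any particular tool groups threads. A sketch, assuming the procfs field layout documented in proc(5); the function names are hypothetical.

```python
import os


def thread_cpu_seconds(pid):
    """Return {tid: cpu_seconds} (user + system) for every thread of `pid`,
    read from /proc/<pid>/task/<tid>/stat."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    task_dir = f"/proc/{pid}/task"
    result = {}
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the thread exited between listdir() and open()
        # Field 2 (the command name) is parenthesized and may contain spaces,
        # so split after the last ')'. Field 3 (state) is then index 0,
        # utime (field 14) is index 11 and stime (field 15) is index 12.
        fields = stat.rsplit(")", 1)[1].split()
        utime, stime = int(fields[11]), int(fields[12])
        result[int(tid)] = (utime + stime) / ticks_per_sec
    return result


def process_cpu_seconds(pid):
    """Whole-process CPU time: the per-thread figures summed, which is the
    kind of per-process total described above for Solaris."""
    return sum(thread_cpu_seconds(pid).values())
```

Sampling either function twice and dividing the difference by the elapsed wall-clock time gives a utilization figure; the per-process version can exceed 100% on a multi-CPU box.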

As a simple test, may I suggest you run your tests again on RH with the
"old" threading library (by setting the environment variable
LD_ASSUME_KERNEL=2.4.19) and compare the results.[...]
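To confirm which threading library a process actually gets, glibc reports it through confstr(_CS_GNU_LIBPTHREAD_VERSION), which is also what `getconf GNU_LIBPTHREAD_VERSION` prints. A sketch with hypothetical helper names: on RH9, the second helper called with `{"LD_ASSUME_KERNEL": "2.4.19"}` should report LinuxThreads rather than NPTL. On current systems such an old value may simply make the dynamic loader fail, in which case the helper returns None.

```python
import os
import subprocess


def current_pthread_impl():
    """Threading implementation glibc reports for this process,
    e.g. 'NPTL 2.3.2' or 'linuxthreads-0.10'; None if unavailable."""
    try:
        return os.confstr("CS_GNU_LIBPTHREAD_VERSION")
    except (ValueError, OSError):
        return None  # name unknown on this platform, or not supported


def pthread_impl_with_env(extra_env):
    """Ask getconf(1), run in a child process with `extra_env` added to the
    environment, which threading library it ends up with."""
    env = dict(os.environ, **extra_env)
    try:
        out = subprocess.run(["getconf", "GNU_LIBPTHREAD_VERSION"],
                             env=env, capture_output=True, text=True)
    except FileNotFoundError:
        return None  # no getconf binary on this system
    return out.stdout.strip() if out.returncode == 0 else None
```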

Re: Pound performance under RH8, RH9, Debian Sid
Joel Dice <dicej(at)mailsnare.net>
2004-07-12 15:42:23
Thanks for your comments, Robert.

I agree that this probably has something to do with threading, though it
may not be as simple as NPTL vs. LinuxThreads vs. Solaris LWPs.  First,
please note that the Debian box is running Debian Sid (unstable) with a
recent libc and a 2.6 kernel, which means it *is* using NPTL.  Second,
while RH9 does include NPTL backported to the 2.4 kernel, RH8 does not -
and they both exhibited poor performance.

By the way, one thing I have learned from this is not to trust 'top' - it
doesn't seem to calculate or display the overall CPU usage of multithreaded
apps (java, pound, etc.) correctly.  From the procps FAQ:

	Why is %CPU underreported for multi-threaded (Java, etc.) apps?

	Currently, the kernel does not provide a reasonable way to get this
	information.

On the other hand:

	Why do ps and top show threads individually?

	The 2.4.xx kernel does not provide proper support for grouping
	threads by process. Hacks exist to group them anyway, but such
	hacks will falsely group similar tasks and will fail to group
	tasks due to race conditions. The hacks are also slow. As none of
	this is acceptable in a critical system tool, task grouping is not
	currently available for the 2.4.xx kernel. The 2.6.xx kernel
	allows for proper thread grouping and reporting.  To take
	advantage of this, your programs must use a threading library that
	features the CLONE_THREAD flag. The NPTL pthreads provided by
	recent glibc releases use CLONE_THREAD.

Since I'm using very recent versions of procps, glibc, and the 2.6 kernel,
it's not clear why this isn't working.  I guess it's time to post to the
procps mailing list.

I'll try to do some more testing soon (using vmstat to observe the true
loads), including your LD_ASSUME_KERNEL suggestion on RH9.  I'll post to
the list if I have any new insights.
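A sketch of the vmstat-style measurement mentioned above: system-wide CPU utilization computed from the aggregate `cpu` line of /proc/stat, sampled twice. Because it counts all CPU time on the box, it doesn't depend on how top groups threads. It assumes the Linux 2.6 /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are hypothetical, and the result is only roughly comparable to vmstat's us + sy columns.

```python
import time


def read_cpu_ticks():
    """Aggregate CPU tick counters from the first ('cpu') line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = f.readline().split()
    # fields[0] is 'cpu'. Keep at most the first 8 counters: newer kernels
    # append guest time, which is already included in 'user'.
    return [int(x) for x in fields[1:9]]


def cpu_busy_percent(interval=1.0):
    """Percentage of all CPU time that was neither idle nor iowait
    over `interval` seconds."""
    before = read_cpu_ticks()
    time.sleep(interval)
    after = read_cpu_ticks()
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)  # idle + iowait
    return 100.0 * (total - idle) / total
```

Running `cpu_busy_percent(1.0)` on the Pound box during a test run, and again while idle, gives a before/after comparison that is independent of per-process accounting.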

Thanks again.

 - Joel


On Mon, 12 Jul 2004, Robert Segall wrote:
[...]