bottleneck in pound
Albert <pound(at)alacra.com>
2008-09-09 17:14:50
We run our servers out of 2 different data centers.  Earlier in the 
week, we switched all of our traffic to one side, and found that all of 
our HTTP transactions took an incredibly long time to serve.  At peak, 
we receive about 200 concurrent requests across both locations, so on 
average each pound server handles about 100 connections.  However, when 
one side was hit with 200 concurrent requests, pound started to take a 
really long time to serve pages along with all of their additional 
content (images, css, javascript, etc.).  Going directly to the backend 
web servers returned normal results, but pound was bottlenecked.

At peak, we were seeing about 30-40 Mb/s on our firewall (a PIX 515E 
with a 100Mb card).  Pound v2.4.2 runs on RedHat ES 4 (2.6.9-67.ELsmp), 
and sits in front of 4 web servers on the back end, located on the local 
network.  I've put some tracing in the pound code, and found that at the 
peak, pound could take up to 3 seconds in the "for" loop between lines 
624-700, where it's looking at the headers it just received from the 
client.  A request for the homepage, which normally takes 0.5 seconds, 
was now taking 8-10 seconds.  I understand that our network might be 
saturated, but I don't understand why pound is being bottlenecked within 
code where no network operations are being performed.  This sounds like 
a thread-management problem in the kernel.  Has anybody heard of this 
type of problem?

Under normal conditions (with 100 concurrent requests), the same "for" 
loop takes under 0.100 milliseconds -- 30,000 times faster.  Is there 
anything I should check?

Below is the ldd output for pound:

libpcreposix.so.0 => /usr/lib/libpcreposix.so.0 (0x0039c000)
libssl.so.4 => /lib/libssl.so.4 (0x04e81000)
libcrypto.so.4 => /lib/libcrypto.so.4 (0x04d95000)
libresolv.so.2 => /lib/libresolv.so.2 (0x00580000)
libdl.so.2 => /lib/libdl.so.2 (0x00111000)
libm.so.6 => /lib/tls/libm.so.6 (0x003a2000)
libtcmalloc.so.0 => /usr/lib/libtcmalloc.so.0 (0x00921000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x004b8000)
libc.so.6 => /lib/tls/libc.so.6 (0x0026e000)
libpcre.so.0 => /lib/libpcre.so.0 (0x003c7000)
libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0x0023b000)
libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x00664000)
libcom_err.so.2 => /lib/libcom_err.so.2 (0x00595000)
libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0x0059a000)
libz.so.1 => /usr/lib/libz.so.1 (0x004cc000)
/lib/ld-linux.so.2 (0x00254000)
libstacktrace.so.0 => /usr/local/lib/libstacktrace.so.0 (0x00eae000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00a58000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00115000)

Re: [Pound Mailing List] bottleneck in pound
Dave Steinberg <dave(at)redterror.net>
2008-09-09 18:10:07
I'd tend to agree with your gut feel.  Something outside of pound is 
changing the time it takes for pound to do the same job!  What does 
'top' report when you're at peak usage?  I'm mostly curious if your CPU 
is spending all its time processing interrupts instead of user code.

Any idea how many packets/second you're moving?  Also - what are the CPU 
/ NIC / bus specs on this machine?
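
The interrupt question above can also be answered without 'top' by sampling the aggregate "cpu" line of /proc/stat twice and comparing the counters.  What follows is only a rough sketch in Python, not something from pound itself; the function names are made up for illustration, and it assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal), with fields missing on older kernels treated as zero.

```python
import time

# Assumed field order of the aggregate "cpu" line in /proc/stat.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def read_cpu_ticks(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat text into a dict of ticks."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            # Ignore any trailing guest counters; pad fields an old kernel lacks.
            values = [int(v) for v in parts[1:len(FIELDS) + 1]]
            values += [0] * (len(FIELDS) - len(values))
            ticks = dict(zip(FIELDS, values))
            ticks["total"] = sum(values)
            return ticks
    raise ValueError("no aggregate 'cpu' line found")


def interrupt_share(before, after):
    """Fraction of elapsed CPU ticks spent in hard plus soft interrupts."""
    total = after["total"] - before["total"]
    if total <= 0:
        return 0.0
    irq = (after["irq"] - before["irq"]) + (after["softirq"] - before["softirq"])
    return irq / total


def measure_interrupt_share(interval=5.0, path="/proc/stat"):
    """Take two samples 'interval' seconds apart and return the share."""
    with open(path) as f:
        first = read_cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_cpu_ticks(f.read())
    return interrupt_share(first, second)
```

Calling measure_interrupt_share(5.0) at peak and again under normal load would show whether interrupt handling grows out of proportion; a high share alongside low user time would point away from pound's own code.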

Re: [Pound Mailing List] bottleneck in pound
Albert <pound(at)alacra.com>
2008-09-09 19:07:16
The box has 4GB of memory.  Pound is using about 100MB.  It has 2 Xeon 
3.GHz processors.  CPU utilization for pound is between 3 and 10 
percent, the same as under normal conditions, so the CPU is not heavily 
loaded.  The machine doesn't do anything else, so the rest of the CPU is 
idle.

The machine has a 1 Gbit card, but it is set to 100 Mbit full duplex.  
I'm not sure how to check the packets/sec.
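
One way to get packets/sec on Linux is to read the per-interface counters in /proc/net/dev twice and divide the difference by the elapsed time.  The Python below is a minimal sketch under that assumption; the function names are hypothetical, and it relies on the standard layout where the second receive column and the second transmit column are packet counts.

```python
import time


def parse_net_dev(text):
    """Map interface name -> (rx_packets, tx_packets) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        # Older kernels may print "eth0:1234" with no space after the colon.
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 10:
            continue
        # fields[1] is receive packets, fields[9] is transmit packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packets_per_second(before, after, seconds):
    """Per-interface (rx, tx) packet rates between two parsed samples."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name in before:
            rx0, tx0 = before[name]
            rates[name] = ((rx1 - rx0) / seconds, (tx1 - tx0) / seconds)
    return rates


def sample_packet_rates(interval=5.0, path="/proc/net/dev"):
    """Read the counters twice, 'interval' seconds apart, and return rates."""
    with open(path) as f:
        first = parse_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_net_dev(f.read())
    return packets_per_second(first, second, interval)
```

If the sysstat package is installed, "sar -n DEV 1" reports the same kind of per-interface packet rates without any scripting.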

Anyway, I just looked at the archive and found an email thread 
describing exactly this problem, "blocked requests with increased 
concurrency", from Khaled Hassounah on 4/29/07.  I don't think I have 
any other choice but to upgrade the Linux version.  We'll try RedHat 
Linux ES 5, which runs kernel 2.6.18, and hope it fixes the thread 
management problem.
