Explaining pound SSL performance
Sven Ulland <sveniu(at)opera.com>
2006-11-10 17:20:26
I'm setting up a pound machine (IBM blade, 4GB RAM, dual 3.2GHz Intel
Xeon) that will handle incoming SSL connections from clients, and send
them to four back-end servers using plain HTTP. Pound version 2.1.6 is
used, on Linux 2.6.13 SMP.

When doing performance testing (using three other blades as clients,
running simultaneous tests with httperf), I get a maximum performance
of around 400 requests/sec over SSL. However, if I go above 400 req/sec,
I start getting error messages in my /var/log/daemon.log:

[.. nothing in the log up until this point ..]
Nov 10 14:55:45 m11 pound: error copy server cont: Connection reset by peer
Nov 10 14:55:52 m11 last message repeated 8 times
Nov 10 14:55:52 m11 pound: error copy server cont: Broken pipe
Nov 10 14:55:53 m11 pound: error copy server cont: Connection reset by peer
Nov 10 14:55:53 m11 pound: error copy server cont: Connection reset by peer
Nov 10 14:55:53 m11 pound: backend 10.0.0.11:80 connect: Connection timed out
[.. lots more of exactly the same messages ..]
Nov 10 14:55:58 m11 pound: error copy server cont: Connection reset by peer
Nov 10 14:56:02 m11 last message repeated 19 times
[.. Note the following errors, about too many open files ..]
Nov 10 14:56:02 m11 pound: HTTP accept: Too many open files
Nov 10 14:56:02 m11 pound: HTTP accept: Too many open files
Nov 10 14:56:02 m11 pound: backend 10.0.0.12:80 create: Too many open files
Nov 10 14:56:02 m11 pound: HTTP accept: Too many open files
Nov 10 14:56:02 m11 pound: error read from 10.0.0.21: Connection reset by peer
Nov 10 14:56:02 m11 pound: error read from 10.0.0.22: Connection reset by peer
Nov 10 14:56:02 m11 pound: error copy server cont: Connection reset by peer
Nov 10 14:56:02 m11 last message repeated 6 times
Nov 10 14:56:02 m11 pound: error copy server cont: Broken pipe
Nov 10 14:56:02 m11 pound: error copy server cont: Connection reset by peer
[.. lots more of exactly the same messages ..]


The string "Too many open files" does not come from pound itself, nor
from the Linux kernel. It comes from glibc 2, where it is mapped to the
EMFILE error code. This error is generated when a process's number
of open files reaches the limit described here:
http://www.gnu.org/software/libc/manual/html_node/Error-Codes.html#index-EMFILE-85
The limit can be increased on some systems (including Linux), for
example by putting 'ulimit -n xxxx' in the pound startup script. I
use 'ulimit -n 65535'.
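
For reference, the limit that 'ulimit -n' controls can also be inspected
and changed from inside a process. The following is only a minimal
Python sketch of that idea, not something taken from pound; the function
names are made up for illustration, and an unprivileged process cannot
raise the soft limit past the hard limit.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) per-process open-file limits, like 'ulimit -n'."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Set the soft limit to target, capped at the hard limit; return the new value."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    print("before:", nofile_limits())
    print("soft limit now:", raise_soft_limit(65535))
```

Whether 65535 is actually granted depends on the hard limit the startup
script inherits, which is why the sketch returns the value it really set.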

Now, my theory about this problem is that when the client request rate
increases beyond a quite specific congestion point for the server, the
incoming connections come in faster than they can be served. This leads
to thrashing: The number of concurrent connections increases, as does
the number of open file descriptors (each socket requires one file
descriptor) and RAM usage shoots through the roof:

Output from top; note the memory usage (about 1.3GB virtual, 500MB resident):
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10033 root      21   0 1390m 479m 1544 R 99.9 14.0  11:53.53 pound

Because the resource use keeps increasing, it *will* reach a limit
sooner or later. This indicates that the server just isn't powerful
enough to handle the load. Possible solutions include load balancing
between several pound proxies, adding better hardware, etc.
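
The theory above can be put into rough numbers. This is a
back-of-the-envelope sketch only: it assumes a constant offered rate,
a constant service rate, no client timeouts, and two file descriptors
per proxied connection (one client socket, one back-end socket). The
450 req/sec figure is hypothetical, not a measurement.

```python
import math


def backlog_after(seconds, arrival_rate, service_rate):
    """Connections still open after `seconds` when arrivals outpace service."""
    return max(0.0, (arrival_rate - service_rate) * seconds)


def seconds_until_fd_limit(arrival_rate, service_rate, fd_limit, fds_per_conn=2):
    """Time until the backlog alone uses up `fd_limit` descriptors."""
    surplus = arrival_rate - service_rate
    if surplus <= 0:
        return math.inf  # the server keeps up, so the backlog never grows
    return fd_limit / (fds_per_conn * surplus)


if __name__ == "__main__":
    # Hypothetical: 450 req/sec offered against roughly 400 req/sec served.
    print(backlog_after(10, 450, 400))
    print(seconds_until_fd_limit(450, 400, 65535))
```

Under these assumptions a surplus of 50 req/sec would use up a 65535
descriptor limit in about 655 seconds, so raising the limit only
postpones the failure; it does not remove the bottleneck.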

Would you agree with this conclusion? And do any of you have benchmarks
done with SSL? I'd be interested in hearing about your experiences.

Relevant threads from the mailing lists:
http://www.apsis.ch/pound/pound_list/archive/2006/2006-10/1161093315000
http://www.apsis.ch/pound/pound_list/archive/2006/2006-09/1158256873000
http://www.apsis.ch/pound/pound_list/archive/2006/2006-09/1157118439000



regards,
sven

Re: [Pound Mailing List] Explaining pound SSL performance
Robert Segall <roseg(at)apsis.ch>
2006-11-11 11:47:11
On Fri, 2006-11-10 at 17:20 +0100, Sven Ulland wrote:
> Would you agree with this conclusion? And do any of you have benchmarks
> done with SSL? I'd be interested in hearing about your experiences.

To some extent this is correct, but not quite:

1. The error message comes from Pound and not from glibc as you seem to
believe.

2. You may be seeing two separate issues here: back-end saturation
("connection timed out") and not enough file descriptors ("Too many open
files"). Make sure your back-ends can support the load, otherwise you
are measuring back-end rather than Pound performance.

3. The memory usage is due to the number of simultaneous threads - one
per connection. This may also be connected to slow back-end responses.
In your case however I would be more worried about the CPU being at
99.9%.

4. You should set httperf to wait longer for responses. In your case it
gives up too early ("connection reset by peer") and issues new requests
before Pound has finished answering previous ones.

Some people have reported over 2000 SSL reqs/second on multi-processor
machines with hardware acceleration (search for OpenSSL engine). Do you
really need that sort of performance? If yes then you can probably
afford it.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-44-920 4904


Re: [Pound Mailing List] Explaining pound SSL performance
"David Rees" <drees76(at)gmail.com>
2006-11-11 18:45:53
On 11/11/06, Robert Segall <roseg(at)apsis.ch> wrote:
> 2. You may be seeing two separate issues here: back-end saturation
> ("connection timed out") and not enough file descriptors ("Too many open
> files"). Make sure your back-ends can support the load, otherwise you
> are measuring back-end rather than Pound performance.

Doesn't the fact that pound is at 100% CPU indicate that the
bottleneck in this case is in pound, not the backends?

> 3. The memory usage is due to the number of simultaneous threads - one
> per connection. This may also be connected to slow back-end responses.
> In your case however I would be more worried about the CPU being at
> 99.9%.

Perhaps pound needs a way to restrict the total number of connections,
so that it fails more gracefully under overload: clients would wait in
the listen queue, or have their connections rejected once the listen
queue fills up (like the ListenBacklog and MaxClients Apache/httpd
configuration directives).
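
To illustrate the idea (this is not how pound works, and the names
below are invented for the example), here is a minimal Python sketch of
a thread-per-connection server that takes a slot before calling
accept(). The max_clients argument plays the role of MaxClients and the
listen() backlog plays the role of ListenBacklog, both only loosely.

```python
import socket
import threading


def handle(conn, slots):
    """Answer one request, then give the connection slot back."""
    try:
        with conn:
            conn.recv(4096)
            conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
    finally:
        slots.release()


def make_listener(backlog):
    """Listen on an ephemeral localhost port with the given queue length."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", 0))
    sock.listen(backlog)
    return sock


def serve(listener, max_clients, stop):
    """Accept connections, never running more than max_clients handlers."""
    slots = threading.BoundedSemaphore(max_clients)
    listener.settimeout(0.1)
    while not stop.is_set():
        # Take a slot *before* accept(): while all slots are busy we do not
        # accept, so new clients wait in the kernel's listen queue instead
        # of each getting a thread and a file descriptor.
        if not slots.acquire(timeout=0.1):
            continue
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            slots.release()
            continue
        except OSError:
            slots.release()
            break
        threading.Thread(target=handle, args=(conn, slots), daemon=True).start()
```

What happens to clients once the backlog itself is full is decided by
the kernel, not by the server, so the sketch only bounds the number of
threads and descriptors the process uses.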

-Dave

Re: [Pound Mailing List] Explaining pound SSL performance
Sven Ulland <sveniu(at)opera.com>
2006-11-12 17:56:46
Robert Segall wrote:
> 
> To some extent this is correct, but not quite:
> 
> 1. The error message comes from Pound and not from glibc as you seem to
> believe.

The string "Too many open files" is not found in the pound sources,
nor in the kernel source. I found it in the glibc 2 library, so I'm
thinking that the error (EMFILE) originates in the kernel and is
translated into the message string by glibc, which passes it to the
call within pound. Pound then prints it as part of its own error
message when it fails to accept a connection or create a new socket
(file descriptor). Isn't that right?
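
The chain can be reproduced outside pound. The sketch below (Python,
with a made-up function name, and a deliberately tiny limit of 64 chosen
only for the demonstration) lowers the soft limit, opens sockets until
the kernel refuses, and returns the errno together with the text the C
library supplies for it. The original limit is restored afterwards.

```python
import errno
import os
import resource
import socket


def provoke_emfile(limit=64):
    """Exhaust a lowered descriptor limit; return (errno, message)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        lowered = limit
    else:
        lowered = min(limit, soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (lowered, hard))
    held = []
    try:
        while True:
            # Each socket costs one file descriptor.
            held.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError as exc:
        # The kernel reports only the number; the text comes from strerror().
        return exc.errno, exc.strerror
    finally:
        for sock in held:
            sock.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    code, message = provoke_emfile()
    print(errno.errorcode[code], "->", message)
    print(message == os.strerror(errno.EMFILE))
```

On a glibc-based system the message is typically "Too many open files";
the application only decides where and with which prefix to log it.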

> 2. You may be seeing two separate issues here: back-end saturation
> ("connection timed out") and not enough file descriptors ("Too many open
> files"). Make sure your back-ends can support the load, otherwise you
> are measuring back-end rather than Pound performance.

During separate testing of the back-ends, I'm seeing upwards of
8000 req/sec, so back-end saturation should not be the direct
problem here.

> 3. The memory usage is due to the number of simultaneous threads - one
> per connection. This may also be connected to slow back-end responses.
> In your case however I would be more worried about the CPU being at
> 99.9%.
> 
> 4. You should set httperf to wait longer for responses. In your case it
> gives up too early ("connection reset by peer") and issues new requests
> before Pound has finished answering previous ones.

I'll have a look at that. But I'm thinking that since requests
only take a few milliseconds to complete on an idle system, it
is an indicator of resource starvation when they take more than
a few tens (or hundreds) of ms to complete.

> Some people have reported over 2000 SSL reqs/second on multi-processor
> machines with hardware acceleration (search for OpenSSL engine). Do you
> really need that sort of performance? If yes then you can probably
> afford it.

That's very interesting. The expected usage will be around 300-400
req/sec on average, but you never know with estimates. I'll have a
look at the possibilities with dedicated hardware crypto devices.
Since CPU use goes through the roof (at least during testing), a HW
crypto device might be what's needed if the real user load should
start pushing the limits of the hardware.

Thanks,

sven
