/ Zope / Apsis / Pound Mailing List / Archive / 2004 / 2004-07 / high load question


high load question
Thierry Coopman <thierry(at)keytradebank.com>
2004-07-06 15:22:24
Hi,

I have set up Pound on 2 Gentoo servers with LVS as the load-balancing and
HA solution. This setup works great under low load conditions.

I have LVS balance HTTPS connections over 2 Gentoo machines running 
Pound, that in turn connect to 8 back end servers in HTTP.

I put it in production yesterday, only to reverse that today after
discovering that some people were unable to log into our site.

We have a login page that needs to send its parameters with POST. From
what I have seen, Pound forwards these requests but sets the
Content-Length header to 0, and the form data is effectively not sent to
the back end.

I have little or no control over the client (mostly Internet Explorer,
different versions, so it's not one specific IE). The data sent in the
form is trivial: a login and a password without special characters.

Now this behaviour is not there under low load conditions (and even on 
high load it's not always the case).

I was wondering if this could be a side effect of higher load. I had up 
to 60 requests per second (30 per machine), with some 150 (75) 
connections to the back end servers.

Did I bump into a resource limit? I have no specific error messages
apart from the usual broken pipes and connection resets caused by stupid
IE or proxies. Any hints on how I can raise the limits for Pound
specifically?

Forwarding the HTTPS traffic to Apache with mod_ssl works fine for all
customers.

Is there a way where I can debug this more, better, like dump all post 
data from the client somewhere or so?
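
One way to look for the symptom described above in captured traffic is
sketched below. It is only an illustration and not part of Pound: the
names is_empty_post and has_prefix_ci are made up for this example, and
the sketch assumes the raw request head (request line plus headers,
separated by CRLF) is already available as a C string, for instance from
a packet capture on the plain-HTTP side between the proxy and a back end.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive check that `s` starts with `prefix`. */
static int has_prefix_ci(const char *s, const char *prefix) {
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if `head` is a POST request whose
 * Content-Length is 0 or missing, 0 otherwise.
 * Caveat: a chunked POST without Content-Length is also reported as 1. */
int is_empty_post(const char *head) {
    assert(head != NULL);
    if (strncmp(head, "POST ", 5) != 0)
        return 0;                       /* only POST requests are of interest */
    const char *line = strstr(head, "\r\n");
    while (line != NULL) {
        line += 2;                      /* step over CRLF to the next line */
        if (*line == '\r' || *line == '\0')
            break;                      /* blank line: end of the headers */
        if (has_prefix_ci(line, "Content-Length:"))
            return strtol(line + 15, NULL, 10) == 0;
        line = strstr(line, "\r\n");
    }
    return 1;                           /* no Content-Length header at all */
}
```

Counting how many captured login requests this flags, at low and at high
load, would show whether the empty POSTs really correlate with load.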

Thanks for your input.

I'm running Pound 1.7.

pound .cfg file:

User            nobody
Group           nobody
#RootJail       /chroot/pound

ExtendedHTTP    0

WebDAV          0

LogLevel        4

Alive           10

HTTPSHeaders 0 "HTTPS: on"

ListenHTTPS *,443 /etc/secure.keytradebank.com.pem 
ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL

UrlGroup ".*"
BackEnd 10.x.x.x,80,1
BackEnd 10.x.x.x,80,1
BackEnd 10.x.x.x,80,1
BackEnd 10.x.x.x,80,1

EndGroup

-- 
Keytrade Bank accepts no liability for the content of this email. For 
more info please visit http://www.keytradebank.com/maildisclaimer.html

Re: high load question
Robert Segall <roseg(at)apsis.ch>
2004-07-06 17:35:04
I strongly suggest you move to -current, as 1.7 has some known issues and 
limitations.

I have never yet heard of problems with POST requests - I would be very 
interested in more details.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904

Re: high load question
Thierry Coopman <thierry(at)keytradebank.com>
2004-07-06 17:49:00
It all comes down to that one bug in IE.

Keepalive is set to 60 seconds; on the servers it's usually less, so IE
has to reconnect if the connection was dropped on the server and, when
it reconnects, it kind of 'forgets' the POST data.

Very stupid. I have all of my customers running HTTP/1.0 because of
that, and I would prefer HTTP/1.1.

It would be great to have a mechanism similar to Apache's for this sort
of request, where you can specify a downgrade to 1.0 if the agent is
MSIE ...

Either way, it's 'resolved' in the sense that it works now and we switched
back to the proxy :)

Thanks




-- 
Thierry Coopman
Security Coordinator
Keytrade Bank


Re: high load question
Robert Segall <roseg(at)apsis.ch>
2004-07-06 18:07:09
Glad to hear it's working now. Given the nature of the problem I suggest you 
have a look at NoHTTPS11 - the directive was introduced especially for IE 
clients...

You may also want to consider playing with the Client timeout.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904

Re: high load question
Hrvoje Husic <pound(at)cgn.toonster.de>
2004-07-06 18:09:10
Thierry Coopman wrote on Tuesday, 6 July 2004:

> it is all that one bug in IE.
>>>Alive           10

As you are using apache on the backend side, you might try

Alive 30

or any other number greater than the value in the Apache config, so that
Apache is the one that closes an idle connection. Otherwise, Pound drops
the connection too early, which confuses some browsers.

The keep-alive bug is handled by the backend Apache in the known way,
that is, Apache responds with an HTTP/1.0 response to an HTTP/1.1
request from a broken MSIE.

-- 
Hrvoje Husic


Re: high load question
Robert Segall <roseg(at)apsis.ch>
2004-07-07 14:46:24
On Tuesday 06 July 2004 17.49, Thierry Coopman wrote:
> It would be great to have a similar mechanism as Apache for this sort
> of  requests, where you can specify to downgrade to 1.0 if the Agent is
> MSIE ...

I have just uploaded a new -current. New in this version: you can now define 
"NoHTTPS11 2" (default value) which disables HTTP/1.1 for SSL connections 
only for MSIE clients. Please give it a try and let me know how it works for 
you...
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904

Re: high load question
Thierry Coopman <thierry(at)keytradebank.com>
2004-07-07 18:17:00
I just switched to -current after some testing and everything seems to 
be fine.

Is there a comprehensive list of changes between -current and the
latest stable release available?

I noticed non-blocking connects, rewrite redirects and now this
NoHTTPS11 2 option.

Does the rewrite redirect know it has to rewrite http://my.server.com
redirects from the backend to https://my.server.com when used as an SSL
reverse proxy?
This would be great. I had to wait 2 weeks for the web team to modify the
redirects on the backend, because they were redirecting to http instead
of https since their script thought no https was used. Now I add an
extra HTTP header to the request, on which they can decide to redirect to
https instead of http.

BTW: does the content of the Location header need to be a full URL (as
in 'prot://server/dir/file'), or can it be just '/dir/file' or even just
'file'?




-- 
Thierry Coopman
Security Coordinator
Keytrade Bank


Re: high load question
Robert Segall <roseg(at)apsis.ch>
2004-07-08 13:55:00
On Wednesday 07 July 2004 18.17, Thierry Coopman wrote:
> I just switched to -current after some testing and everything seems to
> be fine.
>
> is there a comprehensive list of changes between the -current and the
> latest stable available?

Not as such - just the collection of announcements on the list. The full list 
is available only for stable releases (see the RCS comments).

> I noticed non-blocking connects, rewrite redirects and now this
> noHTTPS11 2 option.

Also a few bug fixes.

> Does the rewrite redirect know it has to rewrite http://my.server.com
> redirects from the backend to https://my.server.com when used as an SSL
> reverse proxy?

Yes. Read the man page and the comments in the source.

> This would be great, I had to wait 2 weeks on the web team to modify the
> redirects on the backend because they were redirecting to http instead
> of https since their script thought no https was used. Now I add an
> extra http header to the request on which they can decide to redirect to
> https instead of http.
>
> BTW: does the content in the Location header need to be a full URL (as
> in 'prot://server/dir/file') or can it be just '/dir/file' or even just
> 'file'.

It can be anything.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904

Memory leak Re: high load question
"Thierry Coopman" <thierry(at)keytradebank.com>
2004-07-09 14:28:07
Hi,

I'm running -current with the NoHTTPS11 2 setting.

It runs fine for some days and then it starts slowing down because the
machine starts swapping. The machine has 512MB RAM.
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7038 nobody    16   0  568m 477m 2744 S  0.3 95.1   1:51.08 pound

On the other machine it was even 1528m of memory.

After a restart I get this:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2208 nobody    18   0 51200  17m 2744 R  9.7  3.5   0:11.03 pound


A quick and dirty fix would be to restart the proxy every few hours, but
that's not really a professional solution :)

Another solution would be to add memory to the machine, but Pound would
basically use it all too.

Any idea how I can help to track down this leak? I'm no hard-core C
programmer, but I understand the basics, so ...

Thanks!




Re: Memory leak Re: high load question
Robert Segall <roseg(at)apsis.ch>
2004-07-09 15:06:43
We've had a few reports of this, but we were never able to pin it down - it 
seems to be very much dependent on the machine/OS/libraries combination. I 
would appreciate any help and information you can offer on it.

If you use SSL you may want to try adding the line

#define clean_all() {   \
    if(be != NULL) { BIO_flush(be); BIO_free_all(be); be = NULL; } \
    if(cl != NULL) { BIO_flush(cl); BIO_free_all(cl); cl = NULL; } \
    if(x509 != NULL) { X509_free(x509); x509 = NULL; } \
}

in http.c, line 510 - this may help somewhat (only the line with x509 is new).
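
The free-and-reset pattern used by the macro can be tried in isolation
with the sketch below. It does not use OpenSSL and is not Pound's code:
resource, resource_new, resource_free, RELEASE and handle_request are
hypothetical stand-ins, and the counter only exists to make a forgotten
release visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-connection object (a BIO, an X509, ...). */
typedef struct { int unused; } resource;

int live_resources = 0;   /* allocations that have not been released yet */

resource *resource_new(void) {
    resource *r = malloc(sizeof *r);
    if (r != NULL)
        live_resources++;
    return r;
}

void resource_free(resource *r) {
    free(r);
    live_resources--;
}

/* Free-and-reset: release once, then NULL the pointer so that running the
 * cleanup again on any later exit path is a harmless no-op. */
#define RELEASE(p) do { if ((p) != NULL) { resource_free(p); (p) = NULL; } } while (0)

/* Simulates one request and returns how many resources it left behind.
 * Passing release_x509 = 0 models a cleanup that forgets one pointer. */
int handle_request(int release_x509) {
    int before = live_resources;
    resource *be = resource_new();
    resource *cl = resource_new();
    resource *x509 = resource_new();

    RELEASE(be);
    RELEASE(cl);
    if (release_x509)
        RELEASE(x509);
    RELEASE(be);              /* second call: pointer is NULL, nothing happens */

    return live_resources - before;
}
```

In this toy model handle_request(1) leaves nothing behind, while
handle_request(0) leaves one object allocated per request, which is the
kind of slow per-request growth a missing release would produce.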

Please let me know.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904

Re: Memory leak Re: high load question
"Simon Matter" <simon.matter(at)ch.sauter-bc.com>
2004-07-09 15:35:13
> We've had a few reports of this, but we were never able to pin it down -
> it
> seems to be very much dependent on the machine/OS/libraries combination. I
> would appreciate any help and information you can offer on it.
>
> If you use SSL you may want to try adding the line

And please report back if it helps in your situation.
I ended up with a cron job to restart pound daily -  not a really elegant
solution but works for me. This is on Linux RedHat 7.3.

Simon



Re: Memory leak Re: high load question
"Thierry Coopman" <thierry(at)keytradebank.com>
2004-07-09 17:32:59
> We've had a few reports of this, but we were never able to pin it down
> - it  seems to be very much dependent on the machine/OS/libraries
> combination. I  would appreciate any help and information you can
> offer on it.

both machines are identical:
# uname -a
Linux pop 2.6.7-gentoo-r5 #1 SMP Thu Jun 24 23:06:26 Local time zone must be set--see zic  i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux

# openssl version
OpenSSL 0.9.7d 17 Mar 2004

# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-pc-linux-gnu/3.3.3/specs
Configured with: /var/tmp/portage/gcc-3.3.3-r6/work/gcc-3.3.3/configure
--prefix=/usr --bindir=/usr/i386-pc-linux-gnu/gcc-bin/3.3
--includedir=/usr/lib/gcc-lib/i386-pc-linux-gnu/3.3.3/include
--datadir=/usr/share/gcc-data/i386-pc-linux-gnu/3.3
--mandir=/usr/share/gcc-data/i386-pc-linux-gnu/3.3/man
--infodir=/usr/share/gcc-data/i386-pc-linux-gnu/3.3/info --enable-shared
--host=i386-pc-linux-gnu --target=i386-pc-linux-gnu --with-system-zlib
--enable-languages=c,c++ --enable-threads=posix --enable-long-long
--disable-checking --disable-libunwind-exceptions --enable-cstdio=stdio
--enable-version-specific-runtime-libs
--with-gxx-include-dir=/usr/lib/gcc-lib/i386-pc-linux-gnu/3.3.3/include/g++-v3
--with-local-prefix=/usr/local --enable-shared --enable-nls
--without-included-gettext --disable-multilib --enable-__cxa_atexit
--enable-clocale=generic
Thread model: posix
gcc version 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)

gentoo compile flags:
CFLAGS="-O2 -mcpu=i686 -fomit-frame-pointer -pipe"


>
> If you use SSL you may want to try adding the line
>
> #define clean_all() {   \
>     if(be != NULL) { BIO_flush(be); BIO_free_all(be); be = NULL; } \
>     if(cl != NULL) { BIO_flush(cl); BIO_free_all(cl); cl = NULL; } \
>     if(x509 != NULL) { X509_free(x509); x509 = NULL; } \
> }

ok, I'll try that next week


Re: Memory leak Re: high load question
"Thierry Coopman" <thierry(at)keytradebank.com>
2004-07-09 17:33:26
Hmmm,

I do indeed have a lot of connection resets in the error log file:

23048 on one server for today alone so far.

Jul  9 16:47:19 pop pound: error flush to 195.212.29.67: Connection reset by peer
Jul  9 16:47:20 pop pound: error flush to 81.11.144.88: Connection reset by peer
Jul  9 16:47:20 pop pound: error flush to 81.89.100.18: Connection reset by peer
Jul  9 16:47:20 pop pound: error flush to 81.241.33.110: Connection reset by peer
Jul  9 16:47:20 pop pound: error flush to 81.164.51.104: Connection reset by peer
Jul  9 16:47:21 pop pound: error flush to 217.136.215.164: Connection reset by peer
Jul  9 16:47:21 pop pound: error flush to 81.165.119.77: Connection reset by peer
Jul  9 16:47:22 pop pound: error flush to 213.119.198.186: Connection reset by peer

Are there other resources I can check that need to be freed?

Thanks for the help again!




Re: Memory leak Re: high load question
Thierry Coopman <thierry(at)keytradebank.com>
2004-07-12 11:23:20
Robert Segall wrote:

>
>
>We've had a few reports of this, but we were never able to pin it down - it 
>seems to be very much dependent on the machine/OS/libraries combination. I 
>would appreciate any help and information you can offer on it.
>
>If you use SSL you may want to try adding the line
>
>#define clean_all() {   \
>    if(be != NULL) { BIO_flush(be); BIO_free_all(be); be = NULL; } \
>    if(cl != NULL) { BIO_flush(cl); BIO_free_all(cl); cl = NULL; } \
>    if(x509 != NULL) { X509_free(x509); x509 = NULL; } \
>}
>
>in http.c, line 510 - this may help somewhat (only the line with x509 is new).
>
>Please let me know.
>  
>

I have one machine running with this now, let's see how this compares to 
the other machine :)
They both receive about the same number of requests.

Thanks


-- 
Thierry Coopman
Security Coordinator
Keytrade Bank


Re: Memory leak Re: high load question
Thierry Coopman <thierry(at)keytradebank.com>
2004-07-12 15:15:50
Robert Segall wrote:

>
>
>We've had a few reports of this, but we were never able to pin it down - it 
>seems to be very much dependent on the machine/OS/libraries combination. I 
>would appreciate any help and information you can offer on it.
>
>If you use SSL you may want to try adding the line
>
>#define clean_all() {   \
>    if(be != NULL) { BIO_flush(be); BIO_free_all(be); be = NULL; } \
>    if(cl != NULL) { BIO_flush(cl); BIO_free_all(cl); cl = NULL; } \
>    if(x509 != NULL) { X509_free(x509); x509 = NULL; } \
>}
>
>in http.c, line 510 - this may help somewhat (only the line with x509 is new).
>
>Please let me know.
>  
>

OK,

I toyed a bit with Pound and Valgrind. It came up with this after a few 
seconds:

on the server without the patch

==31888== 201824 bytes in 1158 blocks are definitely lost in loss record 45 of 46
==31888==    at 0x40025692: malloc (vg_replace_malloc.c:153)
==31888==    by 0x4028894E: (within /usr/lib/libcrypto.so.0.9.7)
==31888==
==31888== LEAK SUMMARY:
==31888==    definitely lost: 201896 bytes in 1160 blocks.
==31888==    possibly lost:   2840 bytes in 13 blocks.
==31888==    still reachable: 6630540 bytes in 5708 blocks.
==31888==         suppressed: 200 bytes in 1 blocks.

On the server with the patch

==28492== 186912 bytes in 1068 blocks are definitely lost in loss record 45 of 46
==28492==    at 0x40025692: malloc (vg_replace_malloc.c:153)
==28492==    by 0x4028894E: (within /usr/lib/libcrypto.so.0.9.7)
==28492==
==28492== LEAK SUMMARY:
==28492==    definitely lost: 186984 bytes in 1070 blocks.
==28492==    possibly lost:   2352 bytes in 9 blocks.
==28492==    still reachable: 5652511 bytes in 5436 blocks.
==28492==         suppressed: 200 bytes in 1 blocks.

So it's some OpenSSL structure that is incorrectly or not freed 
somewhere I guess.

I want to get this resolved, I can't have a service that needs to be 
restarted every day because it eats up all memory...

Now let's see: the clean_all define cleans up the cl and be BIO structs,
but there is also a bb BIO struct.

There are some BIO_free_all(bb) calls, but somewhere around line 580 in
http.c there is a BIO_get_ssl with bb as one of the arguments, and then
somewhere later there is a BIO_new(BIO_f_buffer()) that gets assigned to
bb, without a free of bb in between. Could this be a problem?

Overall it looks like the memory allocated for ssl is never freed once
the SSL pointer is retrieved.













Re: Memory leak Re: high load question
Robert Segall <roseg(at)apsis.ch>
2004-07-12 15:35:46
On Monday 12 July 2004 15.15, Thierry Coopman wrote:
> OK,
>
> I toyed a bit with Pound and Valgrind. It came up with this after a few
> seconds:
>
> on the server without the patch
>
> ==31888== 201824 bytes in 1158 blocks are definitely lost in loss record 45 of 46
> ==31888==    at 0x40025692: malloc (vg_replace_malloc.c:153)
> ==31888==    by 0x4028894E: (within /usr/lib/libcrypto.so.0.9.7)
> ==31888==
> ==31888== LEAK SUMMARY:
> ==31888==    definitely lost: 201896 bytes in 1160 blocks.
> ==31888==    possibly lost:   2840 bytes in 13 blocks.
> ==31888==    still reachable: 6630540 bytes in 5708 blocks.
> ==31888==         suppressed: 200 bytes in 1 blocks.
>
> On the server with the patch
>
> ==28492== 186912 bytes in 1068 blocks are definitely lost in loss record 45 of 46
> ==28492==    at 0x40025692: malloc (vg_replace_malloc.c:153)
> ==28492==    by 0x4028894E: (within /usr/lib/libcrypto.so.0.9.7)
> ==28492==
> ==28492== LEAK SUMMARY:
> ==28492==    definitely lost: 186984 bytes in 1070 blocks.
> ==28492==    possibly lost:   2352 bytes in 9 blocks.
> ==28492==    still reachable: 5652511 bytes in 5436 blocks.
> ==28492==         suppressed: 200 bytes in 1 blocks.
>
> So it's some OpenSSL structure that is incorrectly or not freed
> somewhere I guess.

Thanks - I'm looking into it.

> I want to get this resolved, I can't have a service that needs to be
> restarted every day because it eats up all memory...
>
> now let's see, the clean_all define cleans up cl and be BIO structs, but
> there is also a bb BIO struct.
>
> there are some BIO_free_all(bb) calls, but somewhere around line 580 in
> http.c there is a BIO_get_ssl with bb as one of the arguments and then
> somewhere later there is  BIO_new(BIO_f_buffer()) that gets assigned to
> bb, without a free of bb in between. Could this be a problem?

No. The various BIO structures are pushed (chained) on top of each other. The 
clean_all() macro calls BIO_free_all(), which is supposed to release the full 
chain.
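
The chaining argument can be illustrated with a toy model. The sketch
below is not OpenSSL and not Pound's code: layer, layer_new, layer_push,
layer_free_all and build_and_free are hypothetical names that only mimic
the idea of pushing one layer on top of another and then releasing the
whole chain through its head. Whether bb really is pushed onto the chain
at the point in http.c mentioned above is not something this sketch can
settle.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a chained I/O layer; NOT OpenSSL's BIO. */
typedef struct layer {
    struct layer *next;   /* the layer this one was pushed on top of */
} layer;

layer *layer_new(void) {
    return calloc(1, sizeof(layer));
}

/* Put `top` in front of `chain` and return the new head of the chain. */
layer *layer_push(layer *top, layer *chain) {
    assert(top != NULL);
    top->next = chain;
    return top;
}

/* Free every layer reachable from `head`; returns how many were freed. */
int layer_free_all(layer *head) {
    int freed = 0;
    while (head != NULL) {
        layer *next = head->next;
        free(head);
        head = next;
        freed++;
    }
    return freed;
}

/* Build a three-layer chain while reusing one variable for the head,
 * then release everything through that single pointer. */
int build_and_free(void) {
    layer *bb = layer_new();            /* lowest layer */
    if (bb == NULL)
        return 0;
    for (int i = 0; i < 2; i++) {
        layer *top = layer_new();
        if (top == NULL)
            break;
        bb = layer_push(top, bb);       /* old head stays reachable via next */
    }
    return layer_free_all(bb);
}
```

In this model, reassigning bb loses nothing as long as the new object is
pushed on top of the old one first. A layer that is overwritten without
being pushed or freed would be unreachable from the head and would leak.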

> Overall it looks like the memory allocated for ssl is never freed, once
> the SSL pointer is retrieved.

Given that the SSL structure is allocated as part of the BIO I assume it is 
also released as part of the BIO. I'll look into it again.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904
