high load question
Thierry Coopman <thierry(at)keytradebank.com> |
2004-07-06 15:22:24 |
|
Hi,
I have set up Pound on 2 Gentoo servers, with LVS as the load-balancing and HA
solution. This setup works great under low load conditions.
I have LVS balance HTTPS connections over 2 Gentoo machines running
Pound, that in turn connect to 8 back end servers in HTTP.
I have put it in production yesterday, only to reverse that today after
discovering that some people were unable to log into our site.
We have a login page that needs to send its parameters with POST. From
what I have seen, Pound forwards these requests but sets the
Content-Length header to 0, and the form data is effectively not sent to
the back end.
I have little or no control over the client (mostly Internet Explorer,
in different versions, so it's not one specific IE). The data sent in the
form is trivial: a login and a password without special chars.
Now this behaviour is not there under low load conditions (and even on
high load it's not always the case).
I was wondering if this could be a side effect of higher load. I had up
to 60 requests per second (30 per machine), with some 150 (75)
connections to the back end servers.
Did I bump into a resource limit? I have no specific error messages
apart from the usual broken pipes and connection resets caused by stupid
IE or proxies. Any hints on how I can raise the limits for Pound specifically?
Forwarding the HTTPS traffic to Apache with mod_ssl works fine for all
customers.
Is there a way I can debug this better, like dumping all POST
data from the client somewhere?
Thanks for any input.
I'm running Pound 1.7.
pound.cfg file:
User nobody
Group nobody
#RootJail /chroot/pound
ExtendedHTTP 0
WebDAV 0
LogLevel 4
Alive 10
HTTPSHeaders 0 "HTTPS: on"
ListenHTTPS *,443 /etc/secure.keytradebank.com.pem
ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
UrlGroup ".*"
BackEnd 10.x.x.x,80,1
BackEnd 10.x.x.x,80,1
BackEnd 10.x.x.x,80,1
BackEnd 10.x.x.x,80,1
EndGroup
[...]
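On the debugging question above (spotting POST requests that arrive with Content-Length set to 0), here is a minimal C sketch. It assumes you already have the raw request text, for example from a packet capture taken on the back-end side. The function names `content_length` and `suspicious_post` are made up for illustration and are not part of Pound.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Return the Content-Length value found in a raw header block,
 * or -1 if the header is absent or malformed.
 * Scanning stops at the blank line that ends the headers. */
long content_length(const char *headers)
{
    static const char name[] = "Content-Length:";
    size_t n = sizeof(name) - 1;
    const char *line = headers;

    while (line != NULL && *line != '\0') {
        if (*line == '\r' || *line == '\n')
            break;                          /* end of headers */
        if (strncasecmp(line, name, n) == 0) {
            const char *p = line + n;
            while (*p == ' ' || *p == '\t')
                p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            return strtol(p, NULL, 10);
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Return 1 if the request is a POST that declares an empty body,
 * which is the symptom described in this thread; 0 otherwise. */
int suspicious_post(const char *request)
{
    if (strncmp(request, "POST ", 5) != 0)
        return 0;
    return content_length(request) == 0;
}
```

A POST without any Content-Length header is deliberately not flagged here, since it may carry a chunked body.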
|
|
|
Re: high load question
Robert Segall <roseg(at)apsis.ch> |
2004-07-06 17:35:04 |
|
On Tuesday 06 July 2004 15.22, Thierry Coopman wrote:[...]
I strongly suggest you move to -current, as 1.7 has some known issues and
limitations.
I have never yet heard of problems with POST requests - I would be very
interested in more details.[...]
|
|
|
Re: high load question
Thierry Coopman <thierry(at)keytradebank.com> |
2004-07-06 17:49:00 |
|
It all comes down to that one bug in IE.
Keepalive is set to 60 seconds; on the servers it's usually less, so IE
has to reconnect if the connection was dropped by the server and, when
it reconnects, it kind of 'forgets' the POST data.
Very stupid. I have all of my customers running HTTP/1.0 because of
that, and I would prefer HTTP/1.1.
It would be great to have a mechanism similar to Apache's for this sort
of request, where you can specify a downgrade to 1.0 if the agent is
MSIE ...
Either way, it's 'resolved' in a way that it works now and we switched
back to the proxy :)
Thanks
Robert Segall wrote:
[...][...][...]
[...]
|
|
|
Re: high load question
Robert Segall <roseg(at)apsis.ch> |
2004-07-06 18:07:09 |
|
On Tuesday 06 July 2004 17.49, Thierry Coopman wrote:[...]
Glad to hear it's working now. Given the nature of the problem I suggest you
have a look at NoHTTPS11 - the directive was introduced especially for IE
clients...
You may also want to consider playing with the Client timeout.[...]
|
|
|
Re: high load question
Hrvoje Husic <pound(at)cgn.toonster.de> |
2004-07-06 18:09:10 |
|
Thierry Coopman wrote on Tuesday, 6 July 2004:
[...]
>>>Alive 10
As you are using Apache on the backend side, you might try
Alive 30
or any other number greater than the value in the Apache config, so that
Apache is the one to close an idle connection. Otherwise, Pound drops the
connection too early, which confuses some browsers.
The keep-alive bug is handled by the backend Apache in the known way,
that is, Apache responds with an HTTP/1.0 response to an HTTP/1.1
request from a broken MSIE.
[...]
|
|
|
Re: high load question
Robert Segall <roseg(at)apsis.ch> |
2004-07-07 14:46:24 |
|
On Tuesday 06 July 2004 17.49, Thierry Coopman wrote:[...]
I have just uploaded a new -current. New in this version: you can now define
"NoHTTPS11 2" (default value) which disables HTTP/1.1 for SSL connections
only for MSIE clients. Please give it a try and let me know how it works for
you...[...]
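To make the behaviour of the setting concrete, here is a rough C sketch of the decision it describes. This is not Pound's actual code: the helper name `force_http10` is hypothetical, and the meanings assumed for the values 0 (check disabled) and 1 (HTTP/1.0 behaviour for every SSL client) are my reading of the man page rather than something stated in this message. Only the value 2 (MSIE clients only) comes from the announcement above.

```c
#include <string.h>

/* Hypothetical helper, not taken from Pound: decide whether an SSL client
 * should be treated as HTTP/1.0, given the NoHTTPS11 setting.
 *   0: never (assumed meaning)
 *   1: every SSL client (assumed meaning)
 *   2: only clients whose User-Agent contains "MSIE" */
int force_http10(int no_https11, const char *user_agent)
{
    switch (no_https11) {
    case 1:
        return 1;
    case 2:
        return user_agent != NULL && strstr(user_agent, "MSIE") != NULL;
    default:
        return 0;
    }
}
```

For example, `force_http10(2, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)")` returns 1, while the same call with a non-IE User-Agent returns 0.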
|
|
|
Re: high load question
Thierry Coopman <thierry(at)keytradebank.com> |
2004-07-07 18:17:00 |
|
I just switched to -current after some testing and everything seems to
be fine.
Is there a comprehensive list of changes between -current and the
latest stable release available?
I noticed non-blocking connects, rewrite redirects and now this
NoHTTPS11 2 option.
Does the rewrite redirect know it has to rewrite http://my.server.com
redirects from the backend to https://my.server.com when used as an SSL
reverse proxy?
This would be great. I had to wait 2 weeks for the web team to modify the
redirects on the backend, because they were redirecting to http instead
of https since their script thought no https was used. Now I add an
extra HTTP header to the request, on which they can decide to redirect to
https instead of http.
BTW: does the content of the Location header need to be a full URL (as
in 'prot://server/dir/file') or can it be just '/dir/file' or even just
'file'?
Robert Segall wrote:
[...][...][...]
[...]
|
|
|
Re: high load question
Robert Segall <roseg(at)apsis.ch> |
2004-07-08 13:55:00 |
|
On Wednesday 07 July 2004 18.17, Thierry Coopman wrote:[...]
Not as such - just the collection of announcements on the list. The full list
is available only for stable releases (see the RCS comments).
[...]
Also a few bug fixes.
[...]
Yes. Read the man page and the comments in the source.
[...]
It can be anything.[...]
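As an illustration of the redirect question discussed here, the sketch below shows one way a Location value could be rewritten from http:// to https:// for a known host, while relative values such as '/dir/file' pass through untouched. It is only a sketch under stated assumptions: `rewrite_location` is a hypothetical helper, not Pound's implementation, and it ignores ports and host-name case.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `location` is an absolute http:// URL for `host`,
 * write the https:// equivalent into `out`; otherwise copy it unchanged.
 * Returns 1 if rewritten, 0 if copied as-is, -1 if `out` is too small. */
int rewrite_location(const char *location, const char *host,
                     char *out, size_t out_size)
{
    static const char prefix[] = "http://";
    size_t plen = sizeof(prefix) - 1;
    size_t hlen = strlen(host);
    int n;

    /* The host must be followed by '/' or end of string, so that
     * "my.server.com.other.example" does not match "my.server.com". */
    if (strncmp(location, prefix, plen) == 0 &&
        strncmp(location + plen, host, hlen) == 0 &&
        (location[plen + hlen] == '/' || location[plen + hlen] == '\0')) {
        n = snprintf(out, out_size, "https://%s", location + plen);
        return (n < 0 || (size_t)n >= out_size) ? -1 : 1;
    }
    n = snprintf(out, out_size, "%s", location);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With host "my.server.com", the value "http://my.server.com/dir/file" becomes "https://my.server.com/dir/file", and "/dir/file" is left alone.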
|
|
|
Memory leak Re: high load question
"Thierry Coopman" <thierry(at)keytradebank.com> |
2004-07-09 14:28:07 |
|
Hi,
I'm running -current with the NoHTTPS11 2 setting.
It runs fine for some days and then it starts slowing down because the
machine starts swapping. The machine has 512MB of RAM.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7038 nobody 16 0 568m 477m 2744 S 0.3 95.1 1:51.08 pound
On the other machine it even reached 1528m of memory.
After a restart I get this:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2208 nobody 18 0 51200 17m 2744 R 9.7 3.5 0:11.03 pound
A quick and dirty fix would be to restart the proxy every few hours, but
that's not really a professional solution :)
Another solution would be to add memory to the machine, but Pound would
basically use all of it too.
Any idea how I can help trace down this leak? I'm no hard-core C
programmer but I understand the basics, so ...
Thanks!
[...][...][...][...][...][...][...][...][...]
[...]
|
|
|
Re: Memory leak Re: high load question
Robert Segall <roseg(at)apsis.ch> |
2004-07-09 15:06:43 |
|
On Friday 09 July 2004 14.28, Thierry Coopman wrote:[...]
We've had a few reports of this, but we were never able to pin it down - it
seems to be very much dependent on the machine/OS/libraries combination. I
would appreciate any help and information you can offer on it.
If you use SSL you may want to try adding the line
#define clean_all() { \
if(be != NULL) { BIO_flush(be); BIO_free_all(be); be = NULL; } \
if(cl != NULL) { BIO_flush(cl); BIO_free_all(cl); cl = NULL; } \
if(x509 != NULL) { X509_free(x509); x509 = NULL; } \
}
in http.c, line 510 - this may help somewhat (only the line with x509 is new).
Please let me know.[...]
|
|
|
Re: Memory leak Re: high load question
"Simon Matter" <simon.matter(at)ch.sauter-bc.com> |
2004-07-09 15:35:13 |
|
> On Friday 09 July 2004 14.28, Thierry Coopman wrote:[...][...]
And please report back if it helps in your situation.
I ended up with a cron job to restart pound daily - not a really elegant
solution but works for me. This is on Linux RedHat 7.3.
Simon
[...]
|
|
|
Re: Memory leak Re: high load question
"Thierry Coopman" <thierry(at)keytradebank.com> |
2004-07-09 17:32:59 |
|
> On Friday 09 July 2004 14.28, Thierry Coopman wrote:[...][...]
both machines are identical:
# uname -a
Linux pop 2.6.7-gentoo-r5 #1 SMP Thu Jun 24 23:06:26 Local time zone
must be set--see zic i686 Intel(R) Pentium(R) 4 CPU 2.40GHz
GenuineIntel GNU/Linux
# openssl version
OpenSSL 0.9.7d 17 Mar 2004
# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-pc-linux-gnu/3.3.3/specs
Configured with: /var/tmp/portage/gcc-3.3.3-r6/work/gcc-3.3.3/configure
--prefix=/usr --bindir=/usr/i386-pc-linux-gnu/gcc-bin/3.3
--includedir=/usr/lib/gcc-lib/i386-pc-linux-gnu/3.3.3/include
--datadir=/usr/share/gcc-data/i386-pc-linux-gnu/3.3
--mandir=/usr/share/gcc-data/i386-pc-linux-gnu/3.3/man
--infodir=/usr/share/gcc-data/i386-pc-linux-gnu/3.3/info --enable-shared
--host=i386-pc-linux-gnu --target=i386-pc-linux-gnu --with-system-zlib
--enable-languages=c,c++ --enable-threads=posix --enable-long-long
--disable-checking --disable-libunwind-exceptions --enable-cstdio=stdio
--enable-version-specific-runtime-libs
--with-gxx-include-dir=/usr/lib/gcc-lib/i386-pc-linux-gnu/3.3.3/include/g++-v3
--with-local-prefix=/usr/local --enable-shared --enable-nls
--without-included-gettext --disable-multilib --enable-__cxa_atexit
--enable-clocale=generic
Thread model: posix
gcc version 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2,
pie-8.7.6)
gentoo compile flags:
CFLAGS="-O2 -mcpu=i686 -fomit-frame-pointer -pipe"
[...]
ok, I'll try that next week
[...]
[...]
|
|
|
Re: Memory leak Re: high load question
"Thierry Coopman" <thierry(at)keytradebank.com> |
2004-07-09 17:33:26 |
|
Hmmm,
I do indeed have a lot of connection resets in the error log file:
23048 on one server for today alone, until now.
Jul 9 16:47:19 pop pound: error flush to 195.212.29.67: Connection
reset by peer
Jul 9 16:47:20 pop pound: error flush to 81.11.144.88: Connection reset
by peer
Jul 9 16:47:20 pop pound: error flush to 81.89.100.18: Connection reset
by peer
Jul 9 16:47:20 pop pound: error flush to 81.241.33.110: Connection
reset by peer
Jul 9 16:47:20 pop pound: error flush to 81.164.51.104: Connection
reset by peer
Jul 9 16:47:21 pop pound: error flush to 217.136.215.164: Connection
reset by peer
Jul 9 16:47:21 pop pound: error flush to 81.165.119.77: Connection
reset by peer
Jul 9 16:47:22 pop pound: error flush to 213.119.198.186: Connection
reset by peer
Are there other resources I can check that need to be freed?
Thanks for the help again!
[...][...][...]
[...]
|
|
|
Re: Memory leak Re: high load question
Thierry Coopman <thierry(at)keytradebank.com> |
2004-07-12 11:23:20 |
|
Robert Segall wrote:
[...]
I have one machine running with this now, let's see how this compares to
the other machine :)
They both receive about the same number of requests.
Thanks
[...]
|
|
|
Re: Memory leak Re: high load question
Thierry Coopman <thierry(at)keytradebank.com> |
2004-07-12 15:15:50 |
|
Robert Segall wrote:
[...]
OK,
I toyed a bit with Pound and Valgrind. It came up with this after a few
seconds:
On the server without the patch:
==31888== 201824 bytes in 1158 blocks are definitely lost in loss record
45 of 46
==31888== at 0x40025692: malloc (vg_replace_malloc.c:153)
==31888== by 0x4028894E: (within /usr/lib/libcrypto.so.0.9.7)
==31888==
==31888== LEAK SUMMARY:
==31888== definitely lost: 201896 bytes in 1160 blocks.
==31888== possibly lost: 2840 bytes in 13 blocks.
==31888== still reachable: 6630540 bytes in 5708 blocks.
==31888== suppressed: 200 bytes in 1 blocks.
On the server with the patch:
==28492== 186912 bytes in 1068 blocks are definitely lost in loss record
45 of 46
==28492== at 0x40025692: malloc (vg_replace_malloc.c:153)
==28492== by 0x4028894E: (within /usr/lib/libcrypto.so.0.9.7)
==28492==
==28492== LEAK SUMMARY:
==28492== definitely lost: 186984 bytes in 1070 blocks.
==28492== possibly lost: 2352 bytes in 9 blocks.
==28492== still reachable: 5652511 bytes in 5436 blocks.
==28492== suppressed: 200 bytes in 1 blocks.
So it's some OpenSSL structure that is freed incorrectly, or not at all,
somewhere, I guess.
I want to get this resolved; I can't have a service that needs to be
restarted every day because it eats up all memory...
Now let's see: the clean_all define cleans up the cl and be BIO structs, but
there is also a bb BIO struct.
There are some BIO_free_all(bb) calls, but somewhere around line 580 in
http.c there is a BIO_get_ssl with bb as one of the arguments, and then
somewhere later there is a BIO_new(BIO_f_buffer()) that gets assigned to
bb, without a free of bb in between. Could this be a problem?
Overall it looks like the memory allocated for ssl is never freed once
the SSL pointer is retrieved.
[...]
|
|
|
Re: Memory leak Re: high load question
Robert Segall <roseg(at)apsis.ch> |
2004-07-12 15:35:46 |
|
On Monday 12 July 2004 15.15, Thierry Coopman wrote:[...]
Thanks - I'm looking into it.
[...]
No. The various BIO structures are pushed (chained) on top of each other. The
clean_all() macro calls BIO_free_all(), which is supposed to release the full
chain.
[...]
Given that the SSL structure is allocated as part of the BIO I assume it is
also released as part of the BIO. I'll look into it again.[...]
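The chaining argument above can be illustrated with a toy model in C. This is deliberately not OpenSSL code: `struct node`, `node_push` and `node_free_all` are hypothetical stand-ins for a BIO, BIO_push() and BIO_free_all(), and the counter exists only to show that one call on the head releases every element, including one whose own variable was later reused.

```c
#include <stdlib.h>

/* Toy stand-in for one element of a filter chain; NOT the OpenSSL BIO type. */
struct node {
    struct node *next;
};

static int live_nodes = 0;   /* number of nodes currently allocated */

struct node *node_new(void)
{
    struct node *n = calloc(1, sizeof *n);
    if (n != NULL)
        live_nodes++;
    return n;
}

/* Put the single node `top` in front of `chain` and return the new head. */
struct node *node_push(struct node *top, struct node *chain)
{
    top->next = chain;
    return top;
}

/* Free the head and everything chained below it. */
void node_free_all(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        live_nodes--;
        head = next;
    }
}

int node_live_count(void)
{
    return live_nodes;
}
```

In this model, a pointer such as bb can be reassigned without a separate free, provided the node it pointed to was pushed onto a chain that is later passed to `node_free_all`. Whether the same holds for the real SSL structure is exactly what remains to be checked in the thread.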
|
|
|
|