Remove failing server from pool
Michal Taborsky - Internet Mall <michal.taborsky(at)mall.cz>
2007-11-19 19:21:11
Hello everyone.

We run pound successfully, but there is one minor problem I have not been
able to solve. When the server is down (not listening on 80) or timing out,
all is well: it gets removed from the backend pool, works like a charm. The
problem arises when something is wrong with the application (for example, we
lose the connection to the database). Our web application returns a 503
Service Unavailable status, which in my opinion is correct behavior.
But pound does not recognize this code and keeps the server in the pool,
resulting in many clients receiving this error page.

Is it possible to make pound remove this failed server from the pool,
as if it had received no response, and try it again later?

Thanks for any advice.
[...]

Re: [Pound Mailing List] Remove failing server from pool
Dave Steinberg <dave(at)redterror.net>
2007-11-19 20:02:48
Michal Taborsky - Internet Mall wrote:[...]

I believe you could get something like this to work using the HAport 
option pound offers.  Build yourself a little custom application that 
goes and checks the internal consistency of your application, and 
accepts connections if the app is alive, or doesn't accept them if it's not.

Not exactly what you were looking for, but an alternative nonetheless. 
Plus you can read the output from this program from another, say 
something like nagios, and integrate it with a larger monitoring system 
to get notified when there are problems.
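
A minimal sketch of such a checker, in Python, might look like the following. It is an illustration only, not anything shipped with Pound: the status URL, the port 8081, the check interval and the names `app_is_healthy` and `HealthGate` are all assumptions. The idea is that the port stays open only while the application answers 200, so a 503 from the app closes the port, and Pound's BackEnd section would point its HAport at that port.

```python
import http.client
import socket
import threading
import time
import urllib.request
from typing import Callable, Optional


def app_is_healthy(url: str = "http://127.0.0.1:80/", timeout: float = 2.0) -> bool:
    """Hypothetical check: healthy only if the app answers 200 (a 503 fails)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False  # HTTP errors, refused connections and timeouts land here


class HealthGate:
    """Keep a TCP port open only while check() reports the app healthy."""

    def __init__(self, host: str, port: int, check: Callable[[], bool]):
        self.host, self.port, self.check = host, port, check
        self._sock: Optional[socket.socket] = None

    def bound_port(self) -> Optional[int]:
        return self._sock.getsockname()[1] if self._sock else None

    def _open(self) -> None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((self.host, self.port))
        sock.listen(5)
        sock.settimeout(0.2)
        self._sock = sock

    def _close(self) -> None:
        if self._sock is not None:
            self._sock.close()
            self._sock = None

    def step(self) -> bool:
        """Run one health check and open or close the port accordingly."""
        healthy = self.check()
        if healthy and self._sock is None:
            self._open()
        elif not healthy:
            self._close()
        return healthy

    def run(self, interval: float = 5.0, stop: Optional[threading.Event] = None) -> None:
        """Re-check every `interval` seconds, answering probes in between."""
        stop = stop or threading.Event()
        while not stop.is_set():
            self.step()
            deadline = time.monotonic() + interval
            while time.monotonic() < deadline and not stop.is_set():
                if self._sock is None:
                    stop.wait(0.2)  # port closed: nothing to accept
                    continue
                try:
                    conn, _ = self._sock.accept()
                    conn.close()  # a probe only needs the connect to succeed
                except socket.timeout:
                    pass
        self._close()


if __name__ == "__main__":
    # Port 8081 is an arbitrary choice for this sketch.
    HealthGate("0.0.0.0", 8081, app_is_healthy).run()
```

Stopping the process closes the port as well, which gives a manual way to take a machine out of rotation.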

Regards,[...]

Re: [Pound Mailing List] Remove failing server from pool
Albert <pound(at)alacra.com>
2007-11-19 20:43:13
This has been discussed before, and I agree with Dave, and it's not the 
function of pound to handle application errors.  We wrote a small app 
for HAPort, which checks the health of our web server.  If it sees 
problems, it stops listening on that port, and pound takes the machine 
out of the loop.  We also use the app when we need to take the machine 
out of the pool (for reboot or maintenance) -- stopping the app causes 
pound to stop forwarding requests.


Re: [Pound Mailing List] Remove failing server from pool
Gergely CZUCZY <phoemix(at)harmless.hu>
2007-11-19 20:44:10
On Mon, Nov 19, 2007 at 02:02:48PM -0500, Dave Steinberg wrote:[...]
I don't think this is entirely pound's task. Think a bit about this
issue. When all the servers lose the DB backend, for example due to a database
failure (Our Dear PHP Coder Messed Up The Schema...), it's not really the
servers' fault. Other sites could still operate without a scratch on the same
web backend, and lower-level functionality could still be achieved (we do this
for a few sites: the front pages of newspapers we serve work without the DB).

Though, some way to signal pound that the application is down at the moment
would still be useful, I agree on that. I mean, the port is open, but not
ready for operation.

Michal, just think a bit about the complexity of this issue. Removing all the
web backends due to a database failure is definitely not a bright idea.

Sincerely,

Gergely Czuczy
mailto: gergely.czuczy(at)harmless.hu
[...]

RE: [Pound Mailing List] Remove failing server from pool
"Chris Morrow" <cmorrow(at)verrus.com>
2007-11-19 20:55:46
Hi Michal,

That is a valid concern. The problem is, according to Pound, your web
server is still functioning correctly. I don't know what you could do in
Pound to resolve this. What I suggest is implementing a content-checking
script that validates web server responses. I have addressed this issue
myself using "wget". It works something like this:

- wget requests a particular URL
- the request is logged to file
- file is parsed for 200, 404, 500, 503 etc...
- if error exists then email notification list
- if no error then sleep

The script cycles through all my web servers, and if one is in error
reports the hostname back. This has worked fairly well and allows me to
resolve web server issues quickly.
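
The loop described above can be sketched roughly as follows. This is not the wget-based script itself: it swaps wget for Python's `urllib`, and the hostnames, URLs and the `notify` callback (which here just prints instead of sending email) are placeholders.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses still carry a status code
    except OSError:
        return None  # refused, timed out, DNS failure, ...


def is_error(status: Optional[int]) -> bool:
    """Treat a missing response and any 4xx/5xx status as an error."""
    return status is None or status >= 400


def check_servers(
    urls: Dict[str, str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> Dict[str, Optional[int]]:
    """Return {hostname: status} for every server that is in error."""
    failures = {}
    for host, url in urls.items():
        status = fetch(url)
        if is_error(status):
            failures[host] = status
    return failures


def monitor(urls: Dict[str, str], notify: Callable[[str], None] = print,
            interval: float = 60.0) -> None:
    """Cycle through all servers forever, reporting hosts in error."""
    while True:
        for host, status in check_servers(urls).items():
            notify(f"{host}: status {status}")
        time.sleep(interval)


if __name__ == "__main__":
    # Placeholder hostnames; replace with the real backends.
    monitor({"web1": "http://web1.example/", "web2": "http://web2.example/"})
```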

Of course the next step would be to have the script 'notify' pound to
remove the offending server. But, I haven't had a chance to implement
that. ;)

Chris Morrow
Systems Administrator
Verrus Mobile Technologies Inc.

Re: [Pound Mailing List] Remove failing server from pool
Michal Taborsky - Internet Mall <michal.taborsky(at)mall.cz>
2007-11-21 10:16:24
Thanks to everybody for the suggestions. In my particular case I still think 
the approach I suggested is valid, but I understand that it may not be 
the case everywhere. Still, it would be nice if there was an option in 
the backend configuration section, something like RemoveOn50x = true.

I'll explain why this would help in our case. Our app uses pgpool
for database connection pooling and load balancing. pgpool runs on
every backend (because of the load balancing). Unfortunately, it has a
bad habit of "freezing" once in a while (actually a lot more often than we'd
like, but I don't know of anything better for the job). We have
a monitoring script which runs every minute, and if it finds pgpool
dead, it kills it and reloads it. But for one minute, the backend sends
503s to everyone using it. At 30 req/s, that's 1800 "pissed"
customers (we are an online retailer, so we have to be extra nice to them
:) ).

If pound were to remove the backend on the first 503, we'd be able to 
eliminate this problem. The pgpool will reload later, pound will realize 
it and return it to the backend pool. All's well that ends well.
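
To make the proposal concrete, here is a toy model of the behavior in Python. It is not Pound code, and RemoveOn50x is only the option name suggested above, not an existing setting. The `Pool` class, the round-robin choice and the 30-second retry window are assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Backend:
    name: str
    alive: bool = True
    dead_since: float = 0.0


class Pool:
    """Toy backend pool that drops a backend on its first 503."""

    def __init__(self, backends: List[Backend], retry_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.backends = backends
        self.retry_after = retry_after  # assumed retry window, in seconds
        self.clock = clock
        self._next = 0

    def pick(self) -> Optional[Backend]:
        """Round-robin over live backends, reviving any whose wait is over."""
        now = self.clock()
        for b in self.backends:
            if not b.alive and now - b.dead_since >= self.retry_after:
                b.alive = True  # back in the pool on trial
        live = [b for b in self.backends if b.alive]
        if not live:
            return None
        self._next += 1
        return live[self._next % len(live)]

    def report(self, backend: Backend, status: int) -> None:
        """Record a response status; a 503 removes the backend for now."""
        if status == 503:
            backend.alive = False
            backend.dead_since = self.clock()
```

In this model a frozen backend costs one failed request per retry window instead of one per request routed to it. A real proxy could go further and replay the failed request on another backend, which this sketch does not attempt.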

I agree that if the database itself fails (which is unlikely, because
there are two of them), all backends will be removed. But I don't care,
because in this unlikely event I can't serve anything anyway, so I have
no problem with pound sending the 503s instead of the app servers.

But again, thanks everyone.

MT.

Re: [Pound Mailing List] Remove failing server from pool
Dave Steinberg <dave(at)redterror.net>
2007-11-21 16:37:11
Michal Taborsky - Internet Mall wrote:[...]

<snip>

re: pgpool - that stinks.  :(

IMHO, the source code for Pound is extremely approachable.  You should 
seriously consider writing a patch to implement this feature, and if it 
works out well for you, submit it back to the community.  Even if it's 
not accepted, my guess is that maintaining the patch across pound's 
release cycle will be an approachable task, since the code isn't being 
majorly rewritten or anything.

You might also, um, fix pgpool.  If I were writing a patch, that's where 
I'd spend my time since it's the root of the problem.  The big challenge 
there is of course writing the test suite to reliably recreate the bug.

Good luck,[...]
