Remove failing server from pool
Michal Taborsky - Internet Mall <michal.taborsky(at)mall.cz>
2007-11-19 19:21:11
Hello everyone.
We run pound successfully, but there is one minor problem I have not been
able to solve. When the server is down (not listening on 80) or timing out,
all is well. It gets removed from the backend pool, works like a charm. The
problem is when there is something wrong with the application (like when we
lose connection to the database). Our web application returns a 503
Service Unavailable status, which in my opinion is correct behavior.
But pound does not recognize this code and keeps the server in the pool,
resulting in many clients receiving this error page.
Is it possible to make pound remove this failed server from the pool,
as if it had received no response, and try it again later?
Thanks for any advice.
--
Michal Táborský
chief systems architect
Internet Mall, a.s.
Internet Mall - obchody, které si oblíbíte
<http://www.MALL.cz>
Re: [Pound Mailing List] Remove failing server from pool
Dave Steinberg <dave(at)redterror.net>
2007-11-19 20:02:48
Michal Taborsky - Internet Mall wrote:
> Hello everyone.
>
> We run pound successfully, but there is one minor problem I have not been
> able to solve. When the server is down (not listening on 80) or timing out,
> all is well. It gets removed from the backend pool, works like a charm. The
> problem is when there is something wrong with the application (like when we
> lose connection to the database). Our web application returns a 503
> Service Unavailable status, which in my opinion is correct behavior.
> But pound does not recognize this code and keeps the server in the pool,
> resulting in many clients receiving this error page.
>
> Is it possible to make pound remove this failed server from the pool,
> as if it had received no response, and try it again later?
I believe you could get something like this to work using the HAport
option pound offers. Build yourself a little custom application that
checks the internal consistency of your application, and accepts
connections if the app is alive, or refuses them if it's not.
Not exactly what you were looking for, but an alternative nonetheless.
Plus you can read the output of this program from another tool, say
something like nagios, and integrate it with a larger monitoring system
to get notified when there are problems.
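The HAport approach described above could be sketched roughly as follows. This is only an illustration, not code from anyone in the thread: the class name `HealthGate`, the `check` callable, the helper `run` and the port number 8081 are all assumptions. The idea is simply that the program keeps a TCP port open while the application check passes and closes it when the check fails, so that a connection attempt to that port is refused.

```python
import socket
import time
from typing import Callable, Optional


class HealthGate:
    """Listen on a TCP port only while check() reports the app as healthy."""

    def __init__(self, check: Callable[[], bool],
                 host: str = "0.0.0.0", port: int = 8081):
        self.check = check          # hypothetical application-level check
        self.host = host
        self.port = port            # assumed port; must match the HAport setting
        self._sock: Optional[socket.socket] = None

    @property
    def listening(self) -> bool:
        return self._sock is not None

    def update(self) -> bool:
        """Run the check once and open or close the port accordingly."""
        try:
            healthy = bool(self.check())
        except Exception:
            healthy = False         # a crashing check counts as unhealthy
        if healthy and self._sock is None:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((self.host, self.port))
            sock.listen(16)
            sock.setblocking(False)
            self.port = sock.getsockname()[1]  # remember port if 0 was given
            self._sock = sock
        elif not healthy and self._sock is not None:
            self._sock.close()      # further connection attempts are refused
            self._sock = None
        self._drain()
        return healthy

    def _drain(self) -> None:
        """Accept and immediately close any probe connections that queued up."""
        if self._sock is None:
            return
        while True:
            try:
                conn, _ = self._sock.accept()
            except (BlockingIOError, InterruptedError):
                return
            conn.close()


def run(gate: HealthGate, interval: float = 5.0) -> None:
    """Re-evaluate the health check every `interval` seconds, forever."""
    while True:
        gate.update()
        time.sleep(interval)
```

A real check might run a trivial database query or request an internal status page; calling `run(HealthGate(my_check))` on each backend, with the backend's HAport setting pointing at the same port, would then give the behaviour discussed in this thread. Stopping the program closes the port as well, which is how a machine could be taken out of the pool for maintenance.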
Regards,
--
Dave Steinberg
http://www.geekisp.com/
http://www.steinbergcomputing.com/
Re: [Pound Mailing List] Remove failing server from pool
Albert <pound(at)alacra.com>
2007-11-19 20:43:13
This has been discussed before, and I agree with Dave: it's not the
function of pound to handle application errors. We wrote a small app
for HAport, which checks the health of our web server. If it sees
problems, it stops listening on that port, and pound takes the machine
out of the loop. We also use the app when we need to take the machine
out of the pool (for reboot or maintenance): stopping the app causes
pound to stop forwarding requests.
Re: [Pound Mailing List] Remove failing server from pool
Gergely CZUCZY <phoemix(at)harmless.hu>
2007-11-19 20:44:10
On Mon, Nov 19, 2007 at 02:02:48PM -0500, Dave Steinberg wrote:
> Michal Taborsky - Internet Mall wrote:
> > Hello everyone.
> > We run pound successfully, but there is one minor problem I have not
> > been able to solve. When the server is down (not listening on 80) or
> > timing out, all is well. It gets removed from the backend pool, works
> > like a charm. The problem is when there is something wrong with the
> > application (like when we lose connection to the database). Our web
> > application returns a 503 Service Unavailable status, which in my
> > opinion is correct behavior. But pound does not recognize this code and
> > keeps the server in the pool, resulting in many clients receiving this
> > error page.
> > Is it possible to make pound remove this failed server from the pool,
> > as if it had received no response, and try it again later?
>
> I believe you could get something like this to work using the HAport
> option pound offers. Build yourself a little custom application that
> checks the internal consistency of your application, and accepts
> connections if the app is alive, or refuses them if it's not.
>
> Not exactly what you were looking for, but an alternative nonetheless.
> Plus you can read the output of this program from another tool, say
> something like nagios, and integrate it with a larger monitoring system
> to get notified when there are problems.
I don't think this is entirely pound's task. Think a bit about this
issue. When all the servers lose the DB backend, for example due to a database
failure (Our Dear PHP Coder Messed Up The Schema...), it's not really the
servers' fault. Other sites could still operate without a scratch on the same
web backend, and lower-level functionality could still be achieved (we do this
for a few sites: the front pages of newspapers we serve work without the DB).
Though some way to signal pound that the application is down at the moment
would still be useful, I agree on that. I mean, the port is open, but not
ready for operation.
Michal, just think a bit about the complexity of this issue. Removing all the
web backends due to a database failure is definitely not a bright idea.
Sincerely,
Gergely Czuczy
mailto: gergely.czuczy(at)harmless.hu
--
Weenies test. Geniuses solve problems that arise.
RE: [Pound Mailing List] Remove failing server from pool
"Chris Morrow" <cmorrow(at)verrus.com>
2007-11-19 20:55:46
Hi Michal,
That is a valid concern. The problem is that, as far as Pound can tell, your
web server is still functioning correctly. I don't know what you could do in
Pound to resolve this. What I suggest is implementing a content-checking
script that validates web server responses. I have addressed this issue
myself using "wget". It works something like this:
- wget requests a particular URL
- the request is logged to file
- file is parsed for 200, 404, 500, 503 etc...
- if error exists then email notification list
- if no error then sleep
The script cycles through all my web servers, and if one is in error
reports the hostname back. This has worked fairly well and allows me to
resolve web server issues quickly.
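A rough equivalent of the loop described above, written with Python's standard library rather than wget, might look like this. It is a sketch under assumptions, not the script actually used: the function names `fetch_status` and `find_failing` are made up for illustration, and the logging, e-mail notification and sleep steps are left out.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want
        return err.code
    except (urllib.error.URLError, OSError):
        # connection refused, DNS failure, timeout and similar
        return None


def find_failing(urls: Iterable[str],
                 fetch: Callable[[str], Optional[int]] = fetch_status,
                 ok_codes: Iterable[int] = (200,)) -> Dict[str, Optional[int]]:
    """Check every URL once and return the ones that did not answer with an OK code."""
    accepted = set(ok_codes)
    failing: Dict[str, Optional[int]] = {}
    for url in urls:
        status = fetch(url)
        if status not in accepted:
            failing[url] = status
    return failing
```

Run from cron or inside a sleep loop, the dictionary returned by `find_failing` (URL mapped to the status code, or `None` when there was no response) is what would be mailed to the notification list. Note that `urlopen` follows redirects, so a 302 shows up as the status of the final page.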
Of course the next step would be to have the script 'notify' pound to
remove the offending server. But, I haven't had a chance to implement
that. ;)
Chris Morrow
Systems Administrator
Verrus Mobile Technologies Inc.
Re: [Pound Mailing List] Remove failing server from pool
Michal Taborsky - Internet Mall <michal.taborsky(at)mall.cz>
2007-11-21 10:16:24
Thanks to everybody for the suggestions. In my particular case I still think
the approach I suggested is valid, but I understand that it may not be
the case everywhere. Still, it would be nice if there was an option in
the backend configuration section, something like RemoveOn50x = true.
I'll explain why this would help in our case. Our app is using pgpool
for database connection pooling and load balancing. The pgpool runs on
every backend (because of the load balancing). Unfortunately, it has a
bad habit of "freezing" once in a while (actually a lot more often than we'd
like, but I don't know of anything better for the job). Now, we have
a monitoring script which runs every minute, and if it finds the pgpool
dead, it kills it and reloads. But for up to one minute, the backend sends
503s to everyone using this backend. With 30 req/s that's 1800 "pissed"
customers (we are an online retailer, so we have to be extra nice to them
:) ).
If pound were to remove the backend on the first 503, we'd be able to
eliminate this problem. The pgpool will reload later, pound will realize
it and return it to the backend pool. All's well that ends well.
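To make the requested behaviour concrete, here is a toy model of it. This is not Pound code and `RemoveOn50x` is not an existing Pound option; the class `Pool`, its methods and the fixed `retry_after` delay are assumptions made purely for illustration (a real balancer would more likely re-probe the backend than wait out a timer).

```python
import time
from typing import Callable, Dict, List, Optional


class Pool:
    """Round-robin pool that drops a backend on any 5xx and retries it later."""

    def __init__(self, backends: List[str], retry_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.backends = list(backends)
        self.retry_after = retry_after      # seconds a failed backend sits out
        self.clock = clock                  # injectable so it can be tested
        self._dead_until: Dict[str, float] = {}
        self._next = 0

    def alive(self) -> List[str]:
        """Backends that are currently eligible to receive requests."""
        now = self.clock()
        return [b for b in self.backends
                if self._dead_until.get(b, 0.0) <= now]

    def pick(self) -> Optional[str]:
        """Choose the next live backend, or None if every backend is out."""
        candidates = self.alive()
        if not candidates:
            return None
        backend = candidates[self._next % len(candidates)]
        self._next += 1
        return backend

    def report(self, backend: str, status: int) -> None:
        """Record the status code a backend answered with."""
        if 500 <= status <= 599:
            self._dead_until[backend] = self.clock() + self.retry_after
        else:
            self._dead_until.pop(backend, None)
```

With this model, the first 503 from a backend takes it out of rotation, so at 30 req/s only the requests already in flight see the error instead of a minute's worth. When `pick()` returns `None`, every backend has failed and the balancer itself would have to answer with an error.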
I agree that if the database itself fails (which is unlikely, because
there are two of them), all backends will be removed. But I don't care,
because in this unlikely event I can't serve anything anyway, so I have
no problem with pound sending the 503s instead of the appservers.
But again, thanks everyone.
MT.
--
Michal Táborský
chief systems architect
Internet Mall, a.s.
Internet Mall - obchody, které si oblíbíte
<http://www.MALL.cz>
Re: [Pound Mailing List] Remove failing server from pool
Dave Steinberg <dave(at)redterror.net>
2007-11-21 16:37:11
Michal Taborsky - Internet Mall wrote:
> Thanks to everybody for the suggestions. In my particular case I still think
> the approach I suggested is valid, but I understand that it may not be
> the case everywhere. Still, it would be nice if there was an option in
> the backend configuration section, something like RemoveOn50x = true.
>
> I'll explain why this would help in our case. Our app is using pgpool
> for database connection pooling and load balancing. The pgpool runs on
> every backend (because of the load balancing). Unfortunately, it has a
> bad habit of "freezing" once in a while (actually a lot more often than we'd
> like, but I don't know of anything better for the job). Now, we have
> a monitoring script which runs every minute, and if it finds the pgpool
> dead, it kills it and reloads. But for up to one minute, the backend sends
> 503s to everyone using this backend. With 30 req/s that's 1800 "pissed"
> customers (we are an online retailer, so we have to be extra nice to them
> :) ).
<snip>
re: pgpool - that stinks. :(
IMHO, the source code for Pound is extremely approachable. You should
seriously consider writing a patch to implement this feature, and if it
works out well for you, submit it back to the community. Even if it's
not accepted, my guess is that maintaining the patch across pound's
release cycle will be a manageable task, since the code isn't being
majorly rewritten or anything.
You might also, um, fix pgpool. If I were writing a patch, that's where
I'd spend my time, since it's the root of the problem. The big challenge
there is of course writing the test suite to reliably recreate the bug.
Good luck,
--
Dave Steinberg
http://www.geekisp.com/
http://www.steinbergcomputing.com/