Hi,
I'm having a problem with backends that accept connections but then
either don't reply in time or reply incorrectly: sometimes the headers
are missing, sometimes the body. I understand that this is a problem
in the application, but maybe pound could help avoid sending erroneous
replies to clients.
So I was taking a look at the source code and, not being an
experienced coder, I have a few questions about the backend failure
detection:
- to remove a server from the pool, there are two functions in svc.c:
kill_be() and disable_be(). Is this correct?
- kill_be() is only triggered when there's a failure while trying to
connect_nb() to the backend, right? So once the connection is
established, does pound consider the backend healthy?
- is disable_be() not used anywhere?
In my case, in one specific hour I got 172 "response errors" against
7823 successful responses from the same backend. It would be nice to
recognize that this backend is problematic and deal with it in some
way.
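
To put a number on "problematic", here is a minimal sketch in C of a
per-backend error counter. The struct and function names are
hypothetical and are not taken from pound's source; it only shows how
the two figures above could be turned into an error rate to compare
against a threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-backend counters; not part of pound's own structures. */
struct be_stats {
    unsigned long ok;      /* responses relayed successfully */
    unsigned long failed;  /* "response error" events */
};

/* Record the outcome of one request. */
static void be_stats_record(struct be_stats *s, int success)
{
    assert(s != NULL);
    if (success)
        s->ok++;
    else
        s->failed++;
}

/* Error rate in parts per thousand (rounded down); 0 if nothing recorded. */
static unsigned long be_stats_error_permille(const struct be_stats *s)
{
    unsigned long total;

    assert(s != NULL);
    total = s->ok + s->failed;
    return total == 0 ? 0 : (s->failed * 1000UL) / total;
}
```

With 172 failures and 7823 successes this gives 21 per thousand, i.e.
roughly 2% of responses. In practice the counters would probably need
to be reset or aged periodically so that old errors don't count
forever.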
What would probably help in my case is to add some kind of failure
detection somewhere in http.c:1029:
if((headers = get_headers(be, cl, lstn)) == NULL) {
I understand that it would be tricky to handle such a 'partial
failure'. One option might be to decrease that server's priority in
the backend pool whenever there's a response error. That would reduce
the load on that server temporarily. If the errors continue, pound
could run disable_be() and, finally, kill_be(). The recovery algorithm
could also be a problem. Options: wait for manual intervention via
poundctl to add that backend again (not very user friendly, but in my
specific case it would work well); or slow recovery (allow only a
small number of simultaneous connections and then gradually increase
that).
Of course, this can only work well in a few cases. In many others,
having more backends alive is more important, and it could be a bad
idea to remove a backend from the pool even temporarily. So if I were
implementing this, I would make it an option that is easy to disable.
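
A rough sketch of that escalation in C, under stated assumptions: the
types, thresholds and the "enabled" switch are all hypothetical, and
the code only records the state a backend would be in. It does not
call pound's kill_be() or disable_be(), whose signatures I have not
checked; the comments mark where those calls might go. Recovery here
is the simple variant: priority creeps back up one step per successful
response, and a disabled or killed backend stays that way until
someone re-enables it by hand.

```c
#include <assert.h>
#include <stddef.h>

enum be_health { BE_ACTIVE, BE_DISABLED, BE_KILLED };

/* Hypothetical tunables; none of these are existing pound directives. */
struct fail_policy {
    int enabled;        /* 0 turns the whole mechanism off */
    int disable_after;  /* consecutive errors before disabling */
    int kill_after;     /* consecutive errors before killing */
};

/* Hypothetical per-backend state, kept next to pound's own backend data. */
struct be_health_state {
    int configured_prio;     /* priority from the config file */
    int current_prio;        /* priority currently in effect */
    int consecutive_errors;  /* errors since the last good response */
    enum be_health health;
};

/* Call when a response error is seen, e.g. get_headers() returned NULL. */
static void on_response_error(struct be_health_state *b,
                              const struct fail_policy *p)
{
    assert(b != NULL && p != NULL);
    if (!p->enabled || b->health == BE_KILLED)
        return;
    b->consecutive_errors++;
    /* Halve the priority, rounding up, but never go below 1. */
    if (b->current_prio > 1)
        b->current_prio = (b->current_prio + 1) / 2;
    if (b->consecutive_errors >= p->kill_after)
        b->health = BE_KILLED;     /* where kill_be() might be called */
    else if (b->consecutive_errors >= p->disable_after)
        b->health = BE_DISABLED;   /* where disable_be() might be called */
}

/* Call when a response was relayed successfully. */
static void on_response_ok(struct be_health_state *b,
                           const struct fail_policy *p)
{
    assert(b != NULL && p != NULL);
    /* Disabled or killed backends only come back by manual action. */
    if (!p->enabled || b->health != BE_ACTIVE)
        return;
    b->consecutive_errors = 0;
    /* Slow recovery: regain one priority step per good response. */
    if (b->current_prio < b->configured_prio)
        b->current_prio++;
}
```

Counting consecutive errors is the simplest choice; a backend that
fails only a small share of requests, spread out over time, would
lose priority now and then but never reach the disable threshold, so
a rate-based trigger may fit that case better.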
Does this make any sense?
[...]