busy mongrels and load balancing
Jon Garvin <jgarvin.lists(at)gmail.com>
2006-10-09 19:44:53
There's a thread from this weekend on the RailsTalk mailing list
(http://www.ruby-forum.com/topic/83886) and one particular response
(posted by Frederick Cheung <http://www.ruby-forum.com/user/show/2984>
on 07.10.2006 20:15) got me thinking.  To summarize, some people seem
concerned that, when one mongrel process is busy with some long task,
load balancers (Pound was *not* specifically mentioned, but a few other
setups were) will still try to send requests to it, even if other
mongrel processes are free.  This seems to fly in the face of what I
understand load balancers are supposed to do.
So, my question is, how would Pound handle this scenario?  Say there are
three mongrel web server processes (A, B, and C)  handling requests that
are filtered through Pound.  Mongrel A gets a  difficult request (#1)
that's going to take several seconds to process before returning a
reply.  Several more easy requests (#'s 2, 3 and 4) come in during this
time.  Let's assume Pound sends #'s 2 and 3 to Mongrels B and C and they
quickly return, yet Mongrel A is still chugging away on request #1.
Will request #4 get sent to Mongrel A, just because it's its turn, or is
Pound smart enough to know A is still working, skip it and send the
request directly to B?
Thanks, and my apologies if this is an obscenely dumb question.

Re: [Pound Mailing List] busy mongrels and load balancing
Ted Dunning <tdunning(at)veoh.com>
2006-10-09 21:07:26
Responding to this sort of thing instantly can be difficult for load
balancers.

Generally the solution that has been proposed has been to use "least
connections" balancing.  In your example, as long as one of the fast
requests has cleared out of B or C when the next request comes in, things
will be good.  If B and C are still busy at the moment that the next
transaction comes in, however, there is a good chance that somebody will get
stuck behind the slowpoke.
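
To make the difference concrete, here is a minimal Python sketch of the scenario from the original question, comparing plain rotation with "least connections". The backend names and in-flight counts are taken from the example; the helper functions are hypothetical and are not a description of how Pound itself chooses a backend.

```python
from itertools import cycle

def round_robin_picker(backends):
    """Return a picker that hands out backends in fixed rotation,
    ignoring how busy each one is."""
    rotation = cycle(backends)
    return lambda active: next(rotation)

def least_connections_pick(active):
    """Pick the backend with the fewest in-flight requests
    (ties go to the first backend in dict order)."""
    return min(active, key=active.get)

# In-flight request counts at the moment request #4 arrives:
# A is still working on slow request #1; B and C have finished #2 and #3.
active = {"A": 1, "B": 0, "C": 0}

rr = round_robin_picker(["A", "B", "C"])
for _ in range(3):          # requests #1..#3 went to A, B, C in turn
    rr(active)

rr_choice = rr(active)                        # rotation is back at A
lc_choice = least_connections_pick(active)    # A is busy, so B wins

print("round robin sends #4 to", rr_choice)
print("least connections sends #4 to", lc_choice)
```

If B and C were also busy when #4 arrived, all three counts would tie and the request could still land behind the slow one, which is the case described above.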

In reality, what happens is that if you have some obscenely long
transactions, you will shortly have all three servers working on long
transactions.  As soon as one of the long transactions finishes, all of the
backlog behind it will clear out, but until then, the most recently goobered
(MRG) server will have the fewest pending transactions so all new requests
will be sent to it.  This is, in some ways, the worst thing to do since the
MRG server is probably going to be the last one that clears out.

Another issue is that your average response time is not a particularly good
measure of this problem.  It is generally better to look at some percentile
measure of response time.  Thus, if 1% of your transactions take 5 seconds
and 99% take 10 ms, then you probably want to look at the slowest 2, 3, 5
and 10 percent of response times (the 98th, 97th, 95th and 90th
percentiles) in order to make sure that you don't have fast transactions
backing up on you.
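
As a rough illustration of why the average hides this, the following Python sketch builds a hypothetical sample with the mix mentioned above (1% of requests at 5 seconds, 99% at 10 ms, no queueing effects) and compares the mean with a few percentiles. The sample size and the nearest-rank percentile helper are assumptions made for the example.

```python
import math
import statistics

# Hypothetical sample: 990 fast requests (10 ms) and 10 slow ones (5000 ms).
times_ms = [10.0] * 990 + [5000.0] * 10

def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

mean_ms = statistics.mean(times_ms)
p50 = percentile(times_ms, 50)
p90 = percentile(times_ms, 90)
p99 = percentile(times_ms, 99)

print("mean:", mean_ms, "ms")
print("50th / 90th / 99th percentile:", p50, p90, p99, "ms")
```

In this idealized sample the mean matches neither kind of request, while the percentiles up to the 99th all stay at 10 ms. If fast requests start queueing behind slow ones, those percentiles are where it would show up first.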

One way to help with the problem is to stack multiple services on individual
machines.  This helps by increasing the number of backends that can handle
requests which decreases the chance that every backend gets stuck on a
goober request at the same time.  Increasing the maximum allowable pending
thread count has the same effect.

If you have *any* way to predict which transactions are going to be long and
which ones short, then it would probably help you to segregate your traffic.
Limit the slow transactions to just some of your servers and keep a few free
for fast stuff.  Even if you send fast transactions to all of your servers,
you will still have a car-pool lane that will stay fast moving if you wind
up with all of the goober-handling servers stuck for a bit.  The fast
transactions that pile up behind the goobers will still be slow, but the
fast lane servers will start getting all the rest of the load as soon as the
queues on the goober servers fill up a bit.
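
The car-pool lane idea can be sketched in a few lines of Python. This assumes you can classify a request as slow before routing it (for example by URL); the pool names, the `is_slow` flag and the least-connections tie-breaking are all hypothetical choices for illustration, not something Pound is known to provide in this form.

```python
SLOW_POOL = ["A", "B"]   # hypothetical: backends allowed to take slow requests
FAST_LANE = ["C"]        # hypothetical: backend reserved for fast requests

def pick_backend(is_slow, active):
    """Slow requests may only use SLOW_POOL; fast requests may use every
    backend. Within the candidates, pick the one with fewest in-flight."""
    candidates = SLOW_POOL if is_slow else SLOW_POOL + FAST_LANE
    return min(candidates, key=lambda name: active[name])

def route(is_slow, active):
    """Choose a backend and record the new in-flight request on it."""
    name = pick_backend(is_slow, active)
    active[name] += 1
    return name

active = {"A": 0, "B": 0, "C": 0}

# Two slow requests arrive, then a fast one, then another slow one.
choices = [
    route(True, active),
    route(True, active),
    route(False, active),
    route(True, active),
]
print(choices)
```

However many slow requests pile up on A and B, C never receives one, so it keeps draining fast requests.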

Note that this really has almost nothing to do with having some fast and
some slow servers.  It has everything to do with having nasty response time
distributions.  Long tailed distributions break queueing models that were
based on simpler distributions.


On 10/9/06 10:44 AM, "Jon Garvin" <jgarvin.lists(at)gmail.com> wrote:
[...]

Re: [Pound Mailing List] busy mongrels and load balancing
Jon Garvin <jgarvin.lists(at)gmail.com>
2006-10-09 22:11:19
Thanks Ted for all the info. Lemme see if I understand one part correctly.

If, from my example, either server B or C is completely done with
their last request and waiting for work again when request #4 comes in,
then #4 will go directly to a free server. BUT, if all 3 servers happen
to be busy, then Pound (or any other load balancer) has no idea which
will finish next and does *not* queue request #4 internally, waiting to
see which server will free up first? Instead it sends #4 off to
whichever of the 3 servers it feels like, even if one of the other
servers is about to be free and the chosen one is going to be busy for a
while yet. Right?

Sounds to me like a problem solved by A) making sure one has enough backend
servers available, and B) not writing your app to include any
unnecessarily long processes. Does Pound write anything to the log to
indicate when it finds all the backend servers saturated?

Ted Dunning wrote:[...][...][...]

Re: [Pound Mailing List] busy mongrels and load balancing
Ted Dunning <tdunning(at)veoh.com>
2006-10-10 00:11:26
On 10/9/06 1:11 PM, "Jon Garvin" <jgarvin.lists(at)gmail.com> wrote:
[...]

Pretty much.  The big issue is that virtually all servers can handle
multiple requests simultaneously and indeed MUST do this in order to get
maximum throughput.  Thus queuing internally would be really, really bad.

[...]

Right.

[...]

Or make sure that you have enough threads.
[...]

If short requests can leap-frog long-running requests and if you have more
threads than you will ever need, then it may be that you just have to make
sure that the total work is less than the total throughput you can maintain.
[...]

Define saturated.  Pound generally decides a server is (over) saturated when
it starts taking forever to respond to requests.  Haproxy has an interesting
trick in that you can define a maximum number of live connections any server
can handle.  When you reach that limit, it queues the connection internally,
similar to what you mentioned above.
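
The general idea of a per-backend connection cap with an internal queue can be sketched as follows. This is a toy Python model, not Haproxy's implementation or configuration syntax; the class name, method names and the cap of one connection per backend are assumptions chosen to mirror the three-Mongrel example.

```python
from collections import deque

class CappedBalancer:
    """Toy balancer: each backend takes at most max_conn requests at once;
    anything beyond that waits in the balancer's own queue."""

    def __init__(self, backends, max_conn):
        self.max_conn = max_conn
        self.active = {name: 0 for name in backends}
        self.waiting = deque()

    def submit(self, request):
        """Send the request to the least-loaded backend if it is under its
        cap and return that backend's name; otherwise queue it, return None."""
        name = min(self.active, key=self.active.get)
        if self.active[name] >= self.max_conn:
            self.waiting.append(request)
            return None
        self.active[name] += 1
        return name

    def finish(self, name):
        """Backend `name` completed a request. If anything is queued, hand
        it the oldest waiting request and return that request, else None."""
        self.active[name] -= 1
        if self.waiting:
            request = self.waiting.popleft()
            self.active[name] += 1
            return request
        return None

lb = CappedBalancer(["A", "B", "C"], max_conn=1)
first_three = [lb.submit(n) for n in (1, 2, 3)]   # one request per backend
held = lb.submit(4)                               # every backend is at its cap
released = lb.finish("B")                         # B frees up first

print(first_three, held, released)
```

With a cap like this, request #4 waits in the balancer and goes to whichever backend frees up first, rather than being committed early to a backend that may stay busy.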

Re: [Pound Mailing List] busy mongrels and load balancing
Jon Garvin <jgarvin.lists(at)gmail.com>
2006-10-10 00:30:50
Got it. Thanks for the info. I've made my quota of new things to learn
for the week, and it's only Monday.
