Uneven balancing with 2.1.6
Steve <spm(at)fostam.franken.de>
2006-11-18 12:08:24
Hello,

Recently I've experienced quite uneven balancing with pound 2.1.6.

When distributing among 3 servers (all w/o priority, i.e. should be
equal round robin), some servers get more than twice as many requests
as others. After a short period of time, the situation reverses (the
server that got the fewest requests now gets the most), and so on.

First I thought it was because pound moved from a single CPU to a
multiple CPU system (and was upgraded to 2.1.6 in the process).

Now I've read that in 2.1.5 "dynamic rescaling" has been introduced. So
I switched back to 2.1.4, and now the problem seems to be solved.

So, could my problem be caused by the dynamic rescaling? If yes, is
there any configuration option to disable it in the most recent versions
(haven't found one)? Or will I have to stick to 2.1.4?

Thanks in advance,
Steve

Re: [Pound Mailing List] Uneven balancing with 2.1.6
"Yves Junqueira" <yves.junqueira(at)gmail.com>
2006-11-19 22:33:45

Hi,

I've recently submitted a bug report to debian about this. See
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=399086

I believe that:

"the idea is that the order backends are tested affects the chance
they are picked up - backends tested first has a much bigger chance to
be used. IMO, a real weighted round robin scheme should be used."

Robert, could you shed some light on this? I'd like to help fixing
that, if you agree it's an issue.
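
For reference, one way a "real weighted round robin" could look is the smooth weighted variant sketched below. This is only an illustration of the technique, not Pound's code: `struct wrr_backend` and `wrr_pick` are made-up names, and the weights stand in for the configured priorities.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend record for this sketch; not Pound's BACKEND struct. */
struct wrr_backend {
    int weight;   /* configured priority, assumed > 0 */
    int current;  /* running credit, starts at 0 */
};

/*
 * Smooth weighted round robin: every call adds each backend's weight to
 * its credit, picks the backend with the highest credit (first one wins
 * ties), and charges the winner the sum of all weights.
 * Returns the index of the chosen backend.
 */
size_t wrr_pick(struct wrr_backend *be, size_t n)
{
    size_t i, best = 0;
    int total = 0;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        be[i].current += be[i].weight;
        total += be[i].weight;
        if (be[i].current > be[best].current)
            best = i;
    }
    be[best].current -= total;
    return best;
}
```

Over any window of sum-of-weights consecutive picks, each backend is chosen exactly as often as its weight says, regardless of its position in the array, so three equal weights give a plain 1-2-3 rotation.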
-- 
Yves Junqueira
http://www.cetico.org - yves.junqueira(at)gmail.com
Brasília, Brasil

Re: [Pound Mailing List] Uneven balancing with 2.1.6
Steve <spm(at)fostam.franken.de>
2006-11-21 00:53:25

Yves Junqueira wrote:

> "the idea is that the order backends are tested affects the chance
> they are picked up - backends tested first has a much bigger chance to
> be used. IMO, a real weighted round robin scheme should be used."

Hm... according to my measurements, with 2.1.4 the distribution among
the hosts with same priority is equal (1-2% difference max.),
independent of the order/position.

The described imbalance occurs only with 2.1.6.

Steve

Re: [Pound Mailing List] Uneven balancing with 2.1.6
Steve Otto <sotto(at)kaboodle-inc.com>
2006-11-21 02:06:42
Some daily statistics for 2.1.6 on my systems:

Roughly 600k/day.

On 2.0.5:
    on all 3 servers, 33% each

On 2.1.6 (in config file order)
    1st is 48%, 2nd is 9%, 3rd is 43%.
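
To put those percentages into absolute numbers, here is a small arithmetic sketch. It assumes "600k/day" means about 600,000 requests per day in total; the helper name `backend_requests` is made up for the example.

```c
#include <assert.h>

/* Requests per day handled by a backend that receives `percent` of `total`. */
long backend_requests(long total, int percent)
{
    assert(percent >= 0 && percent <= 100);
    return total * percent / 100;
}
```

Under that assumption, `backend_requests(600000, 48)` is 288000, `backend_requests(600000, 9)` is 54000 and `backend_requests(600000, 43)` is 258000, against an even share of 200000 each. The busiest backend thus handles more than five times the load of the least used one.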



 
-- 
SteveO


Re: [Pound Mailing List] Uneven balancing with 2.1.6
Jacques Caron <jc(at)oxado.com>
2006-11-24 00:18:33
Hi,

I've looked into this a bit, and indeed the algorithm used to adjust 
the distribution is apparently a bit too rough. If a server is faster 
than the others, its priority will be continuously increased until 
the server handles nearly all the traffic and probably chokes. We 
actually found out that some of our servers which we thought to be 
exactly identical, both hardware- and software-wise were actually not 
responding at the same speed (still have to find out why!). Obviously 
if you have even slightly different hardware, or a topology that 
might imply different latencies, this might get even worse.
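
The runaway effect can be illustrated with a toy model. This is not the code from svc.c: the rule below (raise the priority of any backend faster than the mean, lower the others, floor at 1) is an assumed simplification, the names `rescale_round` and `share_percent` are made up, and response times are held constant, so the "choking" itself is not modelled.

```c
#include <assert.h>

#define N_BE 3

/* One adjustment round of the toy model: backends faster than the mean
 * response time gain one priority point, slower ones lose one (floor 1). */
void rescale_round(int prio[N_BE], const double avg_ms[N_BE])
{
    double mean = 0.0;
    int i;

    for (i = 0; i < N_BE; i++)
        mean += avg_ms[i];
    mean /= N_BE;

    for (i = 0; i < N_BE; i++) {
        if (avg_ms[i] < mean)
            prio[i]++;
        else if (avg_ms[i] > mean && prio[i] > 1)
            prio[i]--;
    }
}

/* Percentage of traffic backend i receives if requests are spread in
 * proportion to the priorities (rounded down). */
int share_percent(const int prio[N_BE], int i)
{
    int total = 0, k;

    for (k = 0; k < N_BE; k++)
        total += prio[k];
    assert(total > 0);
    return 100 * prio[i] / total;
}
```

Starting from priorities 5/5/5 with average response times of 9.0, 10.0 and 10.5 ms, twenty rounds leave the priorities at 25/1/1: a backend that is only about 10% faster ends up with over 90% of the traffic, because the adjustment accumulates instead of settling at a value proportional to the speed difference.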

On top of that, if you have a pretty low Alive value (which is the 
interval of priority adjustments), it may start taking decisions with 
very little data to work on (and hence not necessarily accurate), 
increase the priorities of some servers pretty quickly, and the 
others having less traffic will see their stats move to more 
"average" values quite slowly and never come back in the game.

Looking at what the big commercial player (F5's BigIP) does, it seems 
they have quite simple load balancing methods: round robin, weighted 
round robin, "fastest" (which is -involuntarily- what the current 
scheme does), and least connections. Won't help us a lot, though the 
"least connections" might not be the worst idea, as it is obviously 
linked to the response time...
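
A minimal sketch of what "least connections" selection could look like, for illustration only. The `struct lc_backend` record and `lc_pick` function are hypothetical and not taken from Pound or BigIP; the caller is assumed to increment `active` when it hands a request to a backend and decrement it when the response is done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend record for this sketch. */
struct lc_backend {
    int alive;   /* non-zero if the backend may receive traffic */
    int active;  /* requests currently in flight on this backend */
};

/* Return the index of the alive backend with the fewest in-flight
 * requests (first one wins ties), or -1 if no backend is alive. */
int lc_pick(const struct lc_backend *be, size_t n)
{
    int best = -1;
    size_t i;

    assert(be != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!be[i].alive)
            continue;
        if (best < 0 || be[i].active < be[best].active)
            best = (int)i;
    }
    return best;
}
```

A slow backend keeps its connections open longer and therefore gets picked less often, which is the link to response time mentioned above, but the effect is self-limiting rather than cumulative.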

I'm not sure what the "right" approach should be, but the current 
code definitely needs at the very least a switch to turn off the 
adaptive code (we currently just #if 0'd it), and probably:
- some bounds on the priorities that can be set?
- a minimum number of requests before the priorities are adjusted 
(tried that, but that doesn't seem to be enough)
- some way to set relative priorities (e.g. "this server responds 20% 
faster, I'll send it 20% more traffic") rather than just continuously 
increase the priority until the server chokes.
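
The three ideas in this list could be combined roughly as follows. This is a sketch under assumptions, not a patch: the 1-9 bounds, the threshold of 100 requests and the name `relative_priority` are all made up for the example, and `t_average` / `t_mean` are simply the backend's and the service-wide average response times in the same unit.

```c
#include <assert.h>

#define PRIO_MIN     1    /* assumed lower bound */
#define PRIO_MAX     9    /* assumed upper bound */
#define MIN_REQUESTS 100  /* assumed minimum sample size */

/*
 * Derive a priority from the configured base priority and the ratio of
 * the mean response time to this backend's response time. The result is
 * recomputed from base_prio every time instead of being accumulated, so
 * a backend that is 20% faster settles at about 20% more weight.
 */
int relative_priority(int base_prio, int n_requests,
                      double t_average, double t_mean)
{
    int prio;

    assert(base_prio >= PRIO_MIN && base_prio <= PRIO_MAX);

    /* too little data, or no usable timings: keep the configured value */
    if (n_requests < MIN_REQUESTS || t_average <= 0.0 || t_mean <= 0.0)
        return base_prio;

    prio = (int)(base_prio * (t_mean / t_average) + 0.5);
    if (prio < PRIO_MIN)
        prio = PRIO_MIN;
    if (prio > PRIO_MAX)
        prio = PRIO_MAX;
    return prio;
}
```

With a base priority of 5, a backend averaging 10 ms against a mean of 12 ms gets priority 6; with fewer than 100 requests observed it stays at 5; and however fast or slow it is, the result stays within 1-9. With such a narrow range the resolution is coarse, which is one reason larger priority values might be useful.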

Haven't tried it, but starting with pretty high priority values (are 
they still limited to 1-9?) might help mitigate the drastic priority 
adjustments?

While I'm here, a quick patch for poundctl to show a bit more 
information about the backends:

diff -u Pound-2.1.6.orig/poundctl.c Pound-2.1.6/poundctl.c
--- Pound-2.1.6.orig/poundctl.c Sat Nov  4 11:28:55 2006
+++ Pound-2.1.6/poundctl.c      Fri Nov 24 00:14:10 2006
@@ -143,8 +143,9 @@
                      if(be.disabled < 0)
                          break;
                      if(be.domain == PF_INET)
-                        printf("    %3d. Backend PF_INET %s:%hd %s\n", n_be++, inet_ntoa(be.addr.in.sin_addr),
-                            ntohs(be.addr.in.sin_port), be.disabled? "*D": "a");
+                       printf("    %3d. Backend PF_INET %15s:%-5hd %2s %2s %2d %5d %10.3lf %7.3lf\n", n_be++, inet_ntoa(be.addr.in.sin_addr),
+                           ntohs(be.addr.in.sin_port), be.disabled? "*D": "a", be.alive? "a": "*D",
+                           be.priority, be.n_requests, be.t_requests/1000, be.t_average/1000);
                      else
                          printf("    %3d. Backend PF_UNIX %s %s\n", n_be++, be.addr.un.sun_path,
                              be.disabled? "*D": "");
@@ -168,8 +169,9 @@
                  if(be.disabled < 0)
                      break;
                  if(be.domain == PF_INET)
-                    printf("    %3d. Backend PF_INET %s:%hd %s\n", n_be++, inet_ntoa(be.addr.in.sin_addr),
-                        ntohs(be.addr.in.sin_port), be.disabled? "*D": "a");
+                   printf("    %3d. Backend PF_INET %15s:%-5hd %2s %2s %2d %5d %10.3lf %7.3lf\n", n_be++, inet_ntoa(be.addr.in.sin_addr),
+                       ntohs(be.addr.in.sin_port), be.disabled? "*D": "a", be.alive? "a": "*D",
+                       be.priority, be.n_requests, be.t_requests/1000, be.t_average/1000);
                  else
                      printf("    %3d. Backend PF_UNIX %s %s\n", n_be++, be.addr.un.sun_path,
                          be.disabled? "*D": "");

Jacques.




Re: [Pound Mailing List] Uneven balancing with 2.1.6
Maurice Yarrow <yarrow(at)best.com>
2006-11-24 05:19:55
Hello Jacques

You wrote that:

> I'm not sure what the "right" approach should be, but the current code 
> definitely needs at the very least a switch to turn off the adaptive 
> code (we currently just #if 0'd it)

So, I am using 2.1.6, but would, at this time, like to turn off the 
dynamic load balancing.
Could you please tell me just which lines need to be #if 0'd ?

Thanks,
Maurice Yarrow





Re: [Pound Mailing List] Uneven balancing with 2.1.6
Jacques Caron <jc(at)oxado.com>
2006-11-25 18:42:17
Hi,

At 05:19 24/11/2006, Maurice Yarrow wrote:
>So, I am using 2.1.6, but would, at this time, like to turn off the
>dynamic load balancing.
>Could you please tell me just which lines need to be #if 0'd ?

diff -u Pound-2.1.6.orig/svc.c Pound-2.1.6/svc.c
--- Pound-2.1.6.orig/svc.c      Sat Nov  4 11:28:55 2006
+++ Pound-2.1.6/svc.c   Sat Nov 25 18:40:39 2006
@@ -931,6 +931,7 @@
                  logmsg(LOG_WARNING, "thr_resurect() unlock: %s", strerror(ret_val));
          }

+#if 0
          /* scale the back-end priorities */
          for(lstn = listeners; lstn; lstn = lstn->next)
          for(svc = lstn->services; svc; svc = svc->next) {
@@ -1010,6 +1011,7 @@
              if(ret_val = pthread_mutex_unlock(&svc->mut))
                  logmsg(LOG_WARNING, "thr_resurect() unlock: %s", strerror(ret_val));
          }
+#endif
      }
  }

Jacques. 



Re: [Pound Mailing List] Uneven balancing with 2.1.6
Maurice Yarrow <yarrow(at)best.com>
2006-11-25 20:41:14
Jacques

Thanks...

Maurice




