/ Zope / Apsis / Pound Mailing List / Archive / 2006 / 2006-11 / bottleneck in pound?

[ << ] [ >> ]

[ poundctl feature suggestion: More status info / ... ] [ Problems with 2.1.6 / "Steve Otto" ... ]

bottleneck in pound?
pound(at)alacra.com
2006-11-11 00:37:46 [ FULL ]
Hi,

first I'd like to say thank you for putting our this product. My predecessor
first installed
this product, and we've had pretty good experience with it.

We recently ran into a problem, and I was wondering if anybody else has
encountered it.

We have a pretty simple setup.  We have a Linux box (Kernel 2.6.9-42.0.3.ELsmp
on an x86_64)
on the front end running pound, and have couple of web servers on the backend
running Win2003/IIS 6.  
Pound just does load balancing  between those servers. 

We recently created an html page with 200 tiny images (about 700 bytes each). 
What I'm seeing is
a slow response when I try to get the html page (with all images) via the pound
(anywhere 
15-30 seconds).  When I go directly to the windows box, I'm able to get the
same page in 
1-3 seconds (in my pound config file, I created a service with 1 backend server
-- to make
sure I was going to the same machine).

I've looked into http "keep-alives", and thats not the issue (we're re-using
connections in
both cases).

I tried going directly to the windows box from the linux box running pound
(with wget), 
and I get really fast times (1-3 sec), so its not linux-windows communication
problem.  
When I tried to get the page on the linix box (via pound), it was also very
slow.

So all signs point to pound, but I also ran a different test.  I put a really
large file on
our Windows servers (7MB), and tried to get it directly from the machine and
via pound.  In
both instances the file downloaded in about 30 seconds.

BTW, We're running pound v2.1 (but I tried earlier versions (1.9), and the
latest 2.1.6 as well),
and I get the same results.

Has anybody else seen such issue?

Thanks.

Re: [Pound Mailing List] bottleneck in pound?
"Yves Junqueira" <yves.junqueira(at)gmail.com>
2006-11-12 04:03:59 [ FULL ]
On 11/10/06, pound(at)alacra.com <pound(at)alacra.com> wrote:[...]

You may be having DNS problems. Change the address of the backend to a
numeric IP address, if you are using a hostname.

If that doesn't work, attach an ltrace or strace to the running pound
process - or start one from strace - and see which system call is
taking so long.
[...]

Re: [Pound Mailing List] bottleneck in pound?
Albert <pound(at)alacra.com>
2006-11-12 05:02:03 [ FULL ]
I tried numeric IP address, and it didn't help.  I actually went a step 
further.  I put all of the images on the linux box running pound, and 
then created a
service to go to 127.0.0.1 for images.  I got the same performance.

Ok, I ran both strace and ltrace (got to admit I'm not very 
knowledgeable with linux), and strace didn't give me much info. Here's 
the info from ltrace (with -c -S options)

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 78.68   61.488804    61488804         1 SYS_wait4
 14.46   11.301984      513726        22 RSA_generate_key
  3.21    2.509271         251      9971 pthread_mutex_lock
  3.21    2.508351         251      9971 pthread_mutex_unlock
  0.14    0.105940          53      1977 regexec
  0.13    0.105064          51      2023 pthread_self
  0.04    0.029762         901        33 gethostbyname
  0.02    0.015119          49       304 fgets
  0.02    0.013838        6919         2 SSL_CTX_use_certificate_chain_file
  0.01    0.008012          89        90 regcomp
  0.01    0.006428        3214         2 SSL_CTX_use_PrivateKey_file
  0.01    0.005572          48       114 memset
  0.01    0.004537          50        89 pthread_mutex_init
  0.01    0.004218          50        83 malloc
  0.00    0.003739          49        76 __strtol_internal
  0.00    0.003183          60        53 regfree
  0.00    0.003047          56        54 SYS_poll

Albert

  0.00    0.002552        2552         1 SSL_load_error_strings
  0.00    0.002458        1229         2 SSL_CTX_new
  0.00    0.002085          16       125 SYS_fcntl
  0.00    0.001961          22        87 SYS_read
  0.00    0.001874        1874         1 fork
Yves Junqueira wrote:[...][...][...]

Re: [Pound Mailing List] bottleneck in pound?
Robert Segall <roseg(at)apsis.ch>
2006-11-13 18:57:46 [ FULL ]
On Fri, 2006-11-10 at 18:37 -0500, pound(at)alacra.com wrote:[...]

If you use IIS try reducing the back-end time-outs. On some versions
Microsoft uses a non-standard TCP implementation which messes around
with closing connections until they time-out. Try something like:

BackEnd
	Address 127.0.0.1
	Port 8080
	TimeOut 2
End

and see if it helps.[...]

Re: [Pound Mailing List] bottleneck in pound?
Albert <pound(at)alacra.com>
2006-11-13 20:23:37 [ FULL ]
Robert Segall wrote:[...]
Tried that as well, didn't help.  As I said earlier, I put all of the 
images on the server running pound, and tried getting the images via 
apache (port 8080). I'm getting the same performance.  When I go 
directly to apache, I'm able to get anywhere between 60-80 images a 
second, but via pound its only 6-8 images/sec.  Here's my log output 
(the first request goes to the windows box (192.168.65.120), the rest 
stay locally on the pound server, getting images via apache 
(127.0.0.1:8080) -- you can see images come back really fast .001 sec -- 
I get the same time from iis server.  The "flag" images are about 
600-700 bytes, and as you can see, I'm only getting 8 images per second [...]
Attachments:  
text.html text/html 7049 Bytes

Re: [Pound Mailing List] bottleneck in pound?
Albert <pound(at)alacra.com>
2006-11-15 01:15:05 [ FULL ]
Albert wrote:[...][...][...]
I'm waiting on the poll() from the client, with an average of 0.2 second 
delay (between the last client write and a new client read).  The client 
request was made from a IE.

[pid 18060] 16:35:57.914868 read(8, "GET 
/pubsite/images/flags/gif/BW"..., 4096) = 518
[pid 18060] 16:35:57.914996 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18060] 16:35:57.915382 poll([{fd=9, events=POLLIN|POLLPRI}], 1, 0) = 0
[pid 18060] 16:35:57.915651 poll([{fd=9, events=POLLOUT, 
revents=POLLOUT}], 1, 2000) = 1
[pid 18060] 16:35:57.915732 write(9, "GET 
/pubsite/images/flags/gif/BW"..., 551) = 551
[pid 18060] 16:35:57.915866 poll([{fd=9, events=POLLIN|POLLPRI, 
revents=POLLIN}], 1, 2000) = 1
[pid 18060] 16:35:57.916168 read(9, "HTTP/1.1 200 OK\r\nDate: Tue, 14 
N"..., 4096) = 656
[pid 18060] 16:35:57.916476 poll([{fd=8, events=POLLOUT, 
revents=POLLOUT}], 1, 3000) = 1
[pid 18060] 16:35:57.916579 write(8, "HTTP/1.1 200 OK\r\nDate: Tue, 14 
N"..., 292) = 292
[pid 18060] 16:35:57.916676 poll([{fd=8, events=POLLOUT, 
revents=POLLOUT}], 1, 3000) = 1
[pid 18060] 16:35:57.916748 write(8, 
"GIF89a\20\0\v\0\325\0\0\0\207\336H\311\364BBB;\304\361"..., 364) = 364
[pid 18060] 16:35:57.916865 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18060] 16:35:57.916980 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18060] 16:35:57.917101 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18060] 16:35:57.917221 sendto(3, "<30>Nov 14 16:35:57 pound: 
wwwv."..., 369, MSG_NOSIGNAL, NULL, 0) = 369
[pid 18060] 16:35:57.917343 poll( <unfinished ...>
[pid 18059] 16:35:57.918567 <... poll resumed> [{fd=10, 
events=POLLIN|POLLPRI, revents=POLLIN}], 1, 3000) = 1
[pid 18059] 16:35:57.918625 read(10, "GET 
/pubsite/images/flags/gif/BN"..., 4096) = 518
[pid 18059] 16:35:57.918742 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18059] 16:35:57.919123 poll([{fd=11, events=POLLIN|POLLPRI}], 1, 0) = 0
[pid 18059] 16:35:57.919394 poll([{fd=11, events=POLLOUT, 
revents=POLLOUT}], 1,2000) = 1
[pid 18059] 16:35:57.919467 write(11, "GET 
/pubsite/images/flags/gif/BN"..., 551) = 551
[pid 18059] 16:35:57.919592 poll([{fd=11, events=POLLIN|POLLPRI, 
revents=POLLIN}], 1, 2000) = 1
[pid 18059] 16:35:57.919873 read(11, "HTTP/1.1 200 OK\r\nDate: Tue, 14 
N"..., 4096) = 665
[pid 18059] 16:35:57.920196 poll([{fd=10, events=POLLOUT, 
revents=POLLOUT}], 1,3000) = 1
[pid 18059] 16:35:57.920288 write(10, "HTTP/1.1 200 OK\r\nDate: Tue, 14 
N"..., 292) = 292
[pid 18059] 16:35:57.920384 poll([{fd=10, events=POLLOUT, 
revents=POLLOUT}], 1,3000) = 1
[pid 18059] 16:35:57.920457 write(10, 
"GIF89a\20\0\v\0\325\0\0\331\204s\343\273\0\331ze\324\302"..., 373) = 373
[pid 18059] 16:35:57.920570 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18059] 16:35:57.920684 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18059] 16:35:57.920809 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18059] 16:35:57.920926 sendto(3, "<30>Nov 14 16:35:57 pound: 
wwwv."..., 36, MSG_NOSIGNAL, NULL, 0) = 369
[pid 18059] 16:35:57.921045 poll([{fd=10, events=POLLIN|POLLPRI, 
revents=POLLIN}], 1, 3000) = 1
[pid 18059] 16:35:58.236029 read(10, "GET 
/pubsite/images/flags/gif/BG"..., 4096) = 518
[pid 18059] 16:35:58.236181 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18059] 16:35:58.236572 poll([{fd=11, events=POLLIN|POLLPRI}], 1, 0) = 0
[pid 18059] 16:35:58.236892 poll([{fd=11, events=POLLOUT, 
revents=POLLOUT}], 1,2000) = 1
[pid 18059] 16:35:58.236971 write(11, "GET 
/pubsite/images/flags/gif/BG"..., 551) = 551
[pid 18059] 16:35:58.237106 poll([{fd=11, events=POLLIN|POLLPRI, 
revents=POLLIN}], 1, 2000) = 1
[pid 18059] 16:35:58.237479 read(11, "HTTP/1.1 200 OK\r\nDate: Tue, 14 
N"..., 4096) = 652
[pid 18059] 16:35:58.237789 poll([{fd=10, events=POLLOUT, 
revents=POLLOUT}], 1,3000) = 1
[pid 18059] 16:35:58.237874 write(10, "HTTP/1.1 200 OK\r\nDate: Tue, 14 
N"..., 292) = 292
[pid 18059] 16:35:58.237969 poll([{fd=10, events=POLLOUT, 
revents=POLLOUT}], 1,3000) = 1
[pid 18059] 16:35:58.238057 write(10, 
"GIF89a\20\0\v\0\325\0\0\370FF\344\345\3456\2546V\203$y"..., 360) = 360
[pid 18059] 16:35:58.238166 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18059] 16:35:58.238286 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18059] 16:35:58.238396 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18059] 16:35:58.238512 sendto(3, "<30>Nov 14 16:35:58 pound: 
wwwv."..., 369, MSG_NOSIGNAL, NULL, 0) = 369
[pid 18059] 16:35:58.238632 poll( <unfinished ...>
[pid 18060] 16:35:58.240461 <... poll resumed> [{fd=8, 
events=POLLIN|POLLPRI, revents=POLLIN}], 1, 3000) = 1
[pid 18060] 16:35:58.240519 read(8, "GET 
/pubsite/images/flags/gif/BI"..., 4096) = 518
[pid 18060] 16:35:58.240635 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18060] 16:35:58.241034 poll([{fd=9, events=POLLIN|POLLPRI}], 1, 0) = 0
[pid 18060] 16:35:58.241298 poll([{fd=9, events=POLLOUT, 
revents=POLLOUT}], 1, 2000) = 1
[pid 18060] 16:35:58.241378 write(9, "GET 
/pubsite/images/flags/gif/BI"..., 551) = 551
[pid 18060] 16:35:58.241503 poll([{fd=9, events=POLLIN|POLLPRI, 
revents=POLLIN}], 1, 2000) = 1
[pid 18060] 16:35:58.241807 read(9, "HTTP/1.1 200 OK\r\nDate: Tue, 14 
N"..., 4096) = 666
[pid 18060] 16:35:58.242123 poll([{fd=8, events=POLLOUT, 
revents=POLLOUT}], 1, 3000) = 1
[pid 18060] 16:35:58.242215 write(8, "HTTP/1.1 200 OK\r\nDate: Tue, 14 
N"..., 292) = 292
[pid 18060] 16:35:58.242310 poll([{fd=8, events=POLLOUT, 
revents=POLLOUT}], 1, 3000) = 1
[pid 18060] 16:35:58.242382 write(8, 
"GIF89a\20\0\v\0\325\0\0\353\262\263\362\362\362\221\0\0"..., 374) = 374
[pid 18060] 16:35:58.242490 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18060] 16:35:58.242604 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18060] 16:35:58.242713 stat("/etc/localtime", 
{st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
[pid 18060] 16:35:58.242829 sendto(3, "<30>Nov 14 16:35:58 pound: 
wwwv."..., 369, MSG_NOSIGNAL, NULL, 0) = 369
[pid 18060] 16:35:58.242946 poll([{fd=8, events=POLLIN|POLLPRI, 
revents=POLLIN}], 1, 3000) = 1
[pid 18060] 16:35:58.457451 read(8, "GET 
/pubsite/images/flags/gif/BF"..., 4096) = 518

And this continues. As you can see, when its calling poll() (before 
client read) there is a .2 second lag.  So 2 questions:

1. Why are there 2 connections from the client?  I guess IE make 2 
connections to retrieve content.
2. More importantly, has anybody experienced this before?  With a .2 
second lag for every other image, for 200 images, pound is spending 20 
seconds on the waits alone.  I realize this is most likely not pounds 
fault, but I wonder if there is some sort of IE to pound communication.

Albert

Re: [Pound Mailing List] bottleneck in pound?
Albert <pound(at)alacra.com>
2006-11-17 00:08:30 [ FULL ]
Hi,

as I said in my previous email, I was seeing the bottleneck in the 
poll() function waiting on the client.  However, I wasn't getting the 
delay when I was going to apache directly.  I straced apache, and found 
it was setting TCP_DELAY to 1 and TCP_CORK to 0 (for SOL_TCP) on every 
read from the client and reversing the flags on write.  I did the same 
in pound, and it made my page request via pound as fast as going 
directly to apache.   After playing around with these flags, I found 
that I only need to set the TCP_DELAY on the client connection write 
after SO_LINGER is set.  Here's what I did (on line 436 of http.c - 
v2.1.6):

   n = 1;
   setsockopt(sock, SOL_TCP, TCP_NODELAY, (void *)&n, sizeof(n));

Do you think its a good idea to add it to the code?  I think it should 
at least be available when running "configure".  It definitely speeds up 
requests for pages that have alot of tiny dependent objects.

Albert

Albert wrote:
[...]

Re: [Pound Mailing List] bottleneck in pound?
Blake Barnett <shadoi(at)nanovoid.com>
2006-11-17 02:37:53 [ FULL ]
On Nov 16, 2006, at 3:08 PM, Albert wrote:
[...]

Excellent sleuthing there!  I'll certainly keep this patch handy even  
if it doesn't make it into the upstream.

-Blake

Re: [Pound Mailing List] bottleneck in pound?
Robert Segall <roseg(at)apsis.ch>
2006-11-17 18:34:06 [ FULL ]
On Thu, 2006-11-16 at 18:08 -0500, Albert wrote:[...]

Great work, and many thanks for this. It goes straight into the next
release.

Would it be OK to add your name/address to the contributor's list?[...]

Re: [Pound Mailing List] bottleneck in pound?
Maurice Yarrow <yarrow(at)best.com>
2006-11-17 19:02:07 [ FULL ]
Hello Albert, Blake, Robert

Thought I would send this (annecdotal) note apropos of Albert's
modification.

I put in the two indicated lines into http.c (v2.1.6)
at line 437 (the block below, which would be lines 427-444, is now
the appearance of the modified code containing the added two lines right
after the #ifdef block - indicated with "+" signs).

I have tested the result versus the original on a case similar case to 
indicated:
It is a page of thumbnails (47 of them), actually being served by tomcat 
behind
pound-2.1.6.

Based on empirical evidence so far - I have not put in any timing code - I
would have to say that the modified code does appear faster.   How much ?
I estimate (during repeated observations, caching effects removed) that
the original approximately 1 second load now is about half that time,
maybe a little bit more, but noticeably faster than the original.

Of course, without timing code in, this is to be considered highly
annecdotal, at best.

One request:  If I have not placed the added two lines at the intended
location, could someone please correct me on this.

Thanks

Maurice Yarrow

(modified http.c  v2.1.6  code block at  lines 427-444:)

    n = 1;
    setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, (void *)&n, sizeof(n));
    l.l_onoff = 1;
    l.l_linger = 10;
    setsockopt(sock, SOL_SOCKET, SO_LINGER, (void *)&l, sizeof(l));
                                                                               


#ifdef  TCP_LINGER2
    n = 5;
    setsockopt(sock, SOL_TCP, TCP_LINGER2, (void *)&n, sizeof(n));
#endif
                                                                               


+   n = 1;
+   setsockopt(sock, SOL_TCP, TCP_NODELAY, (void *)&n, sizeof(n));
                                                                               


    cl = NULL;
    be = NULL;
    ssl = NULL;
    x509 = NULL;
                                                                               





Robert Segall wrote:
[...][...][...]

Re: [Pound Mailing List] bottleneck in pound?
Albert <pound(at)alacra.com>
2006-11-17 20:04:32 [ FULL ]
Robert Segall wrote:[...][...][...]
yes. And thanks for the quick turnaround.
Attachments:  
text.html text/html 1654 Bytes

MailBoxer