/ Zope / Apsis / Pound Mailing List / Archive / 2006 / 2006-02 / Connection timed out

[ << ] [ >> ]

[ Followup - Pound 2.0.1 on VMware ESX? / Jeffrey ... ] [ Bad URL / "Dean" ... ]

Connection timed out
"Cory Omand" <coryomand(at)gmail.com>
2006-02-23 20:13:12 [ FULL ]
Hi,

I am using Pound 2.0.2 (Solaris 10 SPARC) to load balance access to a set of
hosts running Subversion 1.3.0 (mod_dav_svn/mod_authz_svn under Apache 2.2.0,
Solaris 10 SPARC).  I am seeing some strange behavior with relatively large
checkouts from subversion.  So far, the most I have been able to check out
without errors is 4.7MB, so I'm not running into a filesize limit.  After a
seemingly random amount of time/data during the checkout, I get the
following warning from pound:

Feb 23 10:35:09 host.domain pound: [ID 702911 daemon.warning] error copy
chunk cont: Connection timed out

On the worker, I see the following in the Apache debug error log:

[Thu Feb 23 10:35:09 2006] [info] [client X.X.X.X] (32)Broken pipe:
core_output_filter: writing data to the network
[Thu Feb 23 10:35:09 2006] [info] [client X.X.X.X] (32)Broken pipe:
core_output_filter: writing data to the network
[Thu Feb 23 10:35:09 2006] [error] [client X.X.X.X] Provider encountered an
error while streaming a REPORT response.  [500, #0]
[Thu Feb 23 10:35:09 2006] [error] [client X.X.X.X] A failure occurred while
driving the update report editor  [500, #190004]
[Thu Feb 23 10:35:09 2006] [info] [client X.X.X.X] (32)Broken pipe:
core_output_filter: writing data to the network

A truss of the pound process shows the following behavior immediately before
the client connection is shut down.  Lwp 34 below is the process which
successfully passed data from the worker to the client for ~40s (2.6MB of
data transmitted prior to the timeout).  Lwp 2 appears to be the worker
monitoring thread.

4060/34:        64.0766 pollsys(0xFEE0A3D4, 1, 0xFEE0A360, 0x00000000)  = 1
4060/34:        64.0767 write(0, "\r\n 1 b e f e\r\n b C V".., 2057)    =
2057
4060/34:        64.0770 pollsys(0xFEE0A374, 1, 0xFEE0A300, 0x00000000)  = 1
4060/34:        64.0772 read(1, " v u p X\n U / Z / Q f /".., 4096)     =
4096
4060/34:        pollsys(0xFEE0A3D4, 1, 0xFEE0A360, 0x00000000) (sleeping...)
4060/2:         68.7982 nanosleep(0xFEF2FEF0, 0xFEF2FEE8)               = 0
4060/2:         68.7983 time()                                          =
1140721552
4060/2:         68.7985 time()                                          =
1140721552
4060/2:         nanosleep(0xFEF2FEF0, 0xFEF2FEE8) (sleeping...)
4060/34:        74.0782 pollsys(0xFEE0A3D4, 1, 0xFEE0A360, 0x00000000)  = 0
4060/34:        74.0784 fstat(3, 0xFEE0A250)                            = 0
4060/34:        74.0786 time()                                          =
1140721557
4060/34:        74.0789 putmsg(3, 0xFEE09908, 0xFEE098FC, 0)            = 0
4060/34:        74.0790 open("/var/run/syslog_door", O_RDONLY)          = 2
4060/34:        74.0792 door_info(2, 0xFEE09840)                        = 0
4060/34:        74.0794 getpid()                                        =
4060 [4059]
4060/34:        74.0796 door_call(2, 0xFEE09828)                        = 0
4060/34:        74.0797 close(2)                                        = 0
4060/34:        74.0801 shutdown(1, SHUT_RDWR, SOV_DEFAULT)             = 0
4060/34:        74.0806 close(1)                                        = 0
4060/34:        74.0808 shutdown(0, SHUT_RDWR, SOV_DEFAULT)             = 0
4060/34:        75.3392 close(0)                                        = 0
4060/34:        75.3394 lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) =
0xFFBFFEFF [0x0000FFFF]
4060/34:        75.3397 lwp_exit()

Note that checkouts done directly from this worker complete without errors,
but when redirected through pound, the connection times out.

Initially, I had no Session tracking set up at all for subversion clients.
This meant that discrete requests which may have been part of the same
client operation (e.g. a checkout) would be balanced across the workers.
For debugging, I changed to a Session tracker with Type=IP and TTL=900, to
ensure that Pound is only dealing with a single worker for the duration of a
checkout.  This apparently was not related.

My pound configuration:

User        "nobody"
Group       "nobody"
LogLevel    2

ListenHTTP
    Address X.X.X.X
    Port    90
    xHTTP   1
    WebDAV  1

    Service
        Session
            Type IP
            TTL  900
        End
        BackEnd
            Address A.A.A.A
            Port    80
        End
        BackEnd
            Address B.B.B.B
            Port    80
        End
        BackEnd
            Address C.C.C.C
            Port    80
        End
    End
End

Can someone offer some insight?

Thanks in advance,
Cory.
Attachments:  
text.html text/html 8824 Bytes

Re: Connection timed out
"Cory Omand" <coryomand(at)gmail.com>
2006-02-23 20:24:05 [ FULL ]
On 2/23/06, Cory Omand <coryomand(at)gmail.com> wrote:[...]


Additional information -- I just tried the same checkout, through Pound,
both with and without Session Type=IP, only from a network host which is
closer to the server.  Previous checkouts were tested from a different
campus within our intranet, so had a higher latency.  The low latency
checkouts completed without any error at all.

I will increase the Client timeout, and see whether the condition goes
away.  I will initially set this to 60 (default = 10) -- is this a
reasonable value?

Regards,
C.
Attachments:  
text.html text/html 1500 Bytes

Re: [Pound Mailing List] Connection timed out
Alessio Cervellin <a.cervellin(at)acm.org>
2006-02-23 21:00:36 [ FULL ]
Cory Omand wrote:[...]

do you have the same behavior with pound 1.10 too? (just curious)

Re: [Pound Mailing List] Connection timed out
"Cory Omand" <coryomand(at)gmail.com>
2006-02-23 21:45:04 [ FULL ]
On 2/23/06, Alessio Cervellin <a.cervellin(at)acm.org> wrote:[...]


I don't know -- I haven't tested with 1.10 at all.  I went right from 1.9 to
2.0.1.  It does appear that the issue was caused by the low setting for
Client.  It always helps to read the manual closely :).  When subversion is
doing a big checkout, the client goes back to the server multiple times to
fetch more data.  If the time between any two of these fetches exceeds the
Client timeout, Pound closes the connection, and the entire checkout
aborts.  I currently have Client set to '300' (default is 10), and I'm not
seeing the problem on the high-latency repository connections anymore.

Regards,
Cory.
Attachments:  
text.html text/html 1409 Bytes

MailBoxer