Hi,
I am using Pound 2.0.2 (Solaris 10 SPARC) to load balance access to a set of
hosts running Subversion 1.3.0 (mod_dav_svn/mod_authz_svn under Apache 2.2.0,
Solaris 10 SPARC). I am seeing some strange behavior with relatively large
checkouts from subversion. So far, the most I have been able to check out
without errors is 4.7MB, so I'm not running into a filesize limit. After a
seemingly random amount of time/data during the checkout, I get the
following warning from pound:
Feb 23 10:35:09 host.domain pound: [ID 702911 daemon.warning] error copy
chunk cont: Connection timed out
On the worker, I see the following in the Apache debug error log:
[Thu Feb 23 10:35:09 2006] [info] [client X.X.X.X] (32)Broken pipe:
core_output_filter: writing data to the network
[Thu Feb 23 10:35:09 2006] [info] [client X.X.X.X] (32)Broken pipe:
core_output_filter: writing data to the network
[Thu Feb 23 10:35:09 2006] [error] [client X.X.X.X] Provider encountered an
error while streaming a REPORT response. [500, #0]
[Thu Feb 23 10:35:09 2006] [error] [client X.X.X.X] A failure occurred while
driving the update report editor [500, #190004]
[Thu Feb 23 10:35:09 2006] [info] [client X.X.X.X] (32)Broken pipe:
core_output_filter: writing data to the network
A truss of the pound process shows the following behavior immediately before
the client connection is shut down. Lwp 34 below is the process which
successfully passed data from the worker to the client for ~40s (2.6MB of
data transmitted prior to the timeout). Lwp 2 appears to be the worker
monitoring thread.
4060/34: 64.0766 pollsys(0xFEE0A3D4, 1, 0xFEE0A360, 0x00000000) = 1
4060/34: 64.0767 write(0, "\r\n 1 b e f e\r\n b C V".., 2057) =
2057
4060/34: 64.0770 pollsys(0xFEE0A374, 1, 0xFEE0A300, 0x00000000) = 1
4060/34: 64.0772 read(1, " v u p X\n U / Z / Q f /".., 4096) =
4096
4060/34: pollsys(0xFEE0A3D4, 1, 0xFEE0A360, 0x00000000) (sleeping...)
4060/2: 68.7982 nanosleep(0xFEF2FEF0, 0xFEF2FEE8) = 0
4060/2: 68.7983 time() =
1140721552
4060/2: 68.7985 time() =
1140721552
4060/2: nanosleep(0xFEF2FEF0, 0xFEF2FEE8) (sleeping...)
4060/34: 74.0782 pollsys(0xFEE0A3D4, 1, 0xFEE0A360, 0x00000000) = 0
4060/34: 74.0784 fstat(3, 0xFEE0A250) = 0
4060/34: 74.0786 time() =
1140721557
4060/34: 74.0789 putmsg(3, 0xFEE09908, 0xFEE098FC, 0) = 0
4060/34: 74.0790 open("/var/run/syslog_door", O_RDONLY) = 2
4060/34: 74.0792 door_info(2, 0xFEE09840) = 0
4060/34: 74.0794 getpid() =
4060 [4059]
4060/34: 74.0796 door_call(2, 0xFEE09828) = 0
4060/34: 74.0797 close(2) = 0
4060/34: 74.0801 shutdown(1, SHUT_RDWR, SOV_DEFAULT) = 0
4060/34: 74.0806 close(1) = 0
4060/34: 74.0808 shutdown(0, SHUT_RDWR, SOV_DEFAULT) = 0
4060/34: 75.3392 close(0) = 0
4060/34: 75.3394 lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) =
0xFFBFFEFF [0x0000FFFF]
4060/34: 75.3397 lwp_exit()
Note that checkouts done directly from this worker complete without errors,
but when redirected through pound, the connection times out.
Initially, I had no Session tracking set up at all for subversion clients.
This meant that discrete requests which may have been part of the same
client operation (e.g. a checkout) would be balanced across the workers.
For debugging, I changed to a Session tracker with Type=IP and TTL=900, to
ensure that Pound is only dealing with a single worker for the duration of a
checkout. This apparently was not related.
My pound configuration:
User "nobody"
Group "nobody"
LogLevel 2
ListenHTTP
Address X.X.X.X
Port 90
xHTTP 1
WebDAV 1
Service
Session
Type IP
TTL 900
End
BackEnd
Address A.A.A.A
Port 80
End
BackEnd
Address B.B.B.B
Port 80
End
BackEnd
Address C.C.C.C
Port 80
End
End
End
Can someone offer some insight?
Thanks in advance,
Cory.
|