Apsis Pound Mailing List archive, 2004-01

out of files (sockets) problem
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-01-13 18:29:23
I'm using pound current as a front-end to a multi-Zope system

             --ZOPE
           /
-->--POUND-o--ZOPE
            \
              --ZOPE


I just had the system lock up, and when I went to find out why, I found that
there were too many files open.  lsof showed that pound had nearly
275000 open sockets, roughly 20000 of them identified as :tproxy.

Looks to me as if I have something misconfigured and incoming connections 
are never being retired.  
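To watch the pile-up without rerunning lsof, the descriptor count can also be read straight from /proc. A minimal Python sketch, assuming a Linux host; `count_open_fds` is a hypothetical helper name, and reading another user's `/proc/<pid>/fd` (pound runs as `nobody` here) normally needs root:

```python
import os


def count_open_fds(pid: int) -> dict:
    """Tally a process's open descriptors (Linux /proc only): sockets vs. everything else."""
    fd_dir = f"/proc/{pid}/fd"
    counts = {"socket": 0, "other": 0}
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink; sockets read back as "socket:[inode]".
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            counts["socket"] += 1
        else:
            counts["other"] += 1
    return counts
```

Calling `count_open_fds` with the pound process ID every few seconds shows whether the socket count keeps climbing.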

My configuration file has a header:
---------------------------------------------------------------------
# HTTP host,ports to listen for (0.0.0.0 is replaced by actual IP)
ListenHTTP 0.0.0.0,80

# HTTPS hosts,ports to listen for

# HTTPSHeaders (certificate management)

# Pound user/group -- nobody for now, would a dedicated id be better?

User nobody
Group nobody

# Rootjail -- Directory Pound will chroot to at runtime

# Extended HTTP
ExtendedHTTP 1

# Log Level
LogLevel 3

# Alive value -- how long to keep checking non-responsive back-end hosts
# default is 30 seconds
#Alive 30

# Server value -- how long to wait for a server to respond
#Server 0

# Client value -- how long to wait for a client request
# default is 10 seconds
#Client 10

# SSL Engine name  (hardware acceleration)

# Err500 "filename" -- file to display if Error 500 occurs
# "An internal server error occurred.  Please try again later"
# Err501 "filename" -- file to display if Error 501 occurs
# "This method may not be used"
# Err503 "filename" -- file to display if Error 503 occurs

-----------------------------------------------------------------------
Pretty much out of the box and what I've used before.

I then have about 375 UrlGroup declarations, each like this:

UrlGroup ".*"
BackEnd 127.0.0.1,9999,5
HeadRequire Host ".*somename\.somedomain\.com.*"
EndGroup
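Each UrlGroup above pairs a back-end with a regular expression on the Host header. The sketch below is a hypothetical Python model of that selection, assuming groups are tried in order and the first match wins (check the Pound man page for the exact rule); `GROUPS`, `pick_backend`, the second host name and port 8082 are all made up for illustration:

```python
import re

# Hypothetical routing table modelled on the UrlGroup blocks:
# (Host-header regex, back-end address). Names and ports are illustrative.
GROUPS = [
    (r".*somename\.somedomain\.com.*", ("127.0.0.1", 9999)),
    (r".*othername\.somedomain\.com.*", ("127.0.0.1", 8082)),
]


def pick_backend(host: str, groups=GROUPS):
    """Return the back-end of the first group whose pattern matches the Host value."""
    for pattern, backend in groups:
        # re.search is unanchored, like POSIX regexec(); an assumption about Pound's matching.
        if re.search(pattern, host):
            return backend
    return None  # no group matched
```

Because the pattern is a regular expression, an unescaped `.` matches any character: `somedomain.com` would also match `somedomainXcom`. Escaping it as `\.com` makes the match exact.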

routing requests to (up to) half a dozen different Zopes listening
on different ports.



Any hints as to what the configuration problem might be?


RE: out of files (sockets) problem
"Mark Fontana" <mark.fontana(at)efi.com>
2004-01-13 19:00:46
Hi Dennis,

My company saw the exact same problem in cases where the backend servers
are responding slowly and the clients are configured to drop their
connections and retry the request in a timeframe sooner than pound's
backend server timeout.  Perhaps something similar is going on in
your case?

The problem is that while waiting for the backend server to respond,
pound does not monitor the frontend connection to see if it has dropped.
Consequently, socket resources are consumed for both the backend
connection and dead frontend connection, and only when the backend
timeout is reached will these connections be cleaned up.  If there are
numerous clients retrying, it does not take long for a major pileup
to occur.
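Mark's patch is not included in the thread. As a hypothetical illustration of the idea only (not his patch and not Pound's code), a proxy can wait on the client and back-end sockets together with `select()` and give up as soon as the client hangs up. `wait_for_backend` and its return strings are invented names:

```python
import select
import socket
import time


def wait_for_backend(client: socket.socket, backend: socket.socket, timeout: float) -> str:
    """Wait for the back-end reply while watching the client for a hang-up.

    Returns "backend" when the back-end has data, "client_gone" when the
    client closed or reset its connection first, or "timeout".
    """
    deadline = time.monotonic() + timeout
    watch = [client, backend]
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "timeout"
        readable, _, _ = select.select(watch, [], [], remaining)
        if not readable:
            return "timeout"
        if backend in readable:
            return "backend"
        # Only the client is readable: peek without consuming to tell EOF from data.
        try:
            if client.recv(1, socket.MSG_PEEK) == b"":
                return "client_gone"  # orderly close (FIN)
        except ConnectionResetError:
            return "client_gone"  # abortive close (RST)
        # The client sent more bytes (e.g. a pipelined request); stop watching
        # it so select() does not spin on the same pending data.
        watch = [backend]
```

Closing both sockets as soon as `"client_gone"` comes back is what keeps dead client connections from lingering until the back-end timeout expires.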

Last year, I submitted a patch against pound 1.4 that makes pound monitor
the frontend connections for possible drops while waiting for the backend
responses.  We've tested the patch for weeks with excellent results.
Unfortunately, the patch was rejected.

If you'd like to see the patch anyway, I can send it by private email.

Mark Fontana
Electronics For Imaging



Re: out of files (sockets) problem
Robert Segall <roseg(at)apsis.ch>
2004-01-13 19:46:01
On Tue, 2004-01-13 at 18:29, Dennis Allison wrote:
> I'm using pound current as a front-end to a multi-Zope system
> 
>              --ZOPE
>            /
> -->--POUND-o--ZOPE
>             \
>               --ZOPE
> 
> 
> I just had the system lock up and when I went to discover why found that 
> there were too many files open.  lsof showed that pound had nearly
> 275000 open sockets, roughly 20000 of them identified as :tproxy.
> 
> Looks to me as if I have something misconfigured and incoming connections 
> are never being retired.  

First the simple one: tproxy happens to be the service name for port
8081. If you use it that's what it will show as - nothing magical here.
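lsof and netstat translate port numbers into names using the local services database, so the label shown for 8081 depends on the system (`lsof -P` and `netstat -n` print plain numbers instead). A small Python sketch of the same lookup; `port_label` is a hypothetical helper:

```python
import socket


def port_label(port: int, proto: str = "tcp") -> str:
    """Return the service name a tool like lsof would print for a port, else the number."""
    try:
        # Looks the port up in the local services database (e.g. /etc/services).
        return socket.getservbyport(port, proto)
    except OSError:
        # No entry for this port: fall back to the bare number.
        return str(port)
```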

As to the central question: it is very likely that you have some
long-lived requests and/or slow-to-respond back-ends. In both cases the
incoming requests accumulate while the previous connections are still
active, thus the number of active ports.
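A rough way to see how quickly this adds up is Little's law: the number of requests in flight equals the arrival rate times how long each one is held. A minimal Python sketch, assuming two sockets per in-flight request (one to the client, one to the back-end); the figures in the note below are illustrative, not measurements from this thread:

```python
def estimated_open_sockets(requests_per_second: float,
                           seconds_held: float,
                           sockets_per_request: int = 2) -> float:
    """Little's law: concurrent items = arrival rate x average time each one stays.

    sockets_per_request=2 assumes one client-side and one back-end socket
    per in-flight request through the proxy.
    """
    return requests_per_second * seconds_held * sockets_per_request
```

At 50 requests per second, `estimated_open_sockets(50, 30)` gives 3000 sockets for a 30-second hold, while `estimated_open_sockets(50, 1)` gives 100, so shortening how long each request is held is the main lever on the proxy side.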

You don't tell us what state the ports are in (check with netstat), but
it may also happen that you have an unusually long timeout on close. If
you see a lot of sockets with something like FIN_WAIT2 as their state
that would be likely - some Windows versions are notoriously nasty (on
purpose) about it. You may want to tune your system for it.
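On a Linux host the per-state counts that netstat reports can also be computed from `/proc/net/tcp`, where the `st` column holds the state as two hex digits. A Python sketch, assuming Linux and IPv4 only (IPv6 sockets are listed in `/proc/net/tcp6`); the counts are system-wide, not limited to pound, and `tally_tcp_states` is a hypothetical name:

```python
from collections import Counter

# Connection states as numbered in the Linux kernel (include/net/tcp_states.h);
# /proc/net/tcp prints them as two hex digits in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def tally_tcp_states(proc_net_tcp: str) -> Counter:
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        state = fields[3].upper()
        counts[TCP_STATES.get(state, state)] += 1
    return counts
```

Passing it the contents of `/proc/net/tcp` and calling `.most_common()` on the result lists the most frequent states first: a large ESTABLISHED count points to slow back-ends, a large FIN_WAIT2 count to the close-timeout issue.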
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904


RE: out of files (sockets) problem
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-01-13 20:20:37
Mark -- send me the patch and I'll give it a try.  Sounds like this might 
be the problem ...
	-dra




Re: out of files (sockets) problem
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-01-13 20:34:37
My primary backends are on ports 8081, 8082, 8083, ...

Sorry I forgot to indicate the state of the sockets.  They are mostly 
ESTABLISHED.  My guess is that I have a problem with slow to respond 
backends--it seems unlikely that I'd have more than a few hundred to 
a thousand long-lived connections.  And when I clear things out and 
begin afresh, the open sockets pile up quickly.

I'll change the backend response timeout and see if that helps.



Re: out of files (sockets) problem
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-01-13 23:00:46
Sorry to respond to my own posting, but I have a bit of additional
information:

First, changing the Server setting to 

Server 1

seems to have made the problem mostly disappear--which suggests that the 
problem is due to a slow server.   (I'll put in Mark Fontana's patch 
later today and see what effect it has...)

I examined the logs and found, hidden in the cruft, a number of error 
diagnostics like 

Jan 13 13:25:13 myserver pound: copy_bin error writing: Connection reset by peer
Jan 13 13:25:13 myserver pound: error copy server cont: Connection reset by peer


and occasional messages like 

Jan 13 13:35:38 myserver pound: error flush to 12.34.56.78: Connection reset by peer

which seem to me to be normal.
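To see how often each of these errors occurs, the log can be tallied by message. A hypothetical Python sketch, assuming the syslog format shown above with one message per line (the line breaks in this post come from email wrapping) and an optional `[pid]` after `pound`; `count_resets` is an invented name:

```python
import re
from collections import Counter

# Matches "pound: <message>: Connection reset by peer", with or without "[pid]".
RESET = re.compile(r"pound(?:\[\d+\])?: (?P<what>.+?): Connection reset by peer")


def count_resets(lines) -> Counter:
    """Count 'Connection reset by peer' errors per pound message text."""
    counts = Counter()
    for line in lines:
        m = RESET.search(line)
        if m:
            counts[m.group("what")] += 1
    return counts
```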

	-d



Re: out of files (sockets) problem
Robert Segall <roseg(at)apsis.ch>
2004-01-14 12:57:02
On Tue, 2004-01-13 at 23:00, Dennis Allison wrote:
> Sorry to respond to my own posting, but I have a bit of additional
> information:
> 
> First, changing the Server setting to 
> 
> Server 1
> 
> seems to have made the problem mostly disappear--which suggests that the 
> problem is due to a slow server.   (I'll put in Mark Fontana's patch 
> later today and see what effect it has...)

By setting Server 1 you are effectively killing the back-end connection
after one second of inactivity. If that solves the problem you are right
- it tends to prove the theory of slow responses...
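An idle timeout of this kind can be sketched in Python with a per-call socket timeout. This is a hypothetical illustration, not Pound's implementation; `read_with_idle_timeout` is an invented name:

```python
import socket


def read_with_idle_timeout(backend: socket.socket, idle_timeout: float):
    """Read a back-end reply, giving up after idle_timeout seconds of silence.

    Returns (data, timed_out). The timeout applies to each recv() call, so it
    restarts whenever bytes arrive: it limits inactivity, not total duration.
    """
    backend.settimeout(idle_timeout)
    chunks = []
    try:
        while True:
            data = backend.recv(4096)
            if not data:  # back-end closed the connection: reply complete
                return b"".join(chunks), False
            chunks.append(data)
    except socket.timeout:
        # Silent for too long: return what arrived so far and flag the timeout.
        return b"".join(chunks), True
```

Because the timer restarts whenever bytes arrive, a long transfer keeps going, but a Zope request that legitimately computes for longer than the timeout before sending anything is cut off as well.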

You may want to compare the Pound log and the back-end logs to see how
long a request actually takes.

> I examined the logs and found, hidden in the cruft, a number of error 
> diagnostics like 
> 
> Jan 13 13:25:13 myserver pound: copy_bin error writing: Connection reset by peer
> Jan 13 13:25:13 myserver pound: error copy server cont: Connection reset by peer
> 
> 
> and occasional messages like 
> 
> Jan 13 13:35:38 myserver pound: error flush to 12.34.56.78: Connection reset by peer
> 
> which seem to me to be normal

They are: some client gave up on waiting for a response and closed the
connection.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904

