Still Problem with the current
"Alexander Meis" <am(at)simoon.de>
2004-09-15 15:50:30
Hi,

I still have problems with pound-current.
With the current version, the back-end on which lighttpd is running gets lost.
The problem starts with these log messages:
Sep 15 15:00:52 balancer1 pound: getsockopt failed: Bad file descriptor
Sep 15 15:00:52 balancer1 pound: backend 213.131.250.44:80 connect: Bad file descriptor
Sep 15 15:00:52 balancer1 pound: no backend "GET /img/ck-logo.gif HTTP/1.1" from 217.247.2.178
Sep 15 15:00:52 balancer1 pound: no backend "GET /img/smileys/biggrin.gif HTTP/1.0" from 193.28.212.20
....
then:
Sep 15 15:00:59 balancer1 pound: no backend "GET /img/button-forum-neuesthema.gif HTTP/1.0" from 193.170.250.122
Sep 15 15:00:59 balancer1 pound: BackEnd 213.131.250.44 resurrect
Sep 15 15:00:59 balancer1 pound: BackEnd 213.131.250.44 resurrect

then again:
Sep 15 15:03:30 balancer1 pound: backend 213.131.250.44:80 connect: Bad file descriptor
Sep 15 15:03:30 balancer1 pound: no backend "GET /img/leer.gif HTTP/1.1" from 172.177.234.62
.....

Can someone tell me where the problem is?

This behavior occurs only with lighttpd; it does not seem to happen with Apache.
Pound 1.7 runs well.

Here you can find my current config:
http://213.131.250.41:81/poundcfg.txt

Thanks.

Best regards from Sinzig am Rhein,

Alexander Meis

Chefkoch.de - Germany's largest cooking community
http://www.chefkoch.de

--------------------------------------------------------------
pixelhouse GmbH
Kirchplatz 8
53489 Sinzig
Phone: 02642-980330
Fax: 02642-980215
mailto:am(at)pixelhouse.de
http://www.pixelhouse.de
--------------------------------------------------------------







RE: Still Problem with the current
"John Hansen" <john(at)oztralis.com.au>
2004-09-15 17:19:59
What I find interesting is that this might be related to the e500s I'm
getting.

I'm also starting to get (very rarely, though) partial content from a vhost
_different_ from the one I'm connecting to, similar to what was
described in a previous post, e.g.:

Urlgroup .*
Headrequire host somehost
...

and

Urlgroup .*
Headrequire host someotherhost
...

When accessing somehost, I get a page from someotherhost.

None of these things happen if I uninstall Pound and route directly to one of
the backends.

Thread-safety issues?
Maybe HTTP is transient by nature, and as such should incorporate retry limits?

... John



Re: Still Problem with the current
Robert Segall <roseg(at)apsis.ch>
2004-09-15 18:15:00
On Wednesday 15 September 2004 15.50, Alexander Meis wrote:
> With the current version, the back-end on which lighttpd is running gets lost.
> The problem starts with these log messages:
> Sep 15 15:00:52 balancer1 pound: getsockopt failed: Bad file descriptor
> Sep 15 15:00:52 balancer1 pound: backend 213.131.250.44:80 connect: Bad
> file descriptor
> [...]

Have a look in svc.c, in connect_nb(). I suspect the back-end closed the 
connection - or otherwise refused it - while in poll(), thus the file 
descriptor is no longer valid. Most likely an overloaded back-end that either 
refuses (does lighttpd do connection throttling?) or is slow to accept a 
connection.

From Pound's point of view this is a "dead" back-end. You may want to try 
increasing the Server parameter.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904
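
The failure described in the reply above, where a connection attempt is still pending in poll() and the descriptor then reports an error, is part of the usual non-blocking connect pattern. What follows is a minimal Python sketch of that pattern, assuming a POSIX-like system. It is not Pound's connect_nb() from svc.c, which is written in C and is not shown in this thread; the name is borrowed only for orientation, and the timeout parameter and return convention are assumptions.

```python
import errno
import select
import socket


def connect_nb(host, port, timeout_s):
    """Try a non-blocking TCP connect.

    Returns (sock, 0) on success, or (None, err) where err is an errno value.
    Hypothetical helper for illustration; this is not Pound's code.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # connect_ex() returns an errno instead of raising; EINPROGRESS means
    # the handshake is still pending and has to be awaited.
    err = sock.connect_ex((host, port))
    if err in (errno.EINPROGRESS, errno.EWOULDBLOCK):
        # Wait until the socket is writable (or flagged) or the timeout expires.
        _, writable, flagged = select.select([], [sock], [sock], timeout_s)
        if not (writable or flagged):
            err = errno.ETIMEDOUT
        else:
            # Writable does not mean connected: the pending error, if any,
            # has to be fetched with SO_ERROR.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        return None, err
    return sock, 0
```

In this sketch a refused connection, a back-end that is too slow to accept, and an outright timeout all come back as a non-zero error, which is the situation in which a balancer would consider the back-end unavailable for that request.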

Re: Still Problem with the current
Robert Segall <roseg(at)apsis.ch>
2004-09-15 18:18:42
On Wednesday 15 September 2004 17.19, John Hansen wrote:
> What I find interesting is that this might be related to the e500s I'm
> getting.
>
> I'm also starting to get (very rarely, though) partial content from a vhost
> _different_ from the one I'm connecting to, similar to what was described in a
> previous post, e.g.:
>
> Urlgroup .*
> Headrequire host somehost
> ...
>
> and
>
> Urlgroup .*
> Headrequire host someotherhost
> ...
>
> When accessing somehost, I get a page from someotherhost.

Very unlikely to be a Pound issue. Maybe redirects and/or base tags that send 
your browser to some other host?

> None of these things happen if I uninstall pound, and route directly to one
> of the backends.

Of course - you are accessing a single server, with a single Host header.

> Thread-safety issues?

VERY unlikely.

> Maybe http is transient by nature, and as such should incorporate
> retrylimits?

Sorry, can't afford that on a load-balancer. "Transient" is not really 
relevant here: HTTP defines the request/response cycle quite clearly.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904

Re: Still Problem with the current
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-09-15 20:10:27
Robert,

Beginning with Pound 1.7 we have been seeing the same sort of
problem.  We get a bunch of e500s.  We also see occasional situations
where user A gets material clearly intended for user B.  We are also
seeing instances where web pages are only partially displayed.  This is
a critical problem for us!

We are running Zope 2.6.4, RH 7.3, Pound-current (9/13), Python 2.3.4.





Re: Still Problem with the current
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-09-15 21:31:15
A bit more information and maybe a hint--

When I look carefully at the logs, I see entries like:

Sep 15 11:19:54 epaul pound: foo.foobar.com 10.11.12.13 - -
[15/Sep/2004:11:19:54 -0700] "GET /standard_javascripts.js HTTP/1.1" ...

The request originator was foo.foobar.com, but the IP address bound to it,
10.11.12.13, belongs to another concurrent user.  Presumably
this log entry (from the default syslog, LogLevel 3) is the result of the
data being successfully sent to the IP address -- but the IP address is
wrong.  How could the IP address get bollixed?





Re: Still Problem with the current
"Alexander Meis" <am(at)simoon.de>
2004-09-15 23:05:48
> Beginning with Pound 1.7 we have been seeing the same sort of
> problem.  We get a bunch of e500s.  We also see occasional situations
> where user A gets material clearly intended for user B.  We are also
> seeing instances where web pages are only partially displayed.  This is
> a critical problem for us!

We had that too, but I thought it was the local web server's sessions
getting mixed up when Pound was restarted.
I have switched back to 1.7, but it sometimes logs an out-of-memory
error.



Re: Still Problem with the current
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-09-15 23:16:36
So, I now think I'll back off to an earlier version for the moment...
There have been enough versions that I don't remember which is the most
stable.  I know that we had problems with 1.7 and several of the candidates
that followed.  I suppose I'll move back to Version 1.6 or, maybe, Version
1.5.  Does anyone have a suggestion as to which was the most stable version
before the latest spate of problems?

As to restart confusion, that may well be the source of the problem.  I'll
take a look at the logs.  I have seen occasional worker restarts but don't
know if they are correlated with the cross-over problem.





RE: Still Problem with the current
"John Hansen" <john(at)oztralis.com.au>
2004-09-16 00:44:42
> > When accessing somehost, I get a page from someotherhost.
>
> Very unlikely to be a Pound issue. Maybe redirects and/or
> base tags that send your browser to some other host?

That's a negative. There are no redirects or base tags on those pages.



Re: Still Problem with the current
Sascha Ottolski <sascha.ottolski(at)gallileus.de>
2004-09-16 11:45:14
On Wednesday, 15 September 2004 20:10, Dennis Allison wrote:
> Robert,
>
> Beginning with Pound 1.7 we have been seeing the same sort of
> problem.  We get a bunch of e500s.  We also see occasional situations
> where user A gets material clearly intended for user B.  We are also
> seeing instances where web pages are only partially displayed.  This
> is a critical problem for us!

Just in case this might be related: I've just seen that one
particular .js file is often only partly transmitted, while at other times
it is complete. In tcpwatch it looks something like this:

[00:01.028 - client closed]
[00:01.028 - server connection error exceptions.IOError: Closed without 
finishing response to client]

It happens at different places in the file.

BTW, whenever pound is about to deliver one of its error-files

Err500 "/usr/local/etc/pound_error.html"
Err501 "/usr/local/etc/pound_error.html"
Err503 "/usr/local/etc/pound_error.html"
Err414 "/usr/local/etc/pound_error.html"

the respective page is _always_ truncated somewhere in the middle. I've
never seen it delivered in full through Pound. However, the page is valid HTML;
I've just double-checked it. I doubt it, but is there a small chance
that this is also related?


Cheers,

Sascha



Re: Still Problem with the current
Robert Segall <roseg(at)apsis.ch>
2004-09-16 14:14:05
On Wednesday 15 September 2004 21.31, Dennis Allison wrote:
> A bit more information and maybe a hint--
>
> When I look carefully at the logs, I see entries like:
>
> Sep 15 11:19:54 epaul pound: foo.foobar.com 10.11.12.13 - -
> [15/Sep/2004:11:19:54 -0700] "GET /standard_javascripts.js HTTP/1.1" ...
>
> The request originator was foo.foobar.com, but the IP address bound to it,
> 10.11.12.13, belongs to another concurrent user.  Presumably
> this log entry (from the default syslog, LogLevel 3) is the result of the
> data being successfully sent to the IP address -- but the IP address is
> wrong.  How could the IP address get bollixed?

It can't and it isn't. What you see in the log file is the request's virtual
host (foo.foobar.com - the contents of the Host header) and the address of
the client (10.11.12.13). This is normal for the Common Log Format (CLF).
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904
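
To check which client address goes with which virtual host, the two leading fields described above can be pulled out of each log line. The sketch below is a rough Python illustration based only on the single entry quoted in this thread; the regular expression is an assumption about the line layout, and the part of the entry that was elided with "..." is simply ignored.

```python
import re

# Layout assumed from the quoted entry: syslog prefix, "pound:", virtual host,
# client address, two placeholder fields, [timestamp], "request line".
LOG_RE = re.compile(
    r'pound: (?P<vhost>\S+) (?P<client>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)"'
)


def parse_pound_line(line):
    """Return a dict of fields, or None if the line has a different shape."""
    m = LOG_RE.search(line)
    return m.groupdict() if m else None


# The entry quoted in the thread, joined onto one line, without the elided tail.
line = ('Sep 15 11:19:54 epaul pound: foo.foobar.com 10.11.12.13 - - '
        '[15/Sep/2004:11:19:54 -0700] "GET /standard_javascripts.js HTTP/1.1"')
fields = parse_pound_line(line)
print(fields["vhost"], fields["client"], fields["request"])
```

Grouping the parsed entries by client address would show whether a given address really issued the requests that were logged against it.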

Re: Still Problem with the current
Robert Segall <roseg(at)apsis.ch>
2004-09-16 14:18:29
On Thursday 16 September 2004 11.45, Sascha Ottolski wrote:
> BTW, whenever pound is about to deliver one of its error-files
>
> Err500 "/usr/local/etc/pound_error.html"
> Err501 "/usr/local/etc/pound_error.html"
> Err503 "/usr/local/etc/pound_error.html"
> Err414 "/usr/local/etc/pound_error.html"
>
> the respective page is _always_ truncated somewhere in the middle. I've
> never seen it fully through pound. However, the page is valid html,
> I've just double checked it. I doubt it, but there still might be a
> little chance that this is also related?

Most likely the page you defined is too large. Have a look at http.c - the 
error page, inclusive of headers, should fit in MAXBUF bytes. Try making your 
page shorter and you'll see it in full.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904
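
The size constraint described above can be checked before deploying an error page. The following Python sketch adds up the bytes of a response and compares them with a buffer limit. The value of MAXBUF is not given in this thread, so it is passed in as a parameter, and the header lines used here are an assumption for illustration rather than the headers Pound actually sends.

```python
def error_response_size(status_line, body):
    """Bytes needed for a minimal response: status line, headers, and body.

    The two headers below are assumed for illustration only.
    """
    head = (
        f"{status_line}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return len(head) + len(body)


def fits_in_buffer(status_line, body, maxbuf):
    """True if headers plus body fit in a buffer of maxbuf bytes."""
    return error_response_size(status_line, body) <= maxbuf
```

To use it, read the error page as bytes (for example from /usr/local/etc/pound_error.html) and pass the MAXBUF value found in the Pound source being built.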

Re: Still Problem with the current
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-09-16 17:19:00
Robert, my message was far from clear.  Let me try again and explicate
this particular instance.

I have a client at IP address, 1.2.3.4.  It makes a request to
foo.foobar.com.  Log entries for earlier in the session have the 
form 

   Sep 15 11:19:54 epaul pound: foo.foobar.com 1.2.3.4 - - ...

and then, out of the blue, we get an entry 

   Sep 15 11:19:54 epaul pound: foo.foobar.com 10.11.12.13 - - ... 

and the client display shows content which is inconsistent with the
request.

The IP address, 10.11.12.13, is the IP address associated with
another active, unrelated session.

This behavior has been reported by several of our users as well as, in
this particular case, being right in our face.   When this occurred, we
were the only session on foo.foobar.com.

We are running RH 7.3, Zope 2.6.4, Pound-current (13 Sep), Python 2.3.4.
We run a number of independent instances of Zope all front-ended by Pound.  
We use DNS to alias several hundred domain names of the form

	XXX.ourdomain.com

to the server IP address, then use Pound as a reverse proxy to distribute 
to the appropriate Zope instance.  In the near future we'll want to have 
multiple Zope front-ends serving ZEO backends.

The Pound configuration file entries are mostly of the form 

UrlGroup ".*"
BackEnd 127.0.0.1,8081,5
HeadRequire Host ".*subdomain\.ourdomain.com.*"
EndGroup

Our system uses basic and cookie authentication, but the session mechanism
is not spelled out in the Pound configuration.  [Should it be?]

We tend to have multiple users, say 20 or so, logged into our system from
a single IP address.  In reality, they are on a local network which is
exposed to the Internet through the single IP.

We have not seen this problem before; it appeared when we upgraded to 1.7
and then when we upgraded to the various Pound-current releases.  

And, like several other users, we have been plagued with 500 errors of
late.  Users have also reported other problems which seem to be related--
display of pages as HTML rather than rendered content and display of
partial pages.   

The logs include numerous entries of the form

Sep 15 19:18:42 epaul pound: MONITOR: worker exited on signal 11,
restarting...

where `epaul' is one of our server names.

These appear at irregular intervals 10 or 12 per hour.  They are sometimes
clustered with two or three appearing in the space of a minute or two.
The frequency appears to be related to load, but they appear even under
light load.

And there are lots of `error' entries in the logs.  Not all such entries
are errors; they just memorialize events which happen in the normal course
of business.  We do see many instances of `Bad file descriptor' and
`Input/output error'.  Are these to be expected or are they indicative of
a problem?

As always, thanks for your help.  Any suggestions you might have as to how
to isolate this problem would be helpful. 
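
The HeadRequire pattern quoted in the configuration in this message is unanchored and leaves the dot before "com" unescaped, so as a regular expression it accepts more Host values than the one subdomain. The Python sketch below only illustrates the semantics of the pattern. Python's re module stands in for the regex library Pound links against, how Pound applies the pattern to the header line is not shown in this thread, and the stricter variant is a hypothetical alternative. Whether this plays any part in the problems reported here is not established.

```python
import re

# The pattern as written in the quoted configuration.
LOOSE = re.compile(r".*subdomain\.ourdomain.com.*", re.IGNORECASE)
# A tighter variant (an assumption, not taken from the thread): every dot
# escaped, nothing allowed before the name, only an optional port after it.
STRICT = re.compile(r"subdomain\.ourdomain\.com(:[0-9]+)?", re.IGNORECASE)


def accepts(pattern, host):
    """True if the whole Host value matches the pattern."""
    return pattern.fullmatch(host) is not None


for host in ("subdomain.ourdomain.com",
             "othersubdomain.ourdomain.com",
             "subdomain.ourdomain.com.example.org"):
    print(host, accepts(LOOSE, host), accepts(STRICT, host))
```

The loose pattern accepts all three names, while the strict one accepts only the first.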




Re: Still Problem with the current
"Simon Matter" <simon.matter(at)ch.sauter-bc.com>
2004-09-16 17:57:35
> The logs include numerous entries of the form
>
> Sep 15 19:18:42 epaul pound: MONITOR: worker exited on signal 11,
> restarting...

I'm using Pound on a very similar system (RH-7.3, updated to the latest
errata) and haven't had any problems for quite some time now (however, the
box has only a light load, ~20-30 hits/sec). I get some errors in the logs
as well but have never seen the signal 11 error. That doesn't sound very nice.
Is this an SMP box?

I'm using my own RPM package, which is available here:
http://www.invoca.ch/pub/packages/pound/
If you want to give it a try, I could send you the i386 RPM I'm using on
RH-7.3 so you could see whether the problem also happens with it. If it does,
we could try to find out what differs between your server and mine; otherwise
it would be interesting to know what is different in your build.
Just let me know if I should send you the package.

Simon


Re: Still Problem with the current
Robert Segall <roseg(at)apsis.ch>
2004-09-16 18:12:33
On Thursday 16 September 2004 17.19, Dennis Allison wrote:
> Robert, my message was far from clear.  Let me try again and explicate
> this particular instance.

I'll try to answer as best I can.

> I have a client at IP address, 1.2.3.4.  It makes a request to
> foo.foobar.com.  Log entries for earlier in the session have the
> form
>
>    Sep 15 11:19:54 epaul pound: foo.foobar.com 1.2.3.4 - - ...

So far so good.

> and then, out of the blue, we get an entry
>
>    Sep 15 11:19:54 epaul pound: foo.foobar.com 10.11.12.13 - - ...
>
> and the client display shows content which is inconsistent with the
> request.

If you have several active clients it's not really surprising that you would 
see requests from them all. Why the response is incorrect is another 
question.

> The IP address, 10.11.12.13, is the IP address associated with
> another active, unrelated session.

Said client having sent a request it appears in the logs.

> This behavior has been reported by several of our users as well as, in
> this particular case, being right in our face.   When this occurred, we
> were the only session on foo.foobar.com.

Now this is interesting. Are you absolutely sure you were really alone?

> We are running RH 7.3, Zope 2.6.4, Pound-current (13 Sep), Python 2.3.4.
> We run a number of independent instances of Zope all front-ended by Pound.
> We use DNS to alias several hundred domain names of the form
>
> 	XXX.ourdomain.com
>
> to the server IP address, then use Pound as a reverse proxy to distribute
> to the appropriate Zope instance.  In the near future we'll want to have
> multiple Zope front-ends serving ZEO backends.

I suspect it's the other way around...

> The Pound configuration file entries are mostly of the form
>
> UrlGroup ".*"
> BackEnd 127.0.0.1,8081,5
> HeadRequire Host ".*subdomain\.ourdomain.com.*"
> EndGroup
>
> Our system uses basic and cookie authentication, but the session mechanism
> is not spelled out in the Pound configuration.  [Should it be?]

Not necessarily, only if you need it. Please remember that the concept of 
"session" does not exist in HTTP - it is something in your mind, not in the 
protocol. If you have a single back-end per virtual host then whatever 
mechanism you use is OK and Pound does not care. It becomes important if you 
have more than one back-end per UrlGroup, otherwise a client establishes a 
"session" with one back-end and the next request goes to another, which knows 
nothing about it.
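
The point about more than one back-end per UrlGroup can be shown with a small Python sketch. It contrasts a selection that ignores the client with one that keeps a client on the same back-end. Neither function is Pound's algorithm; both are hypothetical, and the second back-end address (port 8082) is made up for the example.

```python
import itertools
import zlib


def round_robin(backends):
    """Return a picker that cycles through the back-ends, ignoring the client."""
    cycle = itertools.cycle(backends)
    return lambda client_ip: next(cycle)


def sticky_by_ip(backends):
    """Return a picker that maps the same client address to the same back-end."""
    def pick(client_ip):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return backends[zlib.crc32(client_ip.encode("ascii")) % len(backends)]
    return pick


backends = ["127.0.0.1:8081", "127.0.0.1:8082"]  # second entry is hypothetical
rr = round_robin(backends)
sticky = sticky_by_ip(backends)
print([rr("1.2.3.4") for _ in range(4)])      # alternates between the two
print({sticky("1.2.3.4") for _ in range(4)})  # always a single back-end
```

With the first picker, consecutive requests from one client alternate between back-ends, so a login established on one is unknown to the other. With address-based stickiness, all users behind a single shared address land on the same back-end.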

> We tend to have multiple users, say 20 or so, logged into our system from
> a single IP address.  In reality, they are on a local network which is
> exposed to the Internet through the single IP.

NAT does not matter - the source socket is different by definition.

> We have not seen this problem before; it appeared when we upgraded to 1.7
> and then when we upgraded to the various Pound-current releases.

Unless you use HTTPS there is not much in the transition from 1.6 to 1.7 to 
affect you. 1.7 had one major issue in dealing with multiple listeners, but 
that has been fixed in -current.

> And, like several other users, we have been plagued with 500 errors of
> late.  Users have also reported other problems which seem to be related--
> display of pages as HTML rather than rendered content and display of
> partial pages.
>
> The logs include numerous entries of the form
>
> Sep 15 19:18:42 epaul pound: MONITOR: worker exited on signal 11,
> restarting...
>
> where `epaul' is one of our server names.

That is a major issue. I have no idea why you should get a SEGV, but it points
towards a significant problem. In many cases this has proven to have little
to do with Pound - more often than not it was resolved by updating system
libraries/kernel.

This may also be related to partial responses - when the process aborts it may 
happen in the middle of writing a page back to the client.
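
One way to test that connection is to count the MONITOR restart entries per hour and compare the busy hours with the times at which users reported partial pages. The Python sketch below assumes that each syslog entry sits on a single line in the log file (the entry quoted in this thread was wrapped by the mail client) and that the line layout matches that one quoted entry.

```python
import re
from collections import Counter

# Layout assumed from the quoted entry:
# "Sep 15 19:18:42 epaul pound: MONITOR: worker exited on signal 11, restarting..."
RESTART_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+) (?P<hour>\d{2}):\d{2}:\d{2} \S+ pound: "
    r"MONITOR: worker exited on signal (?P<signal>\d+)"
)


def restarts_per_hour(lines):
    """Count restart entries, keyed by (month, day, hour, signal number)."""
    counts = Counter()
    for line in lines:
        m = RESTART_RE.match(line)
        if m:
            key = (m["month"], int(m["day"]), int(m["hour"]), int(m["signal"]))
            counts[key] += 1
    return counts
```

Feeding it the lines of the syslog file gives a per-hour tally that can be set against the timestamps of truncated responses.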

> These appear at irregular intervals 10 or 12 per hour.  They are sometimes
> clustered with two or three appearing in the space of a minute or two.
> The frequency appears to be related to load, but they appear even under
> light load.

On many occasions this has been a result of compiling/linking with the wrong 
libraries (such as OpenSSL without threads support or a bad pthreads 
version).

> And there are lots of `error' entries in the logs.  Not all such entries
> are errors, they just memorialize events which happen in the normal course
> of business.  We do see many instances of ` Bad file descriptor' and
> `Input/output error'.  Are these to be expected or are they indicative of
> a problem?

Most of these are benign - very often connected to client or server time-outs. 
I suggest you try playing with the Server parameter.

> As always, thanks for your help.  Any suggestions you might have as to how
> to isolate this problem would be helpful.

Final suggestion: try installing Pound on a clean machine (fresh system 
install). I believe you use Linux - we have very satisfied users on both SuSE 
9.1 and Gentoo (latest, with the dev-sources a.k.a. kernel 2.6). We have 
repeatedly had reports of nasty behaviour on RedHat machines, so for testing 
it might be better to avoid that.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-1-920 4904

Re: Still Problem with the current
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-09-16 18:39:39
Yes, this is an SMP box: dual Athlon MP 2800, Tyan board, 4GB memory,
3ware RAID.

Are you running the latest Pound-current?

On Thu, 16 Sep 2004, Simon Matter wrote:

> I'm using pound on a very similar system (RH-7.3, updated to latest
> errata) and haven't had any problems for quite some time now (however the
> box has only light load, ~20-30 hits/sec). I get some errors in the logs
> as well but never got the signal 11 error. Doesn't sound very nice. Is
> this an SMP box?
> 
> I'm using my own rpm package which is available here
> http://www.invoca.ch/pub/packages/pound/
> If you want to give it a try I could send you the i386 rpm I'm using on
> RH-7.3 so you could check whether the problem also happens with it. If it
> does, we could try to find out what differs between your server and mine;
> otherwise it would be interesting to know what is different in your build.
> Just let me know if I should send you the package.
> 
> Simon
> 


Re: Still Problem with the current
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-09-16 18:41:56 [ SNIP ]
I build from source against OpenSSL (openssl-devel-0.9.6b-35.7).

On Thu, 16 Sep 2004, Simon Matter wrote:

> > Robert, my message was far from clear.  Let me try again and explain
> > this particular instance.
> >
> > I have a client at IP address 1.2.3.4.  It makes a request to
> > foo.foobar.com.  Log entries for earlier in the session have the
> > form
> >
> >    Sep 15 11:19:54 epaul pound: foo.foobar.com 1.2.3.4 - - ...
> >
> > and then, out of the blue, we get an entry
> >
> >    Sep 15 11:19:54 epaul pound: foo.foobar.com 10.11.12.13 - - ...
> >
> > and the client display shows content which is inconsistent with the
> > request.
> >
> > The IP address 10.11.12.13 is the address associated with
> > another active, unrelated session.
> >
> > This behavior has been reported by several of our users as well as, in
> > this particular case, being right in our face.   When this occurred, we
> > were the only session on foo.foobar.com.
> >
> > We are running RH 7.3, Zope 2.6.4, Pound-current (13 Sep), Python 2.3.4.
> > We run a number of independent instances of Zope all front-ended by Pound.
> > We use DNS to alias several hundred domain names of the form
> >
> > 	XXX.ourdomain.com
> >
> > to the server IP address, then use Pound as a reverse proxy to distribute
> > to the appropriate Zope instance.  In the near future we'll want to have
> > multiple Zope front-ends serving ZEO backends.
> >
> > The Pound configuration file entries are mostly of the form
> >
> > UrlGroup ".*"
> > BackEnd 127.0.0.1,8081,5
> > HeadRequire Host ".*subdomain\.ourdomain.com.*"
> > EndGroup
> >
> > Our system uses basic and cookie authentication, but the session mechanism
> > is not spelled out in the Pound configuration.  [Should it be?]
> >
> > We tend to have multiple users, say 20 or so, logged into our system from
> > a single IP address.  In reality, they are on a local network which is
> > exposed to the Internet through the single IP.
> >
> > We have not seen this problem before; it appeared when we upgraded to 1.7
> > and then when we upgraded to the various Pound-current releases.
> >
> > And, like several other users, we have been plagued with 500 errors of
> > late.  Users have also reported other problems which seem to be related--
> > display of pages as HTML rather than rendered content and display of
> > partial pages.
> >
> > The logs include numerous entries of the form
> >
> > Sep 15 19:18:42 epaul pound: MONITOR: worker exited on signal 11,
> > restarting...
> 
> I'm using pound on a very similar system (RH-7.3, updated to latest
> errata) and haven't had any problems for quite some time now (however the
> box has only light load, ~20-30 hits/sec). I get some errors in the logs
> as well but never got the signal 11 error. Doesn't sound very nice. Is
> this an SMP box?
> 
> I'm using my own rpm package which is available here
> http://www.invoca.ch/pub/packages/pound/
> If you want to give it a try I could send you the i386 rpm I'm using on
> RH-7.3 so you could check whether the problem also happens with it. If it
> does, we could try to find out what differs between your server and mine;
> otherwise it would be interesting to know what is different in your build.
> Just let me know if I should send you the package.
> 
> Simon
> 


Re: Still Problem with the current
Dennis Allison <allison(at)sumeru.stanford.EDU>
2004-09-16 19:05:03 [ SNIP ]

On Thu, 16 Sep 2004, Robert Segall wrote:

> On Thursday 16 September 2004 17.19, Dennis Allison wrote:
> > Robert, my message was far from clear.  Let me try again and explain
> > this particular instance.
> 
> I'll try to answer as best I can.
> 
> > I have a client at IP address 1.2.3.4.  It makes a request to
> > foo.foobar.com.  Log entries for earlier in the session have the
> > form
> >
> >    Sep 15 11:19:54 epaul pound: foo.foobar.com 1.2.3.4 - - ...
> 
> So far so good.
> 
> > and then, out of the blue, we get an entry
> >
> >    Sep 15 11:19:54 epaul pound: foo.foobar.com 10.11.12.13 - - ...
> >
> > and the client display shows content which is inconsistent with the
> > request.
> 
> If you have several active clients it's not really surprising that you would 
> see requests from them all. Why the response is incorrect is another 
> question.
> 
> > The IP address 10.11.12.13 is the address associated with
> > another active, unrelated session.
> 
> Said client having sent a request it appears in the logs.
	But the target IP address is WRONG.

> 
> > This behavior has been reported by several of our users as well as, in
> > this particular case, being right in our face.   When this occurred, we
> > were the only session on foo.foobar.com.
> 
> Now this is interesting. Are you absolutely sure you were really alone?
	Absolutely.  There are only two possible users (password control)
	and both of us were in the room.   The failing system is a
	completely independent instance of Zope.  

> > We are running RH 7.3, Zope 2.6.4, Pound-current (13 Sep), Python 2.3.4.
> > We run a number of independent instances of Zope all front-ended by Pound.
> > We use DNS to alias several hundred domain names of the form
> >
> > 	XXX.ourdomain.com
> >
> > to the server IP address, then use Pound as a reverse proxy to distribute
> > to the appropriate Zope instance.  In the near future we'll want to have
> > multiple Zope front-ends serving ZEO backends.
> 
> I suspect it's the other way around...
	Perhaps, but this is future systems organization and not the
	immediate problem.

> 
> > The Pound configuration file entries are mostly of the form
> >
> > UrlGroup ".*"
> > BackEnd 127.0.0.1,8081,5
> > HeadRequire Host ".*subdomain\.ourdomain.com.*"
> > EndGroup
> >
> > Our system uses basic and cookie authentication, but the session mechanism
> > is not spelled out in the Pound configuration.  [Should it be?]
> 
> Not necessarily, only if you need it. Please remember that the concept of 
> "session" does not exist in HTTP - it is something in your mind, not in the 
> protocol. If you have a single back-end per virtual host then whatever 
> mechanism you use is OK and Pound does not care. It becomes important if you 
> have more than one back-end per UrlGroup, otherwise a client establishes a
> "session" with one back-end and the next request goes to another, which knows
> nothing about it.
	That was my understanding.  But it's always worthwhile to get
	confirmation.
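
The point about sessions and multiple back-ends can be illustrated with a small simulation. This is only a sketch of the general idea, not Pound's implementation: the back-end addresses and the function names `round_robin` and `sticky_by_ip` are made up for the example, and keying on the client IP is just one possible way to keep a client on the same back-end.

```python
from itertools import cycle


def round_robin(backends):
    """Return a picker that ignores the client and simply rotates back-ends."""
    rotation = cycle(backends)
    return lambda client_ip: next(rotation)


def sticky_by_ip(backends):
    """Return a picker that remembers the back-end first chosen per client IP."""
    rotation = cycle(backends)
    table = {}

    def pick(client_ip):
        # First request from this IP: assign the next back-end in rotation.
        if client_ip not in table:
            table[client_ip] = next(rotation)
        # Later requests from the same IP reuse the remembered back-end.
        return table[client_ip]

    return pick


# Hypothetical back-ends, chosen only for illustration.
BACKENDS = ["127.0.0.1:8081", "127.0.0.1:8082"]

rr = round_robin(BACKENDS)
sticky = sticky_by_ip(BACKENDS)

# Four consecutive requests from the same client.
rr_hits = [rr("1.2.3.4") for _ in range(4)]
sticky_hits = [sticky("1.2.3.4") for _ in range(4)]

print(rr_hits)      # alternates between the two back-ends
print(sticky_hits)  # always the same back-end
```

With a single back-end per group, as in the configuration quoted above, both pickers behave identically, which is why no session setting is needed there. Note also that keying on the client IP would pin every user behind one NAT address to the same back-end.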


> > We tend to have multiple users, say 20 or so, logged into our system from
> > a single IP address.  In reality, they are on a local network which is
> > exposed to the Internet through the single IP.
> 
> NAT does not matter - the source socket is different by definition.
	Agreed.

> 
> > We have not seen this problem before; it appeared when we upgraded to 1.7
> > and then when we upgraded to the various Pound-current releases.
> 
> Unless you use HTTPS there is not much in the transition from 1.6 to 1.7 to 
> affect you. 1.7 had one major issue in dealing with multiple listeners, but 
> that has been fixed in -current.
	We had major problems with 1.7 due to delays transiting Pound.
	This was fixed by moving to Pound-current.  We do not use SSL at
	this point although it's a check-off item for us.   


> > And, like several other users, we have been plagued with 500 errors of
> > late.  Users have also reported other problems which seem to be related--
> > display of pages as HTML rather than rendered content and display of
> > partial pages.
> >
> > The logs include numerous entries of the form
> >
> > Sep 15 19:18:42 epaul pound: MONITOR: worker exited on signal 11,
> > restarting...
> >
> > where `epaul' is one of our server names.
> 
> That is a major issue. I have no idea why you should get a SEGV, but it
> points towards a significant problem. In many cases this has proven to have
> little to do with Pound - more often than not this was resolved by updating
> system libraries/kernel.
> 
> This may also be related to partial responses - when the process aborts it
> may happen in the middle of writing a page back to the client.
	Hmmm... I wonder what could be causing early aborts.  We have had
	groups of 20 users on the same network requesting the same page
	over a period of a few minutes.  Same hardware and network setup for
	everyone.  15 get the page just fine, the other 5 get a partial
	page.  Repeating the process gets roughly the same result, but 
	the failures are on different machines.   There are lots of
	variables here which are not controlled so it's not much of a help 
	for detailed debugging, but this is the user experience which 
	is creating our complaints. 
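
To check whether the partial pages line up with worker crashes, the signal 11 entries could be counted per hour and compared with the times users report problems. The following is a minimal sketch, assuming the syslog line format quoted in this thread; the function name `crashes_per_hour` is made up for the example, and the sample lines are the two log lines already shown above.

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep 15 19:18:42 epaul pound: MONITOR: worker exited on signal 11,
CRASH = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r"\S+\s+pound: MONITOR: worker exited on signal (?P<sig>\d+)"
)


def crashes_per_hour(lines):
    """Count 'worker exited on signal N' entries per (month, day, hour)."""
    counts = Counter()
    for line in lines:
        m = CRASH.match(line)
        if m:
            counts[(m["month"], int(m["day"]), int(m["hour"]))] += 1
    return counts


# Sample input: one crash line and one ordinary request line from the thread.
sample = [
    "Sep 15 19:18:42 epaul pound: MONITOR: worker exited on signal 11,",
    "Sep 15 11:19:54 epaul pound: foo.foobar.com 1.2.3.4 - - ...",
]

counts = crashes_per_hour(sample)
for (month, day, hour), n in sorted(counts.items()):
    print(f"{month} {day:2d} {hour:02d}:00  {n} crash(es)")
```

In practice the lines would come from the syslog file that Pound writes to; its path depends on the local syslog configuration and is not given in this thread.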

> 
> > These appear at irregular intervals 10 or 12 per hour.  They are sometimes
> > clustered with two or three appearing in the space of a minute or two.
> > The frequency appears to be related to load, but they appear even under
> > light load.
> 
> On many occasions this has been a result of compiling/linking with the wrong 
> libraries (such as OpenSSL without threads support or a bad pthreads 
> version).
> 
> > And there are lots of `error' entries in the logs.  Not all such entries
> > are errors, they just memorialize events which happen in the normal course
> > of business.  We do see many instances of ` Bad file descriptor' and
> > `Input/output error'.  Are these to be expected or are they indicative of
> > a problem?
> 
> Most of these are benign - very often connected to client or server
> time-outs. I suggest you try playing with the Server parameter.
	I have and it makes little difference in this sort of problem
	except to increase the failure rate of IE users.  


> > As always, thanks for your help.  Any suggestions you might have as to how
> > to isolate this problem would be helpful.
> 
> Final suggestion: try installing Pound on a clean machine (fresh system
> install). I believe you use Linux - we have very satisfied users on both SuSE
> 9.1 and Gentoo (latest, with the dev-sources a.k.a. kernel 2.6). We have
> repeatedly had reports of nasty behaviour on RedHat machines, so for testing
> it might be better to avoid that.
	We use RH for historical reasons.   I will change distros at some 
	point soon, but for the moment we are stuck with RH7.3.  For us,
	RH7.3 has been stable and reliable; RH9 was a big problem.


> -- 
> Robert Segall
> Apsis GmbH
> Postfach, Uetikon am See, CH-8707
> Tel: +41-1-920 4904
> 

