monitor thread crashing and "pound confused"
"Sergio Freire" <sergio-s-freire(at)ptinovacao.pt>
2006-03-13 12:35:20
Hi Robert and all.

The other day I reported a problem on this list with the monitor
thread in Pound 2.0. I then installed version 2.0.2 and the problem
seemed to vanish. Today I had to restart all the backends, and after that
two important problems appeared:

            - monitor thread dying (receiving signal 11, see below)

            - Pound got completely confused. Let me explain: although all
backends were perfectly up and running and their corresponding HAport
monitoring services were also OK, Pound considered them dead. Accessing
the backends directly with a web browser worked perfectly. Restarting
Pound solved the problem immediately. It's a SERIOUS problem that I hope we
can detect and correct, and I want to contribute to that. You can see
that there is a gap in the logs: Pound was running and accepting
connections, but it was neither logging nor balancing the requests.

Details of the Pound configuration and the error messages are below.

Any ideas on how I can debug the first situation, and on what happened in
the second one?

Regards,

Sergio Freire

Part of /var/log/messages

Mar 13 10:45:14 dino-lb1 pound: backend 10.112.70.22:83 connect: Connection refused
Mar 13 10:45:14 dino-lb1 pound: no back-end "GET /?rc=10000&source=myTMN HTTP/1.1" from 172.30.2.3
Mar 13 10:45:31 dino-lb1 pound: MONITOR: worker exited on signal 11, restarting...
Mar 13 10:45:31 dino-lb1 pound: BackEnd 10.112.70.21:12060 is dead (HA)
Mar 13 10:45:31 dino-lb1 last message repeated 3 times
Mar 13 10:45:33 dino-lb1 pound: backend 10.112.70.22:80 connect: Connection refused
Mar 13 10:45:33 dino-lb1 pound: no back-end "GET / HTTP/1.1" from 172.30.2.3
Mar 13 10:45:38 dino-lb1 pound: no back-end "GET / HTTP/1.1" from 172.30.2.3
Mar 13 10:45:40 dino-lb1 pound: backend 10.112.70.22:81 connect: Connection refused
Mar 13 10:45:40 dino-lb1 pound: no back-end "GET /?rc=10000&source=myTMN HTTP/1.1" from 172.30.2.3
Mar 13 10:45:41 dino-lb1 pound: no back-end "GET / HTTP/1.1" from 172.30.2.3
Mar 13 10:45:48 dino-lb1 last message repeated 2 times
Mar 13 10:45:55 dino-lb1 pound: backend 10.112.70.22:82 connect: Connection refused
Mar 13 10:45:55 dino-lb1 pound: no back-end "GET / HTTP/1.1" from 172.30.2.3
Mar 13 10:45:58 dino-lb1 pound: no back-end "GET /?rc=10000&source=myTMN HTTP/1.1" from 172.30.2.3
Mar 13 10:45:58 dino-lb1 pound: no back-end "GET /delivery/http?buyid=1338952 HTTP/1.1" from 172.30.2.3

Mar 13 11:26:40 dino-lb1 pound: starting...
Mar 13 11:26:52 dino-lb1 pound: 172.30.2.3 GET / HTTP/1.1 - HTTP/1.1 200 OK (10.112.70.22:80)
Mar 13 11:26:54 dino-lb1 pound: 10.112.70.245 GET / HTTP/1.0 - HTTP/1.1 200 OK (10.112.70.22:80)
Mar 13 11:26:56 dino-lb1 pound: 172.30.2.3 GET /content/icons/jogos.gif?iid=1&resize=30 HTTP/1.1 - HTTP/1.1 200 OK (10.112.70.22:80)
Mar 13 11:26:56 dino-lb1 pound: 172.30.2.3 GET /content/icons/top10.gif?iid=1&resize=30 HTTP/1.1 - HTTP/1.1 200 OK (10.112.70.22:80)

My pound.cfg

        LogFacility local2
        LogLevel 2

        ListenHTTP
            Address 0.0.0.0
            Port    80

            Service
                Session
                    Type    HEADER
                    ID      "x-up-calling-line-id"
                    TTL     300
                End

                BackEnd
                    Address dino-fe1
                    Port    80
                    HAport  12060
                End
                BackEnd
                    Address dino-fe2
                    Port    80
                    HAport  12060
                End
            End
        End

        ListenHTTP
            Address 0.0.0.0
            Port    81

            Service
                Session
                    Type    HEADER
                    ID      "x-up-calling-line-id"
                    TTL     300
                End

                BackEnd
                    Address dino-fe1
                    Port    81
                    HAport  12060
                End
                BackEnd
                    Address dino-fe2
                    Port    81
                    HAport  12060
                End
            End
        End

        ListenHTTP
            Address 0.0.0.0
            Port    82

            Service
                Session
                    Type    HEADER
                    ID      "x-up-calling-line-id"
                    TTL     300
                End

                BackEnd
                    Address dino-fe1
                    Port    82
                    HAport  12060
                End
                BackEnd
                    Address dino-fe2
                    Port    82
                    HAport  12060
                End
            End
        End

        ListenHTTP
            Address 0.0.0.0
            Port    83

            Service
                Session
                    Type    HEADER
                    ID      "x-up-calling-line-id"
                    TTL     300
                End

                BackEnd
                    Address dino-fe1
                    Port    83
                    HAport  12060
                End
                BackEnd
                    Address dino-fe2
                    Port    83
                    HAport  12060
                End
            End
        End

Re: [Pound Mailing List] monitor thread crashing and "pound confused"
Robert Segall <roseg(at)apsis.ch>
2006-03-13 13:39:54
On Mon, 2006-03-13 at 11:35 +0000, Sergio Freire wrote:[...]

It's your WORKER that died. As I said in a previous message, I'd really
like to know why - as many details on what exactly happened would be
important in order to find out why.
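The MONITOR line in the log comes from a supervisor that noticed its worker had been killed by a signal. The general fork/waitpid pattern behind a message like that can be sketched as follows. This is a minimal illustration assuming a POSIX system, not Pound's actual source; `worker_fn`, `run_worker_once`, `supervise` and the two demo workers are hypothetical names.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker entry point: whatever the real worker process does. */
typedef void (*worker_fn)(void);

/* Fork one worker and wait for it. Returns the number of the signal that
 * killed it, 0 if it exited normally, or -1 if fork()/waitpid() failed. */
int run_worker_once(worker_fn fn)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {              /* child: run the worker, then exit */
        fn();
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Supervisor loop: start the worker up to max_runs times, restarting it
 * only when it was killed by a signal. Returns the number of crashes seen. */
int supervise(worker_fn fn, int max_runs)
{
    int crashes = 0;
    for (int run = 0; run < max_runs; run++) {
        int sig = run_worker_once(fn);
        if (sig <= 0)
            break;               /* clean exit or fork/wait error: stop */
        crashes++;
        fprintf(stderr, "MONITOR: worker exited on signal %d%s\n", sig,
                run + 1 < max_runs ? ", restarting..." : "");
    }
    return crashes;
}

/* Demo workers: one exits cleanly, one dies with SIGSEGV (11 on Linux). */
void clean_worker(void) { }

void crashing_worker(void)
{
    signal(SIGSEGV, SIG_DFL);    /* make sure the default action (terminate) applies */
    raise(SIGSEGV);
}
```

A real supervisor would call `supervise()` from `main()` without a run limit; the limit only keeps the sketch finite. Because the worker is a separate process, the supervisor survives the crash, which matches the log above where the monitor carries on after the signal 11.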
[...]

It didn't. See comments below...
[...]

A connection attempt to the back-end failed, so the back-end is
considered dead (until resurrected during a later check).
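The rule described here (a failed connect takes a back-end out of rotation, and requests fail with "no back-end" once every back-end is out) can be sketched as below. This is a simplified, hypothetical model that assumes plain round-robin and ignores sessions; `struct backend`, `mark_dead` and `pick_backend` are illustrative names, not Pound's.

```c
#include <stdio.h>

/* Hypothetical back-end record (real load balancers keep much more state). */
struct backend {
    const char *addr;   /* "host:port", used for logging only */
    int alive;          /* 1 = eligible for requests, 0 = considered dead */
};

/* Called when a connect to the back-end fails: take it out of rotation
 * until a later health check marks it alive again. */
void mark_dead(struct backend *be)
{
    if (be->alive)
        fprintf(stderr, "BackEnd %s is dead\n", be->addr);
    be->alive = 0;
}

/* Round-robin over live back-ends only, starting at *cursor.
 * Returns an index into bes, or -1 when every back-end is dead,
 * which in this model is the "no back-end" situation. */
int pick_backend(struct backend *bes, int n, int *cursor)
{
    for (int tried = 0; tried < n; tried++) {
        int i = (*cursor + tried) % n;
        if (bes[i].alive) {
            *cursor = (i + 1) % n;
            return i;
        }
    }
    return -1;
}
```

Each service in the configuration above has only two back-ends, so in this model losing both at once is enough to make every request fail until a later check revives one of them.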
[...]

It's still dead...
[...]

Nasty - need to find out why.
[...]

Your HAport check failed.
[...]

As above.
[...]

None of these goes to 10.112.70.22:82, so for all we know it may still
be down (possibly including the HAport).
[...]

Try adding

	TimeOut 30

or larger to your BackEnd definitions and see if it helps.

You may also want

	Alive 5

in your config to have your checks occur more often.[...]

RE: [Pound Mailing List] monitor thread crashing and "pound confused"
"Sergio Freire" <sergio-s-freire(at)ptinovacao.pt>
2006-03-13 14:46:59
Ok, thanks.
What I don't understand is why Pound was considering the backend dead
when everything on that backend was up and running, including the service
that runs on the HAport.
Another thing to explain is that Pound stopped logging for some
minutes even though requests to Pound were still being made. It also did not
balance them to any of the backends. It seems that it got stuck somehow:
the process was running, but it could neither see that the backends
were alive nor log the requests...

How can I add some kind of debugging to the worker thread to find the
cause of the crash?

Regards,
Sergio Freire

RE: [Pound Mailing List] monitor thread crashing and "pound confused"
Robert Segall <roseg(at)apsis.ch>
2006-03-13 16:15:21
On Mon, 2006-03-13 at 13:46 +0000, Sergio Freire wrote:[...]

A back-end may be unreachable for a variety of reasons: it may really be
down, there may be a network problem or it could be overloaded and thus
too slow to respond to requests. This is why I suggested you increase
the time-out to the back-ends.
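A health check with a time-out behaves as described: a back-end that is up but answers slowly is indistinguishable from a dead one if the check gives up too early. A minimal POSIX sketch of such a check (non-blocking `connect()` plus `poll()`); `probe_port`, `make_addr` and `timeout_ms` are hypothetical, and `timeout_ms` only plays a role similar to the suggested `TimeOut`; this is not how Pound implements it.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in an IPv4 address; returns 1 on success, 0 if `ip` is malformed. */
int make_addr(const char *ip, unsigned short port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1;
}

/* Returns 1 if a TCP connection to `addr` completes within timeout_ms,
 * 0 otherwise (refused, unreachable, or simply too slow to answer). */
int probe_port(const struct sockaddr_in *addr, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return 0;
    }

    int ok = 0;
    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0) {
        ok = 1;                              /* connected at once */
    } else if (errno == EINPROGRESS) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        /* poll() returns 0 when the time-out expires: treat as dead */
        if (poll(&pfd, 1, timeout_ms) == 1) {
            int err = 0;
            socklen_t len = sizeof err;
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
                ok = 1;                      /* handshake completed */
        }
    }
    close(fd);
    return ok;
}
```

For example, `make_addr("10.112.70.22", 12060, &a)` followed by `probe_port(&a, 30000)` would wait up to 30 seconds for the HAport used in the configuration above before declaring it dead.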
[...]

Since you use syslog you may have a few systems interacting. Try
compiling and running without syslog (logging to stdout/stderr) to see
the messages in real time on your console.
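Logging to stdout/stderr instead of syslog amounts to routing the same messages to a different sink. A hypothetical sketch of such a switch (`log_msg` is an illustrative name, not a Pound function): passing `NULL` sends the message to syslog, while passing a stream such as `stderr` prints it and flushes immediately so nothing waits in a buffer.

```c
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/* Hypothetical logging switch. out == NULL: send to syslog with the given
 * priority. Otherwise write the message plus a newline to `out` (e.g. stderr)
 * and flush at once, so it shows up on the console in real time. */
void log_msg(FILE *out, int priority, const char *fmt, ...)
{
    char buf[1024];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);   /* longer messages are truncated */
    va_end(ap);

    if (out == NULL) {
        syslog(priority, "%s", buf);
    } else {
        fprintf(out, "%s\n", buf);
        fflush(out);
    }
}
```

Formatting once into a buffer keeps both paths identical, so a message seen on the console matches what syslog would have recorded.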

Pound stops "load-balancing" once a back-end is dead - all requests are
sent to the remaining back-ends. By decreasing the Alive interval you
tell Pound to check more often on the back-ends, and resurrect them
sooner.
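The resurrection mentioned here can be pictured as a pass over the dead back-ends that runs once per `Alive` interval. A hypothetical sketch (`resurrect_pass`, `probe_fn` and the stub probes are illustrative names; a real probe would be a TCP connect with a time-out). If the pass runs every `Alive` seconds, a back-end that recovers just after a pass stays out of rotation for up to about one interval, so `Alive 5` bounds that wait at roughly 5 seconds plus the probe time.

```c
#include <stdio.h>

/* Hypothetical back-end record. */
struct backend {
    const char *addr;   /* "host:port", used for logging only */
    int alive;          /* 0 = currently out of rotation */
};

/* A probe returns 1 if the back-end (or its HAport) answered. */
typedef int (*probe_fn)(const struct backend *);

/* One health-check pass, intended to run once per Alive interval:
 * dead back-ends that answer the probe go back into rotation.
 * Live back-ends are left alone. Returns how many were revived. */
int resurrect_pass(struct backend *bes, int n, probe_fn probe)
{
    int revived = 0;
    for (int i = 0; i < n; i++) {
        if (!bes[i].alive && probe(&bes[i])) {
            bes[i].alive = 1;
            revived++;
            fprintf(stderr, "BackEnd %s is back in rotation\n", bes[i].addr);
        }
    }
    return revived;
}

/* Stub probes for illustration; a real one would attempt a TCP connection. */
int probe_always_up(const struct backend *be) { (void)be; return 1; }
int probe_always_down(const struct backend *be) { (void)be; return 0; }
```

A checker would call it in a loop, e.g. `for (;;) { resurrect_pass(bes, n, probe); sleep(alive_secs); }`, where `probe` and `alive_secs` are again hypothetical.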
[...]

The best would be to sniff the requests coming in. A logging proxy such
as tcpwatch is great, used between the clients and Pound. That way you
can see exactly what request caused the segmentation violation - and
that is very valuable information for us.[...]

RE: [Pound Mailing List] monitor thread crashing and "pound confused"
"Sergio Freire" <sergio-s-freire(at)ptinovacao.pt>
2006-03-13 16:49:11
Ok Robert,
It seems to me that the segfault is within the thr_http thread.
It happens very rarely; I guess the only way to find the cause is to add
lots of printf's to the code to isolate the segfault.
I have asked some people, but they could not tell me a way to generate
a core dump when a thread crashes, which makes things like these hard to
find. I thought there was an easier way, at least a core dump like the one
you get when a process crashes. If you know any way other than that and the
one you described, it would be great.
Regards,
Sergio Freire

RE: [Pound Mailing List] monitor thread crashing and "pound confused"
Robert Segall <roseg(at)apsis.ch>
2006-03-13 17:08:06
On Mon, 2006-03-13 at 15:49 +0000, Sergio Freire wrote:[...]

Given that nothing else is active that is pretty much guaranteed to be
the case...
[...]

That would be a rather tedious way of doing it. A log of the traffic
would probably be easier, and we can move on from there once we have
identified the circumstances that cause it to happen.
[...]

A segmentation fault will generate a core dump (if you allow it to). The
trouble is that most debuggers cannot really deal with core dumps from
multi-threaded programs, so it doesn't really help that much.[...]
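On "if you allow it to": on most Unix systems a core file is only written when the core-size resource limit permits it, and that soft limit is often 0 by default. A minimal sketch of a program raising its own soft limit to the hard limit with `getrlimit()`/`setrlimit()` (similar in effect to `ulimit -c` in the shell that starts it); `enable_core_dumps` is a hypothetical name, not a Pound function.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit so that a crash
 * may leave a core file. Returns 0 on success, -1 on error. An unprivileged
 * process may always raise its soft limit up to (not beyond) the hard one. */
int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Where the file ends up is still up to the OS (on Linux, `/proc/sys/kernel/core_pattern`), and on Linux a process that has switched to another user ID is normally marked non-dumpable unless it calls `prctl(PR_SET_DUMPABLE, 1)`. If the gdb at hand copes with threaded core files, `thread apply all bt` lists the stack of every thread.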