/ Zope / Apsis / Pound Mailing List / Archive / 2010 / 2010-12 / Website stalls every 60 seconds

[ << ] [ >> ]

[ Redirection problem / "Ament, Andreas" ... ] [ Error Status / "BUPE(Peter ... ]

Website stalls every 60 seconds
Steven van der Vegt <steven(at)echelon.nl>
2010-12-22 14:50:15 [ SNIP ]
Hello list,

We are a ISP and happy Poundusers! But on one system we experience some
problems. For this system Pound is configured to use cookies for session
tracking. It is installed on a PF-Sense firewall (FreeBSD) which isn't a very
powerful device (Celeron M. 1Ghz 512Mb). We notice that the website seems to
stall every minute for 10 seconds. After looking through the Pound code I
notice the do_expire function, which is called every 60 seconds. It runs over
all sessions in the hash and checks if they are expired. If so, deletes them.
This takes almost 10 seconds and since the hashtable is locked no lookups and
inserts are permitted. There are ~3000 sessions in the table and Pound is
handling ~100 hits/s.
Obviously this is a locking problem. The first thing that comes to mind is:
drop the non-threadsafe hashtable. But since this structure is a big part of
the code this is not a fast-to-implement solution. Maybe we can sharpen up the
lock-policy. Instead of locking the complete table during the t_expire, maybe
only lock it when actually deleting a node (in t_old). I studied the code but
I'm not sure if it is a (thread)safe solution. What do you think?
I wonder, are we the only one experiencing these troubles? Is this a known
problem?

Kind regards,

Steven van der Vegt
Echelon B.V. The Netherlands

RE: Website stalls every 60 seconds
Joe Gooch <mrwizard(at)k12system.com>
2010-12-22 15:49:32 [ SNIP ]
I believe the hashtable implementation is from the openssl library, so the
implementation isn't part of pound.  From my quick look, it looks like the
lhash nodes are using a linked list to hold the hashtable collisions... Which
would need a lock on insert and delete because the structure changes.  I guess
what you're really looking for is a read/write locking system. (i.e. t_expire
does a read lock, new sessions try a write lock and fail until the read lock is
done, but multiple read locks can be in place at a time)  Similar to Java's
ReadWriteLock.  (looks like pthreads has rwlocks instead of mutexes)

Maybe instead of calling delete session it could create a temporary array or
linked list to save the nodes that need to be deleted and then delete them in
one shot... would require a temporary malloc or heap space... But even if it
was a static array, t_expire is only going to run in a single thread, one at a
time...  and if it didn't delete every expired session every time, that
wouldn't be so bad, it's just going to run again anyway.

What concerns me more efficiency-wise is the method of deletion... Since the
hashtable is a black box, we can't remove the in-place node...  the tree
traversal has already located the node we want to remove, and once we do our
comparison, we have to do lh_delete which will go through the table and locate
it all over again.  Depending on how good the hash is and how many collisions
you have that could be significant.

It could mean that in your case t_hash isn't returning significantly different
hash values.  It'd probably require digging into the lhash stats numbers (i.e.
from openssl):

" If you are interested in performance the field to watch is num_comp_calls.
The hash library keeps track of the 'hash' value for each item so when a lookup
is done, the 'hashes' are compared, if there is a match, then a full compare is
done, and hash->num_comp_calls is incremented. If num_comp_calls is not equal
to num_delete plus num_retrieve it means that your hash function is generating
hashes that are the same for different values. It is probably worth changing
your hash function if this is the case because even if your hash table has 10
items in a 'bucket', it can be searched with 10 unsigned long compares and 10
linked list traverses. This will be much less expensive that 10 calls to your
compare function."

Also interesting:
" When doing this, be careful if you delete entries from the hash table in your
callbacks: the table may decrease in size, moving the item that you are
currently on down lower in the hash table - this could cause some entries to be
skipped during the iteration. The second best solution to this problem is to
set hash->down_load=0 before you start (which will stop the hash table ever
decreasing in size). The best solution is probably to avoid deleting items from
the hash table inside a ``doall'' callback!"

Pound does set down_load to 0, so it does do that... Saving a list of nodes in
a temporary array and then deleting them in one shot would fix this.

What type of data is in the cookie?  Is it something like jsessionid?  How long
is your TTL?

You could set your EXPIRE_TO higher... seems to me that's only a stopgap
though.

We use pound on high volume sites and haven't had any issues... Then again, my
weakest box (on my test sites) is a dual processor 2.8GHZ w/ 4gb ram. 
Production has more horsepower.

Joe

> -----Original Message-----
> From: Steven van der Vegt [mailto:steven(at)echelon.nl]
> Sent: Wednesday, December 22, 2010 8:50 AM
> To: pound(at)apsis.ch
> Subject: [Pound Mailing List] Website stalls every 60 seconds
> 
> Hello list,
> 
> We are a ISP and happy Poundusers! But on one system we experience some
> problems. For this system Pound is configured to use cookies for
> session tracking. It is installed on a PF-Sense firewall (FreeBSD)
> which isn't a very powerful device (Celeron M. 1Ghz 512Mb). We notice
> that the website seems to stall every minute for 10 seconds. After
> looking through the Pound code I notice the do_expire function, which
> is called every 60 seconds. It runs over all sessions in the hash and
> checks if they are expired. If so, deletes them. This takes almost 10
> seconds and since the hashtable is locked no lookups and inserts are
> permitted. There are ~3000 sessions in the table and Pound is handling
> ~100 hits/s.
> Obviously this is a locking problem. The first thing that comes to mind
> is: drop the non-threadsafe hashtable. But since this structure is a
> big part of the code this is not a fast-to-implement solution. Maybe we
> can sharpen up the lock-policy. Instead of locking the complete table
> during the t_expire, maybe only lock it when actually deleting a node
> (in t_old). I studied the code but I'm not sure if it is a (thread)safe
> solution. What do you think?
> I wonder, are we the only one experiencing these troubles? Is this a
> known problem?
> 
> Kind regards,
> 
> Steven van der Vegt
> Echelon B.V. The Netherlands
> 
> --
> To unsubscribe send an email with subject unsubscribe to
> pound(at)apsis.ch.
> Please contact roseg(at)apsis.ch for questions.

RE: Website stalls every 60 seconds
Steven van der Vegt <steven(at)echelon.nl>
2010-12-23 12:14:32 [ SNIP ]
Hi Joe,

Thanks for your quick and careful response! A few things that came to mind:

> I guess what you're really looking for is a read/write locking system. 
That's something I had in mind too. This will be a speedup in general since all
the read threads wouldn't be waiting on each other anymore.

> Maybe instead of calling delete session it could create a temporary array or
linked list to save the nodes that need to be deleted and then delete them in
one shot...
Great suggestion. Like a static array of n elements. If the array is full: lock
-> delete -> unlock. If there are any unchecked elements left, continue
checking and deleting.

> What concerns me more efficiency-wise is the method of deletion... Since the
hashtable is a black box, we can't remove the in-place node...  the tree
traversal has already located the node we want to remove, and once we do our
comparison, we have to do lh_delete which will go through the table and locate
it all over again.  Depending on how good the hash is and how many collisions
you have that could be significant.

That is a waste of recourses indeed, but since the blackbox principle, I don't
see a way around it except for taking over some responsibility of the lhash,
which is somewhat "not done". However, the lhash code hasn't changed within 12
years or so...

> It could mean that in your case t_hash isn't returning significantly
different hash values.
I will dig into that. For now the standard lhash lh_strhash function is used.
If the values suggest a lot of hash collisions take place, is MD5 a good
alternative?

> What type of data is in the cookie?  Is it something like jsessionid?  How
long is your TTL?
The cookie values are pretty basic, they only contain hashkeys. TTL is set on
20 minutes.

> We use pound on high volume sites and haven't had any issues... Then again,
my weakest box (on my test sites) is a dual processor 2.8GHZ w/ 4gb ram. 
Production has more horsepower.
For now we have put Pound onto a ~3Ghz box. This solves the problem, but in
mine opinion that's a waste of iron.

Would you all be interested if I look into the delete-all-at-once and the
readers-writers-lock?

Kind regards,

Steven

-----Oorspronkelijk bericht-----
Van: Joe Gooch [mailto:mrwizard(at)k12system.com] 
Verzonden: woensdag 22 december 2010 15:50
Aan: pound(at)apsis.ch
Onderwerp: [Pound Mailing List] RE: Website stalls every 60 seconds

I believe the hashtable implementation is from the openssl library, so the
implementation isn't part of pound.  From my quick look, it looks like the
lhash nodes are using a linked list to hold the hashtable collisions... Which
would need a lock on insert and delete because the structure changes.  I guess
what you're really looking for is a read/write locking system. (i.e. t_expire
does a read lock, new sessions try a write lock and fail until the read lock is
done, but multiple read locks can be in place at a time)  Similar to Java's
ReadWriteLock.  (looks like pthreads has rwlocks instead of mutexes)

Maybe instead of calling delete session it could create a temporary array or
linked list to save the nodes that need to be deleted and then delete them in
one shot... would require a temporary malloc or heap space... But even if it
was a static array, t_expire is only going to run in a single thread, one at a
time...  and if it didn't delete every expired session every time, that
wouldn't be so bad, it's just going to run again anyway.

What concerns me more efficiency-wise is the method of deletion... Since the
hashtable is a black box, we can't remove the in-place node...  the tree
traversal has already located the node we want to remove, and once we do our
comparison, we have to do lh_delete which will go through the table and locate
it all over again.  Depending on how good the hash is and how many collisions
you have that could be significant.

It could mean that in your case t_hash isn't returning significantly different
hash values.  It'd probably require digging into the lhash stats numbers (i.e.
from openssl):

" If you are interested in performance the field to watch is num_comp_calls.
The hash library keeps track of the 'hash' value for each item so when a lookup
is done, the 'hashes' are compared, if there is a match, then a full compare is
done, and hash->num_comp_calls is incremented. If num_comp_calls is not equal
to num_delete plus num_retrieve it means that your hash function is generating
hashes that are the same for different values. It is probably worth changing
your hash function if this is the case because even if your hash table has 10
items in a 'bucket', it can be searched with 10 unsigned long compares and 10
linked list traverses. This will be much less expensive that 10 calls to your
compare function."

Also interesting:
" When doing this, be careful if you delete entries from the hash table in your
callbacks: the table may decrease in size, moving the item that you are
currently on down lower in the hash table - this could cause some entries to be
skipped during the iteration. The second best solution to this problem is to
set hash->down_load
efore you start (which will stop the hash table ever decreasing in size). The
best solution is probably to avoid deleting items from the hash table inside a
``doall'' callback!"

Pound does set down_load to 0, so it does do that... Saving a list of nodes in
a temporary array and then deleting them in one shot would fix this.

What type of data is in the cookie?  Is it something like jsessionid?  How long
is your TTL?

You could set your EXPIRE_TO higher... seems to me that's only a stopgap
though.

We use pound on high volume sites and haven't had any issues... Then again, my
weakest box (on my test sites) is a dual processor 2.8GHZ w/ 4gb ram. 
Production has more horsepower.

Joe

> -----Original Message-----
> From: Steven van der Vegt [mailto:steven(at)echelon.nl]
> Sent: Wednesday, December 22, 2010 8:50 AM
> To: pound(at)apsis.ch
> Subject: [Pound Mailing List] Website stalls every 60 seconds
> 
> Hello list,
> 
> We are a ISP and happy Poundusers! But on one system we experience some
> problems. For this system Pound is configured to use cookies for
> session tracking. It is installed on a PF-Sense firewall (FreeBSD)
> which isn't a very powerful device (Celeron M. 1Ghz 512Mb). We notice
> that the website seems to stall every minute for 10 seconds. After
> looking through the Pound code I notice the do_expire function, which
> is called every 60 seconds. It runs over all sessions in the hash and
> checks if they are expired. If so, deletes them. This takes almost 10
> seconds and since the hashtable is locked no lookups and inserts are
> permitted. There are ~3000 sessions in the table and Pound is handling
> ~100 hits/s.
> Obviously this is a locking problem. The first thing that comes to mind
> is: drop the non-threadsafe hashtable. But since this structure is a
> big part of the code this is not a fast-to-implement solution. Maybe we
> can sharpen up the lock-policy. Instead of locking the complete table
> during the t_expire, maybe only lock it when actually deleting a node
> (in t_old). I studied the code but I'm not sure if it is a (thread)safe
> solution. What do you think?
> I wonder, are we the only one experiencing these troubles? Is this a
> known problem?
> 
> Kind regards,
> 
> Steven van der Vegt
> Echelon B.V. The Netherlands
> 
> --
> To unsubscribe send an email with subject unsubscribe to
> pound(at)apsis.ch.
> Please contact roseg(at)apsis.ch for questions.

--
To unsubscribe send an email with subject unsubscribe to pound(at)apsis.ch.
Please contact roseg(at)apsis.ch for questions.

RE: Website stalls every 60 seconds
Steven van der Vegt <steven(at)echelon.nl>
2010-12-23 17:23:50 [ SNIP ]
Since this problem is of somewhat a priority to us, I started to implement the
delete-at-once functionality. It works, but I still need to implement
read/write locks because during the execution of the doall() method we can't
afford an insert.

Do you guys have a good testing method?

I'm looking forward to your opinions about this solution. If I'm not responding
to email the next few days, that's because I'm busy cooking and celebrating
Christmas :-)

Steven

-----Oorspronkelijk bericht-----
Van: Steven van der Vegt [mailto:steven(at)echelon.nl] 
Verzonden: donderdag 23 december 2010 12:15
Aan: pound(at)apsis.ch
Onderwerp: [Pound Mailing List] RE: Website stalls every 60 seconds

Hi Joe,

Thanks for your quick and careful response! A few things that came to mind:

> I guess what you're really looking for is a read/write locking system. 
That's something I had in mind too. This will be a speedup in general since all
the read threads wouldn't be waiting on each other anymore.

> Maybe instead of calling delete session it could create a temporary array or
linked list to save the nodes that need to be deleted and then delete them in
one shot...
Great suggestion. Like a static array of n elements. If the array is full: lock
-> delete -> unlock. If there are any unchecked elements left, continue
checking and deleting.

> What concerns me more efficiency-wise is the method of deletion... Since the
hashtable is a black box, we can't remove the in-place node...  the tree
traversal has already located the node we want to remove, and once we do our
comparison, we have to do lh_delete which will go through the table and locate
it all over again.  Depending on how good the hash is and how many collisions
you have that could be significant.

That is a waste of recourses indeed, but since the blackbox principle, I don't
see a way around it except for taking over some responsibility of the lhash,
which is somewhat "not done". However, the lhash code hasn't changed within 12
years or so...

> It could mean that in your case t_hash isn't returning significantly
different hash values.
I will dig into that. For now the standard lhash lh_strhash function is used.
If the values suggest a lot of hash collisions take place, is MD5 a good
alternative?

> What type of data is in the cookie?  Is it something like jsessionid?  How
long is your TTL?
The cookie values are pretty basic, they only contain hashkeys. TTL is set on
20 minutes.

> We use pound on high volume sites and haven't had any issues... Then again,
my weakest box (on my test sites) is a dual processor 2.8GHZ w/ 4gb ram. 
Production has more horsepower.
For now we have put Pound onto a ~3Ghz box. This solves the problem, but in
mine opinion that's a waste of iron.

Would you all be interested if I look into the delete-all-at-once and the
readers-writers-lock?

Kind regards,

Steven

-----Oorspronkelijk bericht-----
Van: Joe Gooch [mailto:mrwizard(at)k12system.com] 
Verzonden: woensdag 22 december 2010 15:50
Aan: pound(at)apsis.ch
Onderwerp: [Pound Mailing List] RE: Website stalls every 60 seconds

I believe the hashtable implementation is from the openssl library, so the
implementation isn't part of pound.  From my quick look, it looks like the
lhash nodes are using a linked list to hold the hashtable collisions... Which
would need a lock on insert and delete because the structure changes.  I guess
what you're really looking for is a read/write locking system. (i.e. t_expire
does a read lock, new sessions try a write lock and fail until the read lock is
done, but multiple read locks can be in place at a time)  Similar to Java's
ReadWriteLock.  (looks like pthreads has rwlocks instead of mutexes)

Maybe instead of calling delete session it could create a temporary array or
linked list to save the nodes that need to be deleted and then delete them in
one shot... would require a temporary malloc or heap space... But even if it
was a static array, t_expire is only going to run in a single thread, one at a
time...  and if it didn't delete every expired session every time, that
wouldn't be so bad, it's just going to run again anyway.

What concerns me more efficiency-wise is the method of deletion... Since the
hashtable is a black box, we can't remove the in-place node...  the tree
traversal has already located the node we want to remove, and once we do our
comparison, we have to do lh_delete which will go through the table and locate
it all over again.  Depending on how good the hash is and how many collisions
you have that could be significant.

It could mean that in your case t_hash isn't returning significantly different
hash values.  It'd probably require digging into the lhash stats numbers (i.e.
from openssl):

" If you are interested in performance the field to watch is num_comp_calls.
The hash library keeps track of the 'hash' value for each item so when a lookup
is done, the 'hashes' are compared, if there is a match, then a full compare is
done, and hash->num_comp_calls is incremented. If num_comp_calls is not equal
to num_delete plus num_retrieve it means that your hash function is generating
hashes that are the same for different values. It is probably worth changing
your hash function if this is the case because even if your hash table has 10
items in a 'bucket', it can be searched with 10 unsigned long compares and 10
linked list traverses. This will be much less expensive that 10 calls to your
compare function."

Also interesting:
" When doing this, be careful if you delete entries from the hash table in your
callbacks: the table may decrease in size, moving the item that you are
currently on down lower in the hash table - this could cause some entries to be
skipped during the iteration. The second best solution to this problem is to
set hash->down_load
efore you start (which will stop the hash table ever decreasing in size). The
best solution is probably to avoid deleting items from the hash table inside a
``doall'' callback!"

Pound does set down_load to 0, so it does do that... Saving a list of nodes in
a temporary array and then deleting them in one shot would fix this.

What type of data is in the cookie?  Is it something like jsessionid?  How long
is your TTL?

You could set your EXPIRE_TO higher... seems to me that's only a stopgap
though.

We use pound on high volume sites and haven't had any issues... Then again, my
weakest box (on my test sites) is a dual processor 2.8GHZ w/ 4gb ram. 
Production has more horsepower.

Joe

> -----Original Message-----
> From: Steven van der Vegt [mailto:steven(at)echelon.nl]
> Sent: Wednesday, December 22, 2010 8:50 AM
> To: pound(at)apsis.ch
> Subject: [Pound Mailing List] Website stalls every 60 seconds
> 
> Hello list,
> 
> We are a ISP and happy Poundusers! But on one system we experience some
> problems. For this system Pound is configured to use cookies for
> session tracking. It is installed on a PF-Sense firewall (FreeBSD)
> which isn't a very powerful device (Celeron M. 1Ghz 512Mb). We notice
> that the website seems to stall every minute for 10 seconds. After
> looking through the Pound code I notice the do_expire function, which
> is called every 60 seconds. It runs over all sessions in the hash and
> checks if they are expired. If so, deletes them. This takes almost 10
> seconds and since the hashtable is locked no lookups and inserts are
> permitted. There are ~3000 sessions in the table and Pound is handling
> ~100 hits/s.
> Obviously this is a locking problem. The first thing that comes to mind
> is: drop the non-threadsafe hashtable. But since this structure is a
> big part of the code this is not a fast-to-implement solution. Maybe we
> can sharpen up the lock-policy. Instead of locking the complete table
> during the t_expire, maybe only lock it when actually deleting a node
> (in t_old). I studied the code but I'm not sure if it is a (thread)safe
> solution. What do you think?
> I wonder, are we the only one experiencing these troubles? Is this a
> known problem?
> 
> Kind regards,
> 
> Steven van der Vegt
> Echelon B.V. The Netherlands
> 
> --
> To unsubscribe send an email with subject unsubscribe to
> pound(at)apsis.ch.
> Please contact roseg(at)apsis.ch for questions.

--
To unsubscribe send an email with subject unsubscribe to pound(at)apsis.ch.
Please contact roseg(at)apsis.ch for questions.

--
To unsubscribe send an email with subject unsubscribe to pound(at)apsis.ch.
Please contact roseg(at)apsis.ch for questions.

RE: Website stalls every 60 seconds
Joe Gooch <mrwizard(at)k12system.com>
2010-12-23 17:38:11 [ SNIP ]
> -----Original Message-----
> From: Steven van der Vegt [mailto:steven(at)echelon.nl]
> Sent: Thursday, December 23, 2010 6:15 AM
> To: pound(at)apsis.ch
> Subject: [Pound Mailing List] RE: Website stalls every 60 seconds
> 
> Hi Joe,
> 
> Thanks for your quick and careful response! A few things that came to
> mind:
> 
> > I guess what you're really looking for is a read/write locking
> system.
> That's something I had in mind too. This will be a speedup in general
> since all the read threads wouldn't be waiting on each other anymore.

...and it won't require any additional libraries, which will make Robert happy.
 
> > Maybe instead of calling delete session it could create a temporary
> array or linked list to save the nodes that need to be deleted and then
> delete them in one shot...
> Great suggestion. Like a static array of n elements. If the array is
> full: lock -> delete -> unlock. If there are any unchecked elements
> left, continue checking and deleting.

I was thinking more like if the array is full, stop looking for more, lock
delete and unlock.  It's just going to find the rest in the next poll anyway.
Then you don't have to worry about read/write contention either, since pthreads
doesn't have a "upgrade me from read to write" call you'll probably have to
release the read lock to obtain the write. (which will need to block until all
read locks release)  Once you do that and start deleting, or even inserts from
other threads occur, the rest of the tree's structure from the previous
iteration might not be valid anymore.  (Especially if you don't set down_load=0
before delete, which you would no longer have to do, because tree shrinking
would be fine.  But it would invalidate the iterator.)  I'm making some
assumptions about the implementation of lh_doall but I would hope it's
implemented that way. :)  As we've already determined it's not thread safe in
itself.
 
> > What concerns me more efficiency-wise is the method of deletion...
> Since the hashtable is a black box, we can't remove the in-place
> node...  the tree traversal has already located the node we want to
> remove, and once we do our comparison, we have to do lh_delete which
> will go through the table and locate it all over again.  Depending on
> how good the hash is and how many collisions you have that could be
> significant.
> 
> That is a waste of recourses indeed, but since the blackbox principle,
> I don't see a way around it except for taking over some responsibility
> of the lhash, which is somewhat "not done". However, the lhash code
> hasn't changed within 12 years or so...

Pound used to have its own linked list implementation for sessions (back in the
2.2 or 2.0 days) but I'd agree it was a logical move to lhash.

You're right, there probably isn't much we can do there other than make sure
the atomic section is as small as possible.
 
> > It could mean that in your case t_hash isn't returning significantly
> different hash values.
> I will dig into that. For now the standard lhash lh_strhash function is
> used. If the values suggest a lot of hash collisions take place, is MD5
> a good alternative?

The hash function is actually provided in config.c with the call to lh_new...
There's a t_hash function in the code there with a comment based on "fmv".

Hashing calculations will be used on every request... From looking at t_hash
it's probably more efficient than Md5 would be...  That might cause other speed
issues.  But, for instance, if your hash keys are already long values (like
jsessionids are like guids) maybe just hex string -> long would work and rely
on the backend to handle uniqueness.  Guids are longer than 32bits but I don't
think the collision stats on that would be too bad.
 
> > What type of data is in the cookie?  Is it something like jsessionid?
> How long is your TTL?
> The cookie values are pretty basic, they only contain hashkeys. TTL is
> set on 20 minutes.

Seems to jive with above.
 
> Would you all be interested if I look into the delete-all-at-once and
> the readers-writers-lock?

I will admit I'm intrigued and have been considering taking a stab at it
myself.  However you have the test bed already in place so yeah, I'd love to
see what you come up with.

Take care!
Joe

RE: Website stalls every 60 seconds
Joe Gooch <mrwizard(at)k12system.com>
2010-12-23 17:41:14 [ SNIP ]
I've been known to set the TTL values low (say, 1 minute) and use JMeter or
similar to set up a fleet of zombie web browsers.  It'll give you all the stats
of average response time, responses per second, etc.

As far as pound debugging I usually do the low-tech "add printfs to the code"
approach. :)

Joe


> -----Original Message-----
> From: Steven van der Vegt [mailto:steven(at)echelon.nl]
> Sent: Thursday, December 23, 2010 11:24 AM
> To: pound(at)apsis.ch
> Subject: [Pound Mailing List] RE: Website stalls every 60 seconds
> 
> Since this problem is of somewhat a priority to us, I started to
> implement the delete-at-once functionality. It works, but I still need
> to implement read/write locks because during the execution of the
> doall() method we can't afford an insert.
> 
> Do you guys have a good testing method?
> 
> I'm looking forward to your opinions about this solution. If I'm not
> responding to email the next few days, that's because I'm busy cooking
> and celebrating Christmas :-)
> 
> Steven

RE: Website stalls every 60 seconds
Steven van der Vegt <steven(at)echelon.nl>
2010-12-30 17:17:50 [ SNIP ]
Today I finished rewriting the hashtable locking logic. I've added the diff
below.
Still a few consideration to make:
For now I use an array of a fixed size which is used to store the elements to
be deleted.
This array is filled (with a read lock) through the do_all lhash function which
calls the t_old function.
If the array is full the do_all method can't be stopped and thus t_old just
returns. After do_all is finished, the items are deleted (with a write lock).
There are a few things that come to mind with this solution. The size of the
array is fixed. What if every minute more items are added than deleted? I
suggest that we make this size variable. Either in the config or calculate it.

Also I found a bug in the Makefile. I edited the pound.h file but the poundctl
binairy didn't rebuild. I think there's something wrong with the dependencies.

Here are two benchmarks. The table is cleaned every 5 seconds. The size of the
array is 512 (this may need some tweaking less can be better).

Configfile:

ListenHTTP
        Address Address xxx.xxx.xxx.186
        Port 88
        Service
                BackEnd
                        Address xxx.xxx.xxx.186
                        Port 40081
                End

                BackEnd
                        Address xxx.xxx.xxx.186
                        Port 40081
                End

                Session
                        Type Cookie
                        ID "pound"
                        TTL 100
                End
        End
End

Version 2.6b:
# ab -n 20000 -c 20 http://xxx.xxx.xxx.xxx/pound.mhtml

Document Path:          /pound.mhtml
Document Length:        87 bytes

Concurrency Level:      20
Time taken for tests:   35.784378 seconds
Complete requests:      20000
Failed requests:        0
Write errors:           0
Total transferred:      5100255 bytes
HTML transferred:       1740087 bytes
Requests per second:    558.90 [#/sec] (mean)
Time per request:       35.784 [ms] (mean)
Time per request:       1.789 [ms] (mean, across all concurrent requests)
Transfer rate:          139.17 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    4  87.4      2    3001
Processing:     1   30  16.7     32     309
Waiting:        1   16  10.8     16     308
Total:          1   35  89.1     34    3050

Percentage of the requests served within a certain time (ms)
  50%     34
  66%     39
  75%     42
  80%     45
  90%     51
  95%     56
  98%     62
  99%     67
 100%   3050 (longest request)

 
Modified version:

Document Path:          /pound.mhtml
Document Length:        87 bytes

Concurrency Level:      20
Time taken for tests:   36.715796 seconds
Complete requests:      20000
Failed requests:        0
Write errors:           0
Total transferred:      5100255 bytes
HTML transferred:       1740087 bytes
Requests per second:    544.72 [#/sec] (mean)
Time per request:       36.716 [ms] (mean)
Time per request:       1.836 [ms] (mean, across all concurrent requests)
Transfer rate:          135.64 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    3  87.4      1    3002
Processing:     1   32  25.1     34     775
Waiting:        1   15  19.4     13     773
Total:          1   36  90.8     35    3059

Percentage of the requests served within a certain time (ms)
  50%     35
  66%     41
  75%     45
  80%     47
  90%     55
  95%     63
  98%     74
  99%     84
 100%   3059 (longest request)
 
Almost no difference. (a little disapointed, I hoped mine version was faster
;-) )
 
Please take a look at the diff. Depending on your commends we'll try it on the
original (slower) hardware.
 
The diff:

diff Pound-2.6b/config.c Pound-2.6b-new/config.c
197c197,198
<     pthread_mutex_init(&res->mut, NULL);
---
>     // TODO: sets some attributes to this lock to regulate readers/writers
priority
>     pthread_rwlock_init(&res->lock, NULL);
462c463,464
<     pthread_mutex_init(&res->mut, NULL);
---
>     // TODO: sets some attributes to this lock to regulate readers/writers
priority
>     pthread_rwlock_init(&res->lock, NULL);
535c537,538
<             pthread_mutex_init(& be->mut, NULL);
---
>             // TODO: initialize with params which regulates rw-priorities
>             pthread_rwlock_init(&be->lock, NULL);
561c564,565
<             pthread_mutex_init(& be->mut, NULL);
---
>             // TODO: initialize with params which regulates rw-priorities
>             pthread_rwlock_init(&be->lock, NULL);
diff Pound-2.6b/pound.h Pound-2.6b-new/pound.h
282a283,285
> /* the size of the array which stores the items to be deleted*/
> #define REMOVE_MAX 512
> 
311c314
<     pthread_mutex_t     mut;        /* mutex for this back-end */
---
>     pthread_rwlock_t    lock;       /* rw-lock for this back-end */
342c345
<     pthread_mutex_t     mut;        /* mutex for this service */
---
>     pthread_rwlock_t    lock;       /* rw-lock for this service */
diff Pound-2.6b/svc.c Pound-2.6b-new/svc.c
99a100
> 
104a106,108
>     TABNODE **remove;
>     int     remove_size;
>     pthread_rwlock_t *lock;
106a111,115
> 
> /*
>  * Remove all expired sessions at one since this locks the table much shorter
>  * Requires a readlock on the hashtable.
>  */
108c117
< t_old(TABNODE *t, void *arg)
---
> t_remove_at_once(ALL_ARG *a)
110c119,121
<     ALL_ARG *a;
---
>     int down_load = a->tab->down_load;
>     a->tab->down_load = 0;
> 
113,115c124,126
<     a = (ALL_ARG *)arg;
<     if(t->last_acc < a->lim)
<         if((res = lh_delete(a->tab, t)) != NULL) {
---
>     int i = 0;
>     for (;i < a->remove_size;++i) {
>         if((res = lh_delete(a->tab, a->remove[i])) != NULL) {
119a131,157
>     }
>     
>     a->tab->down_load = down_load;
>     a->remove_size = 0;
> 
>     return;
> }
> 
> /*
>  * Check if a node is expired. If so add to the delete list.
>  * If this list if full, delete the items in the list.
>  */
> static void
> t_old(TABNODE *t, void *arg)
> {
>     ALL_ARG *a;
> 
>     a = (ALL_ARG *)arg;
>     
>     if (a->remove_size == REMOVE_MAX) {
>         /* don't compare if array is full */
>         return;
>     }
> 
>     if(t->last_acc < a->lim) {
>         a->remove[a->remove_size++] = t;
>     }
128c166
< t_expire(LHASH *const tab, const time_t lim)
---
> t_expire(SERVICE *svc)
131d168
<     int down_load;
133c170,180
<     a.tab = tab;
---
>     /* create the array which stores pointers to nodes to be removed */
>     if ((a.remove=malloc(sizeof(TABNODE*)*REMOVE_MAX)) == NULL) { // FIXME:
stack or heap?
>         logmsg(LOG_WARNING, "malloc()"); 
>         return;
>     }
>     /* the number of nodes in the array */
>     a.remove_size = 0;
> 
>     time_t lim = time(NULL) - svc->sess_ttl;
> 
>     a.tab = svc->sessions;
135,138c182,210
<     down_load = tab->down_load;
<     tab->down_load = 0;
<     lh_doall_arg(tab, LHASH_DOALL_ARG_FN(t_old), &a);
<     tab->down_load = down_load;
---
>     
>     int ret_val;
>     if(ret_val = pthread_rwlock_rdlock(&svc->lock)) {
>         logmsg(LOG_WARNING, "t_expire() lock: %s", strerror(ret_val));
>         return;
>     }
>     
>     lh_doall_arg(svc->sessions, LHASH_DOALL_ARG_FN(t_old), &a);
>     
>     if(ret_val = pthread_rwlock_unlock(&svc->lock)) {
>         logmsg(LOG_WARNING, "t_expire() lock: %s", strerror(ret_val));
>         return;
>     }
> 
> 
>     if (a.remove_size > 0) {
>         /* writelock */
>         if(ret_val = pthread_rwlock_wrlock(&svc->lock)) {
>             logmsg(LOG_WARNING, "t_expire() lock: %s", strerror(ret_val));
>             return;
>         }
>         /* remove expired elements */
>         t_remove_at_once(&a);
>         /* unlock */
>         if(ret_val = pthread_rwlock_unlock(&svc->lock))
>             logmsg(LOG_WARNING, "t_expire() unlock: %s", strerror(ret_val));
>     }
>     /* free temporary array */
>     free(a.remove);
549a622
>     int         to_add = 0;
551c624
<     if(ret_val = pthread_mutex_lock(&svc->mut))
---
>     if(ret_val = pthread_rwlock_rdlock(&svc->lock))
571c644
<                 t_add(svc->sessions, key, &res, sizeof(res));
---
>                 to_add = 1;
587c660
<                     t_add(svc->sessions, key, &res, sizeof(res));
---
>                     to_add = 1;
606c679
<                     t_add(svc->sessions, key, &res, sizeof(res));
---
>                     to_add = 1;
615c688
<     if(ret_val = pthread_mutex_unlock(&svc->mut))
---
>     if(ret_val = pthread_rwlock_unlock(&svc->lock))
617a691,701
>     /* adding item to the hashtable? */
>     if (1 == to_add) {
>         /* writelock */
>         if(ret_val = pthread_rwlock_wrlock(&svc->lock))
>             logmsg(LOG_WARNING, "get_backend() unlock: %s",
strerror(ret_val));
>         /* add to the table */
>         t_add(svc->sessions, key, &res, sizeof(res));
>         /* unlock */
>         if(ret_val = pthread_rwlock_unlock(&svc->lock))
>             logmsg(LOG_WARNING, "get_backend() unlock: %s",
strerror(ret_val));
>     }
632c716,717
<     if(ret_val = pthread_mutex_lock(&svc->mut))
---
>     /* optain readlock */
>     if(ret_val = pthread_rwlock_rdlock(&svc->lock)) {
634,635c719,731
<     if(get_HEADERS(key, svc, headers))
<         if(t_find(svc->sessions, key) == NULL)
---
>         return;
>     }
>     if(get_HEADERS(key, svc, headers)) {
>         if(t_find(svc->sessions, key) == NULL) {
>             /* release readlock */
>             if(ret_val = pthread_rwlock_unlock(&svc->lock))
>                 logmsg(LOG_WARNING, "upd_session() unlock: %s",
strerror(ret_val));
>             /* optain writelock */
>             if(ret_val = pthread_rwlock_rdlock(&svc->lock)) {
>                 logmsg(LOG_WARNING, "upd_session() lock: %s",
strerror(ret_val));
>                 return;
>             }
>             
637c733,737
<     if(ret_val = pthread_mutex_unlock(&svc->mut))
---
> 
>         }
>     }
>     /* release read- or writelock */
>     if(ret_val = pthread_rwlock_unlock(&svc->lock))
654,655c754,755
< 
<     if(ret_val = pthread_mutex_lock(&svc->mut))
---
>     
>     if(ret_val = pthread_rwlock_wrlock(&svc->lock))
684c784
<     if(ret_val = pthread_mutex_unlock(&svc->mut))
---
>     if(ret_val = pthread_rwlock_unlock(&svc->lock))
698c798
<         if(ret_val = pthread_mutex_lock(&be->mut))
---
>         if(ret_val = pthread_rwlock_rdlock(&be->lock))
707c807
<         if(ret_val = pthread_mutex_unlock(&be->mut))
---
>         if(ret_val = pthread_rwlock_unlock(&be->lock))
1091c1191
<             if(ret_val = pthread_mutex_lock(&svc->mut))
---
>             if(ret_val = pthread_rwlock_wrlock(&svc->lock))
1103c1203
<             if(ret_val = pthread_mutex_unlock(&svc->mut))
---
>             if(ret_val = pthread_rwlock_unlock(&svc->lock))
1160c1260
<             if(ret_val = pthread_mutex_lock(&svc->mut))
---
>             if(ret_val = pthread_rwlock_wrlock(&svc->lock))
1172c1272
<             if(ret_val = pthread_mutex_unlock(&svc->mut))
---
>             if(ret_val = pthread_rwlock_unlock(&svc->lock))
1189,1193d1288
<     time_t      cur_time;
<     int         ret_val;
< 
<     /* remove stale sessions */
<     cur_time = time(NULL);
1198,1204c1293
<             if(ret_val = pthread_mutex_lock(&svc->mut)) {
<                 logmsg(LOG_WARNING, "do_expire() lock: %s",
strerror(ret_val));
<                 continue;
<             }
<             t_expire(svc->sessions, cur_time - svc->sess_ttl);
<             if(ret_val = pthread_mutex_unlock(&svc->mut))
<                 logmsg(LOG_WARNING, "do_expire() unlock: %s",
strerror(ret_val));
---
>             t_expire(svc);
1209,1215c1298
<             if(ret_val = pthread_mutex_lock(&svc->mut)) {
<                 logmsg(LOG_WARNING, "do_expire() lock: %s",
strerror(ret_val));
<                 continue;
<             }
<             t_expire(svc->sessions, cur_time - svc->sess_ttl);
<             if(ret_val = pthread_mutex_unlock(&svc->mut))
<                 logmsg(LOG_WARNING, "do_expire() unlock: %s",
strerror(ret_val));
---
>             t_expire(svc);
1244c1327
<             if(ret_val = pthread_mutex_lock(&be->mut))
---
>             if(ret_val = pthread_rwlock_rdlock(&be->lock))
1248c1331
<             if(ret_val = pthread_mutex_unlock(&be->mut))
---
>             if(ret_val = pthread_rwlock_unlock(&be->lock))
1258c1341
<         if(ret_val = pthread_mutex_lock(&svc->mut)) {
---
>         if(ret_val = pthread_rwlock_rdlock(&svc->lock)) { 
1267c1350
<                 if(ret_val = pthread_mutex_lock(&be->mut))
---
>                 if(ret_val = pthread_rwlock_rdlock(&be->lock))
1273c1356
<                 if(ret_val = pthread_mutex_unlock(&be->mut))
---
>                 if(ret_val = pthread_rwlock_unlock(&be->lock))
1279c1362
<                 if(ret_val = pthread_mutex_lock(&be->mut))
---
>                 if(ret_val = pthread_rwlock_rdlock(&be->lock))
1285c1368
<                 if(ret_val = pthread_mutex_unlock(&be->mut))
---
>                 if(ret_val = pthread_rwlock_unlock(&be->lock))
1290c1373
<         if(ret_val = pthread_mutex_unlock(&svc->mut))
---
>         if(ret_val = pthread_rwlock_unlock(&svc->lock))
1302c1385
<             if(ret_val = pthread_mutex_lock(&be->mut))
---
>             if(ret_val = pthread_rwlock_rdlock(&be->lock))
1306c1389
<             if(ret_val = pthread_mutex_unlock(&be->mut))
---
>             if(ret_val = pthread_rwlock_unlock(&be->lock))
1316c1399
<         if(ret_val = pthread_mutex_lock(&svc->mut)) {
---
>         if(ret_val = pthread_rwlock_wrlock(&svc->lock)) {
1325c1408
<                 if(ret_val = pthread_mutex_lock(&be->mut))
---
>                 if(ret_val = pthread_rwlock_rdlock(&be->lock))
1331c1414
<                 if(ret_val = pthread_mutex_unlock(&be->mut))
---
>                 if(ret_val = pthread_rwlock_unlock(&be->lock))
1337c1420
<                 if(ret_val = pthread_mutex_lock(&be->mut))
---
>                 if(ret_val = pthread_rwlock_rdlock(&be->lock))
1343c1426
<                 if(ret_val = pthread_mutex_unlock(&be->mut))
---
>                 if(ret_val = pthread_rwlock_unlock(&be->lock))
1348c1431
<         if(ret_val = pthread_mutex_unlock(&svc->mut))
---
>         if(ret_val = pthread_rwlock_unlock(&svc->lock))
1650c1733
<                     if(dummy = pthread_mutex_lock(&svc->mut))
---
>                     if(dummy = pthread_rwlock_rdlock(&svc->lock))
1654c1737
<                         if(dummy = pthread_mutex_unlock(&svc->mut))
---
>                         if(dummy = pthread_rwlock_unlock(&svc->lock))
1671c1754
<                 if(dummy = pthread_mutex_lock(&svc->mut))
---
>                 if(dummy = pthread_rwlock_rdlock(&svc->lock))
1675c1758
<                     if(dummy = pthread_mutex_unlock(&svc->mut))
---
>                     if(dummy = pthread_rwlock_unlock(&svc->lock))
1735c1818
<             if(ret_val = pthread_mutex_lock(&svc->mut))
---
>             if(ret_val = pthread_rwlock_wrlock(&svc->lock))
1738c1821
<             if(ret_val = pthread_mutex_unlock(&svc->mut))
---
>             if(ret_val = pthread_rwlock_unlock(&svc->lock))
1746c1829
<             if(ret_val = pthread_mutex_lock(&svc->mut))
---
>             if(ret_val = pthread_rwlock_wrlock(&svc->lock))
1749c1832
<             if(ret_val = pthread_mutex_unlock(&svc->mut))
---
>             if(ret_val = pthread_rwlock_unlock(&svc->lock))



-----Oorspronkelijk bericht-----
Van: Joe Gooch [mailto:mrwizard(at)k12system.com] 
Verzonden: donderdag 23 december 2010 17:41
Aan: pound(at)apsis.ch
Onderwerp: [Pound Mailing List] RE: Website stalls every 60 seconds

I've been known to set the TTL values low (say, 1 minute) and use JMeter or
similar to set up a fleet of zombie web browsers.  It'll give you all the stats
of average response time, responses per second, etc.

As far as pound debugging I usually do the low-tech "add printfs to the code"
approach. :)

Joe


> -----Original Message-----
> From: Steven van der Vegt [mailto:steven(at)echelon.nl]
> Sent: Thursday, December 23, 2010 11:24 AM
> To: pound(at)apsis.ch
> Subject: [Pound Mailing List] RE: Website stalls every 60 seconds
> 
> Since this problem is of somewhat a priority to us, I started to
> implement the delete-at-once functionality. It works, but I still need
> to implement read/write locks because during the execution of the
> doall() method we can't afford an insert.
> 
> Do you guys have a good testing method?
> 
> I'm looking forward to your opinions about this solution. If I'm not
> responding to email the next few days, that's because I'm busy cooking
> and celebrating Christmas :-)
> 
> Steven

--
To unsubscribe send an email with subject unsubscribe to pound(at)apsis.ch.
Please contact roseg(at)apsis.ch for questions.

RE: Website stalls every 60 seconds
Joe Gooch <mrwizard(at)k12system.com>
2010-12-31 04:31:36 [ SNIP ]
It sounds good... I haven't looked at the locking or anything.. But.

"ab" is not going to introduce cookies/session affinity.  I bet if you look at
poundctl, you'll see no sessions are being created, or deleted.  -C would allow
you to add a cookie but really you want a cookie with a different value every
request...  When I did testing with JMeter, you can add a cookie jar to the
testing connections... as long as the backend set the cookie it would remember
for that request only... so you could do concurrent testing.  Might be a better
option than ab.

For the purpose of benchmarking, you want to know:
1) Time for a request for a  new session (i.e. lh_insert, write lock, could be
influenced by deletion)
2) Time for a request for an existing session (lh_find, read lock)

It might not be bad to time how long it takes to run the t_expire loop...  both
the read-only find, and the read-write dump.

Joe

> -----Original Message-----
> From: Steven van der Vegt [mailto:steven(at)echelon.nl]
> Sent: Thursday, December 30, 2010 11:18 AM
> To: pound(at)apsis.ch
> Subject: [Pound Mailing List] RE: Website stalls every 60 seconds
>
> Today I finished rewriting the hashtable locking logic. I've added the
> diff below.
> Still a few consideration to make:
> For now I use an array of a fixed size which is used to store the
> elements to be deleted.
> This array is filled (with a read lock) through the do_all lhash
> function which calls the t_old function.
> If the array is full the do_all method can't be stopped and thus t_old
> just returns. After do_all is finished, the items are deleted (with a
> write lock).
> There are a few things that come to mind with this solution. The size
> of the array is fixed. What if every minute more items are added than
> deleted? I suggest that we make this size variable. Either in the
> config or calculate it.
>
> Also I found a bug in the Makefile. I edited the pound.h file but the
> poundctl binairy didn't rebuild. I think there's something wrong with
> the dependencies.
>
> Here are two benchmarks. The table is cleaned every 5 seconds. The size
> of the array is 512 (this may need some tweaking less can be better).
>
> Configfile:
>
> ListenHTTP
>         Address Address xxx.xxx.xxx.186
>         Port 88
>         Service
>                 BackEnd
>                         Address xxx.xxx.xxx.186
>                         Port 40081
>                 End
>
>                 BackEnd
>                         Address xxx.xxx.xxx.186
>                         Port 40081
>                 End
>
>                 Session
>                         Type Cookie
>                         ID "pound"
>                         TTL 100
>                 End
>         End
> End
>
> Version 2.6b:
> # ab -n 20000 -c 20 http://xxx.xxx.xxx.xxx/pound.mhtml
>
> Document Path:          /pound.mhtml
> Document Length:        87 bytes
>
> Concurrency Level:      20
> Time taken for tests:   35.784378 seconds
> Complete requests:      20000
> Failed requests:        0
> Write errors:           0
> Total transferred:      5100255 bytes
> HTML transferred:       1740087 bytes
> Requests per second:    558.90 [#/sec] (mean)
> Time per request:       35.784 [ms] (mean)
> Time per request:       1.789 [ms] (mean, across all concurrent
> requests)
> Transfer rate:          139.17 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0    4  87.4      2    3001
> Processing:     1   30  16.7     32     309
> Waiting:        1   16  10.8     16     308
> Total:          1   35  89.1     34    3050
>
> Percentage of the requests served within a certain time (ms)
>   50%     34
>   66%     39
>   75%     42
>   80%     45
>   90%     51
>   95%     56
>   98%     62
>   99%     67
>  100%   3050 (longest request)
>
>
> Modified version:
>
> Document Path:          /pound.mhtml
> Document Length:        87 bytes
>
> Concurrency Level:      20
> Time taken for tests:   36.715796 seconds
> Complete requests:      20000
> Failed requests:        0
> Write errors:           0
> Total transferred:      5100255 bytes
> HTML transferred:       1740087 bytes
> Requests per second:    544.72 [#/sec] (mean)
> Time per request:       36.716 [ms] (mean)
> Time per request:       1.836 [ms] (mean, across all concurrent
> requests)
> Transfer rate:          135.64 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0    3  87.4      1    3002
> Processing:     1   32  25.1     34     775
> Waiting:        1   15  19.4     13     773
> Total:          1   36  90.8     35    3059
>
> Percentage of the requests served within a certain time (ms)
>   50%     35
>   66%     41
>   75%     45
>   80%     47
>   90%     55
>   95%     63
>   98%     74
>   99%     84
>  100%   3059 (longest request)
>
> Almost no difference. (a little disapointed, I hoped mine version was
> faster ;-) )
>
> Please take a look at the diff. Depending on your commends we'll try it
> on the original (slower) hardware.
>
> The diff:
>
> diff Pound-2.6b/config.c Pound-2.6b-new/config.c
> 197c197,198
> <     pthread_mutex_init(&res->mut, NULL);
> ---
> >     // TODO: sets some attributes to this lock to regulate
> readers/writers priority
> >     pthread_rwlock_init(&res->lock, NULL);
> 462c463,464
> <     pthread_mutex_init(&res->mut, NULL);
> ---
> >     // TODO: sets some attributes to this lock to regulate
> readers/writers priority
> >     pthread_rwlock_init(&res->lock, NULL);
> 535c537,538
> <             pthread_mutex_init(& be->mut, NULL);
> ---
> >             // TODO: initialize with params which regulates rw-
> priorities
> >             pthread_rwlock_init(&be->lock, NULL);
> 561c564,565
> <             pthread_mutex_init(& be->mut, NULL);
> ---
> >             // TODO: initialize with params which regulates rw-
> priorities
> >             pthread_rwlock_init(&be->lock, NULL);
> diff Pound-2.6b/pound.h Pound-2.6b-new/pound.h
> 282a283,285
> > /* the size of the array which stores the items to be deleted*/
> > #define REMOVE_MAX 512
> >
> 311c314
> <     pthread_mutex_t     mut;        /* mutex for this back-end */
> ---
> >     pthread_rwlock_t    lock;       /* rw-lock for this back-end */
> 342c345
> <     pthread_mutex_t     mut;        /* mutex for this service */
> ---
> >     pthread_rwlock_t    lock;       /* rw-lock for this service */
> diff Pound-2.6b/svc.c Pound-2.6b-new/svc.c
> 99a100
> >
> 104a106,108
> >     TABNODE **remove;
> >     int     remove_size;
> >     pthread_rwlock_t *lock;
> 106a111,115
> >
> > /*
> >  * Remove all expired sessions at one since this locks the table much
> shorter
> >  * Requires a readlock on the hashtable.
> >  */
> 108c117
> < t_old(TABNODE *t, void *arg)
> ---
> > t_remove_at_once(ALL_ARG *a)
> 110c119,121
> <     ALL_ARG *a;
> ---
> >     int down_load =>tab->down_load;
> >     a->tab->down_load =
> >
> 113,115c124,126
> <     a =LL_ARG *)arg;
> <     if(t->last_acc < a->lim)
> <         if((res =_delete(a->tab, t)) !=LL) {
> ---
> >     int i =
> >     for (;i < a->remove_size;++i) {
> >         if((res =_delete(a->tab, a->remove[i])) !=LL) {
> 119a131,157
> >     }
> >
> >     a->tab->down_load =wn_load;
> >     a->remove_size =
> >
> >     return;
> > }
> >
> > /*
> >  * Check if a node is expired. If so add to the delete list.
> >  * If this list if full, delete the items in the list.
> >  */
> > static void
> > t_old(TABNODE *t, void *arg)
> > {
> >     ALL_ARG *a;
> >
> >     a =LL_ARG *)arg;
> >
> >     if (a->remove_size =EMOVE_MAX) {
> >         /* don't compare if array is full */
> >         return;
> >     }
> >
> >     if(t->last_acc < a->lim) {
> >         a->remove[a->remove_size++] =
> >     }
> 128c166
> < t_expire(LHASH *const tab, const time_t lim)
> ---
> > t_expire(SERVICE *svc)
> 131d168
> <     int down_load;
> 133c170,180
> <     a.tab =b;
> ---
> >     /* create the array which stores pointers to nodes to be removed
> */
> >     if ((a.remove=lloc(sizeof(TABNODE*)*REMOVE_MAX)) =ULL) { //
> FIXME: stack or heap?
> >         logmsg(LOG_WARNING, "malloc()");
> >         return;
> >     }
> >     /* the number of nodes in the array */
> >     a.remove_size =
> >
> >     time_t lim =me(NULL) - svc->sess_ttl;
> >
> >     a.tab =c->sessions;
> 135,138c182,210
> <     down_load =b->down_load;
> <     tab->down_load =
> <     lh_doall_arg(tab, LHASH_DOALL_ARG_FN(t_old), &a);
> <     tab->down_load =wn_load;
> ---
> >
> >     int ret_val;
> >     if(ret_val =hread_rwlock_rdlock(&svc->lock)) {
> >         logmsg(LOG_WARNING, "t_expire() lock: %s",
> strerror(ret_val));
> >         return;
> >     }
> >
> >     lh_doall_arg(svc->sessions, LHASH_DOALL_ARG_FN(t_old), &a);
> >
> >     if(ret_val =hread_rwlock_unlock(&svc->lock)) {
> >         logmsg(LOG_WARNING, "t_expire() lock: %s",
> strerror(ret_val));
> >         return;
> >     }
> >
> >
> >     if (a.remove_size > 0) {
> >         /* writelock */
> >         if(ret_val =hread_rwlock_wrlock(&svc->lock)) {
> >             logmsg(LOG_WARNING, "t_expire() lock: %s",
> strerror(ret_val));
> >             return;
> >         }
> >         /* remove expired elements */
> >         t_remove_at_once(&a);
> >         /* unlock */
> >         if(ret_val =hread_rwlock_unlock(&svc->lock))
> >             logmsg(LOG_WARNING, "t_expire() unlock: %s",
> strerror(ret_val));
> >     }
> >     /* free temporary array */
> >     free(a.remove);
> 549a622
> >     int         to_add =
> 551c624
> <     if(ret_val =hread_mutex_lock(&svc->mut))
> ---
> >     if(ret_val =hread_rwlock_rdlock(&svc->lock))
> 571c644
> <                 t_add(svc->sessions, key, &res, sizeof(res));
> ---
> >                 to_add =
> 587c660
> <                     t_add(svc->sessions, key, &res, sizeof(res));
> ---
> >                     to_add =
> 606c679
> <                     t_add(svc->sessions, key, &res, sizeof(res));
> ---
> >                     to_add =
> 615c688
> <     if(ret_val =hread_mutex_unlock(&svc->mut))
> ---
> >     if(ret_val =hread_rwlock_unlock(&svc->lock))
> 617a691,701
> >     /* adding item to the hashtable? */
> >     if (1 =o_add) {
> >         /* writelock */
> >         if(ret_val =hread_rwlock_wrlock(&svc->lock))
> >             logmsg(LOG_WARNING, "get_backend() unlock: %s",
> strerror(ret_val));
> >         /* add to the table */
> >         t_add(svc->sessions, key, &res, sizeof(res));
> >         /* unlock */
> >         if(ret_val =hread_rwlock_unlock(&svc->lock))
> >             logmsg(LOG_WARNING, "get_backend() unlock: %s",
> strerror(ret_val));
> >     }
> 632c716,717
> <     if(ret_val =hread_mutex_lock(&svc->mut))
> ---
> >     /* optain readlock */
> >     if(ret_val =hread_rwlock_rdlock(&svc->lock)) {
> 634,635c719,731
> <     if(get_HEADERS(key, svc, headers))
> <         if(t_find(svc->sessions, key) =ULL)
> ---
> >         return;
> >     }
> >     if(get_HEADERS(key, svc, headers)) {
> >         if(t_find(svc->sessions, key) =ULL) {
> >             /* release readlock */
> >             if(ret_val =hread_rwlock_unlock(&svc->lock))
> >                 logmsg(LOG_WARNING, "upd_session() unlock: %s",
> strerror(ret_val));
> >             /* optain writelock */
> >             if(ret_val =hread_rwlock_rdlock(&svc->lock)) {
> >                 logmsg(LOG_WARNING, "upd_session() lock: %s",
> strerror(ret_val));
> >                 return;
> >             }
> >
> 637c733,737
> <     if(ret_val =hread_mutex_unlock(&svc->mut))
> ---
> >
> >         }
> >     }
> >     /* release read- or writelock */
> >     if(ret_val =hread_rwlock_unlock(&svc->lock))
> 654,655c754,755
> <
> <     if(ret_val =hread_mutex_lock(&svc->mut))
> ---
> >
> >     if(ret_val =hread_rwlock_wrlock(&svc->lock))
> 684c784
> <     if(ret_val =hread_mutex_unlock(&svc->mut))
> ---
> >     if(ret_val =hread_rwlock_unlock(&svc->lock))
> 698c798
> <         if(ret_val =hread_mutex_lock(&be->mut))
> ---
> >         if(ret_val =hread_rwlock_rdlock(&be->lock))
> 707c807
> <         if(ret_val =hread_mutex_unlock(&be->mut))
> ---
> >         if(ret_val =hread_rwlock_unlock(&be->lock))
> 1091c1191
> <             if(ret_val =hread_mutex_lock(&svc->mut))
> ---
> >             if(ret_val =hread_rwlock_wrlock(&svc->lock))
> 1103c1203
> <             if(ret_val =hread_mutex_unlock(&svc->mut))
> ---
> >             if(ret_val =hread_rwlock_unlock(&svc->lock))
> 1160c1260
> <             if(ret_val =hread_mutex_lock(&svc->mut))
> ---
> >             if(ret_val =hread_rwlock_wrlock(&svc->lock))
> 1172c1272
> <             if(ret_val =hread_mutex_unlock(&svc->mut))
> ---
> >             if(ret_val =hread_rwlock_unlock(&svc->lock))
> 1189,1193d1288
> <     time_t      cur_time;
> <     int         ret_val;
> <
> <     /* remove stale sessions */
> <     cur_time =me(NULL);
> 1198,1204c1293
> <             if(ret_val =hread_mutex_lock(&svc->mut)) {
> <                 logmsg(LOG_WARNING, "do_expire() lock: %s",
> strerror(ret_val));
> <                 continue;
> <             }
> <             t_expire(svc->sessions, cur_time - svc->sess_ttl);
> <             if(ret_val =hread_mutex_unlock(&svc->mut))
> <                 logmsg(LOG_WARNING, "do_expire() unlock: %s",
> strerror(ret_val));
> ---
> >             t_expire(svc);
> 1209,1215c1298
> <             if(ret_val =hread_mutex_lock(&svc->mut)) {
> <                 logmsg(LOG_WARNING, "do_expire() lock: %s",
> strerror(ret_val));
> <                 continue;
> <             }
> <             t_expire(svc->sessions, cur_time - svc->sess_ttl);
> <             if(ret_val =hread_mutex_unlock(&svc->mut))
> <                 logmsg(LOG_WARNING, "do_expire() unlock: %s",
> strerror(ret_val));
> ---
> >             t_expire(svc);
> 1244c1327
> <             if(ret_val =hread_mutex_lock(&be->mut))
> ---
> >             if(ret_val =hread_rwlock_rdlock(&be->lock))
> 1248c1331
> <             if(ret_val =hread_mutex_unlock(&be->mut))
> ---
> >             if(ret_val =hread_rwlock_unlock(&be->lock))
> 1258c1341
> <         if(ret_val =hread_mutex_lock(&svc->mut)) {
> ---
> >         if(ret_val =hread_rwlock_rdlock(&svc->lock)) {
> 1267c1350
> <                 if(ret_val =hread_mutex_lock(&be->mut))
> ---
> >                 if(ret_val =hread_rwlock_rdlock(&be->lock))
> 1273c1356
> <                 if(ret_val =hread_mutex_unlock(&be->mut))
> ---
> >                 if(ret_val =hread_rwlock_unlock(&be->lock))
> 1279c1362
> <                 if(ret_val =hread_mutex_lock(&be->mut))
> ---
> >                 if(ret_val =hread_rwlock_rdlock(&be->lock))
> 1285c1368
> <                 if(ret_val =hread_mutex_unlock(&be->mut))
> ---
> >                 if(ret_val =hread_rwlock_unlock(&be->lock))
> 1290c1373
> <         if(ret_val =hread_mutex_unlock(&svc->mut))
> ---
> >         if(ret_val =hread_rwlock_unlock(&svc->lock))
> 1302c1385
> <             if(ret_val =hread_mutex_lock(&be->mut))
> ---
> >             if(ret_val =hread_rwlock_rdlock(&be->lock))
> 1306c1389
> <             if(ret_val =hread_mutex_unlock(&be->mut))
> ---
> >             if(ret_val =hread_rwlock_unlock(&be->lock))
> 1316c1399
> <         if(ret_val =hread_mutex_lock(&svc->mut)) {
> ---
> >         if(ret_val =hread_rwlock_wrlock(&svc->lock)) {
> 1325c1408
> <                 if(ret_val =hread_mutex_lock(&be->mut))
> ---
> >                 if(ret_val =hread_rwlock_rdlock(&be->lock))
> 1331c1414
> <                 if(ret_val =hread_mutex_unlock(&be->mut))
> ---
> >                 if(ret_val =hread_rwlock_unlock(&be->lock))
> 1337c1420
> <                 if(ret_val =hread_mutex_lock(&be->mut))
> ---
> >                 if(ret_val =hread_rwlock_rdlock(&be->lock))
> 1343c1426
> <                 if(ret_val =hread_mutex_unlock(&be->mut))
> ---
> >                 if(ret_val =hread_rwlock_unlock(&be->lock))
> 1348c1431
> <         if(ret_val =hread_mutex_unlock(&svc->mut))
> ---
> >         if(ret_val =hread_rwlock_unlock(&svc->lock))
> 1650c1733
> <                     if(dummy =hread_mutex_lock(&svc->mut))
> ---
> >                     if(dummy =hread_rwlock_rdlock(&svc->lock))
> 1654c1737
> <                         if(dummy =hread_mutex_unlock(&svc->mut))
> ---
> >                         if(dummy =hread_rwlock_unlock(&svc->lock))
> 1671c1754
> <                 if(dummy =hread_mutex_lock(&svc->mut))
> ---
> >                 if(dummy =hread_rwlock_rdlock(&svc->lock))
> 1675c1758
> <                     if(dummy =hread_mutex_unlock(&svc->mut))
> ---
> >                     if(dummy =hread_rwlock_unlock(&svc->lock))
> 1735c1818
> <             if(ret_val =hread_mutex_lock(&svc->mut))
> ---
> >             if(ret_val =hread_rwlock_wrlock(&svc->lock))
> 1738c1821
> <             if(ret_val =hread_mutex_unlock(&svc->mut))
> ---
> >             if(ret_val =hread_rwlock_unlock(&svc->lock))
> 1746c1829
> <             if(ret_val =hread_mutex_lock(&svc->mut))
> ---
> >             if(ret_val =hread_rwlock_wrlock(&svc->lock))
> 1749c1832
> <             if(ret_val =hread_mutex_unlock(&svc->mut))
> ---
> >             if(ret_val =hread_rwlock_unlock(&svc->lock))
>
>
>
> -----Oorspronkelijk bericht-----
> Van: Joe Gooch [mailto:mrwizard(at)k12system.com]
> Verzonden: donderdag 23 december 2010 17:41
> Aan: pound(at)apsis.ch
> Onderwerp: [Pound Mailing List] RE: Website stalls every 60 seconds
>
> I've been known to set the TTL values low (say, 1 minute) and use
> JMeter or similar to set up a fleet of zombie web browsers.  It'll give
> you all the stats of average response time, responses per second, etc.
>
> As far as pound debugging I usually do the low-tech "add printfs to the
> code" approach. :)
>
> Joe
>
>
> > -----Original Message-----
> > From: Steven van der Vegt [mailto:steven(at)echelon.nl]
> > Sent: Thursday, December 23, 2010 11:24 AM
> > To: pound(at)apsis.ch
> > Subject: [Pound Mailing List] RE: Website stalls every 60 seconds
> >
> > Since this problem is of somewhat a priority to us, I started to
> > implement the delete-at-once functionality. It works, but I still
> need
> > to implement read/write locks because during the execution of the
> > doall() method we can't afford an insert.
> >
> > Do you guys have a good testing method?
> >
> > I'm looking forward to your opinions about this solution. If I'm not
> > responding to email the next few days, that's because I'm busy
> cooking
> > and celebrating Christmas :-)
> >
> > Steven
>
> --
> To unsubscribe send an email with subject unsubscribe to
> pound(at)apsis.ch.
> Please contact roseg(at)apsis.ch for questions.
>
> --
> To unsubscribe send an email with subject unsubscribe to
> pound(at)apsis.ch.
> Please contact roseg(at)apsis.ch for questions.

MailBoxer