Large file time-out
Robert Segall <roseg(at)apsis.ch>
2005-07-15 14:01:39
There seems to be a problem in the combination IE - Pound - IIS. Somehow
large files get truncated. We do not use this setup, so I must ask
for the support of people who use this combination in debugging this
problem.

If you use this combination and have some debugging skills please try
investigating the issue and let us know what you come up with. Specific
info would be much appreciated - where, when and under what conditions
does this occur. If you observed the problem with other combinations
that would be important to know too.

Many thanks in advance.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-44-920 4904

Re: [Pound Mailing List] Large file time-out
"webmaster" <webmaster(at)centauremt.com>
2005-07-12 23:37:37
I use Apache with large files and seem to suffer from the same
problem.

Thanks

Antonio Madeira

http://float-tank.centauremt.com
webmaster(at)centauremt.com

RE: [Pound Mailing List] Large file time-out
"Steven Vlach" <Home_Boy(at)msn.com>
2005-07-17 04:26:14
As you know from my previous posts, I am suffering this problem as well. 

To summarize the setup and symptoms:  IE, Pound, IIS.

IIS serves up a page with Content-Disposition: Attachment, IE pops up a
dialog box to save file to somewhere.  At that time, only a partial download
of the file to IE has occurred so far (maybe 70-150k).  If user does not
respond to the dialog box within the Pound Client Timeout value, then Pound
resets the client connection, and the file is truncated at whatever point it
first received the file.  If the client responds within the Client timeout
value, then the rest of the file is downloaded without issue.

This *seems* to occur only when the responses are sent with a Content-Length.
If the response is coded "Chunked Transfer", then the timeout/corruption
issue does not seem to occur, Pound waits patiently for the user to pick the
save-to location.

I understand why Pound does this (I think), but perhaps Pound needs to be
smarter to "wait" for the browser longer when IIS is in the middle of a file
transfer.  Maybe a different timeout value when the Content-Disposition
header is Attachment?

Those are my observations on Pound "corrupting" the file transfer.  It
isn't really corrupting anything, but the severed connection makes it seem so.
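
One way to tell a severed transfer from real corruption is to compare the number of bytes that arrived with the announced Content-Length. The Python sketch below is only an illustration of that check; the function name and the return labels are made up here, and nothing in it comes from Pound, IE or IIS.

```python
def check_transfer(content_length, bytes_received):
    """Classify a finished transfer by comparing announced and received sizes.

    content_length is the integer from the Content-Length header, or None
    when the response carried no such header.
    """
    if content_length is None:
        # Without an announced length (body ended by closing the connection),
        # a dropped connection cannot be told apart from a normal end.
        return "unknown"
    if bytes_received < content_length:
        return "truncated"
    if bytes_received > content_length:
        return "overlong"
    return "complete"


if __name__ == "__main__":
    # Hypothetical numbers: a 10 MB file of which only 150 KB arrived.
    print(check_transfer(10 * 1024 * 1024, 150 * 1024))
```

A "truncated" result with otherwise intact bytes would point at a cut connection rather than at damaged content.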

Steven




Re: [Pound Mailing List] Large file time-out
Ed R Zahurak <ezahurak(at)atlanticbb.net>
2005-07-17 06:01:04
Robert,

It's not just IE.  I've seen the same behaviour with Netscape (7ish), 
too, but to a lesser degree -- only when NS is farming off a .pdf file 
to Acrobat Reader to handle, and acrobat reader isn't already in memory.

It's also not exclusively a problem with stuff sent as attachments. 
I've not tested specifically with a content-disposition of inline, but 
if *no* disposition is set, it happens too.  In my particular case, 
image and/or document files are sent to requesting clients via a cgi 
script.  (ColdFusion MX page, actually.)  All that's specified is the 
content-type and content-length.  At a time when all I was sending was 
the type and not the content-length, the issue occurred many times more 
frequently.  (I should note that this happens *only* with IE, and not 
with NS or another browser.)  Specifying the content length, along with 
*HUGE* Client and Server values in pound.conf (Server 500, Client 90) 
seem to have solved the problem or, at least, greatly reduced the 
probability of the problem happening.  Additionally, when files were 
served from the web server itself, and *not* from a cgi being run by the 
web server, the files are served up correctly. (In addition, on the IIS 
side of things, keep-alives are disabled, the server is limited to 250 
connections at a time, and has a connection timeout of 180 seconds.  The 
performance tuning is set at less than 100,000 pages per day.  No custom 
HTTP headers are being sent, nor is any content expiration configured.)

The first situation I mention would seem to indicate a timeout issue, 
and would be fairly consistent with what Steven is seeing.

The second, however, does not.

I'm still pretty much of the mind that it's a problem caused by the 
server finishing its write to pound *faster* than pound writes to the 
client, at which point pound improperly kills the client-side connection 
before sending all the data through to the client -- possibly 
when/because it doesn't know for certain what the length of the content 
should be.  Although, now that I've compiled just about *everything* I 
can remember about the problem that I know of firsthand into one post, 
there's a couple things in there that wouldn't be indicative of this 
situation; I can't explain them.  Still, this is my best hunch, so I'm 
stickin' with it. :)

If I get some time, I'll try compiling pound with a LOT of 
printf-debugging in it and see what I can see.  I'll keep you posted.

Ed Z.



Re: [Pound Mailing List] Large file time-out
Markus Kramer <m.kramer(at)synalogic.de>
2005-07-17 13:11:44
Hi,

Please let me jump in on that. I reported similar behaviour quite a
while ago, even though my combination of servers does not involve IIS.

I run Pound in front of two server instances running on the same
machine but on different ports. One is an Apache 2.0.x with PHP enabled
to handle all the non-static content, while the second backend is a
simple thttpd server serving large downloads of 10 to 50 MB to
customers.

Initially I ran Pound with default timeout values and had quite a
number of reports about client downloads being too short, even though
the client did not report a broken connection or anything like that.
At first I was not able to reproduce any of this, even though I did
hundreds of test downloads from a host in the same local subnet, and
even ran a self-made script using wget to do the downloads.

Then I became aware that most of the users complaining were on much
slower connections, and most of those affected were slow modem dial-up
users. So I set up an external Linux host with a bandwidth limit at
about modem speed and started the download script from there, and I
could indeed immediately reproduce the problem.

Unfortunately the host is a production environment and only allows
short debugging sessions, but I managed to run a tcpdump between the
limited client and the server, showing that it was Pound closing the
connection before the download was complete. After a tip from the list
I raised the timeout values, and that does seem to fix these issues,
although I am still not sure why: all backends should have been fast
enough to respond with new data within the default timeout, and the
client was continuously ACKing all packets received from Pound, not
even coming close to a timeout before receiving the close request.

So to conclude: the probability of such a timeout seems to rise when
the client is much slower than the backend connection. I was never
able to provoke a single timeout for a long file downloaded over a
1 Mbit/s DSL line, while the same file and setup limited to 4 KB/s
breaks the connection in 70% of my tests. Raising the Client value to
900 seems to eliminate these failures, for reasons I do not understand
yet.

I am certainly willing to help with debugging, but because the machine
is in production I can only do limited testing.
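
For a sense of scale, the figures above can be turned into transfer times. The Python sketch below is plain arithmetic on the numbers in this message, assuming 1 MB = 1024 KB and a steady 4 KB/s with no protocol overhead; the function name is made up for illustration.

```python
def transfer_seconds(size_mb, rate_kb_per_s):
    """Time to move size_mb megabytes at a steady rate, assuming 1 MB = 1024 KB."""
    return size_mb * 1024 / rate_kb_per_s


if __name__ == "__main__":
    for size_mb in (10, 50):
        seconds = transfer_seconds(size_mb, 4)
        print(f"{size_mb} MB at 4 KB/s: {seconds:.0f} s ({seconds / 60:.0f} min)")
```

Under these assumptions even the 10 MB file needs roughly 43 minutes at 4 KB/s, which is far longer than a Client value of 900 seconds. If raising Client to 900 really does cure the problem, that would suggest the timeout is not measured over the whole transfer.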

Best regards,

Markus Kramer

Synalogic e.K.






Re: [Pound Mailing List] Large file time-out
chris(at)aidworld.org
2005-07-18 11:32:10
Hi all,

[...]
> So to conclude: the probability of such a timeout seems to rise when
> the client is much slower than the backend connection. I was never
> able to provoke a single timeout for a long file downloaded over a
> 1 Mbit/s DSL line, while the same file and setup limited to 4 KB/s
> breaks the connection in 70% of my tests. Raising the Client value to
> 900 seems to eliminate these failures, for reasons I do not understand
> yet.

Is the timeout an absolute one, i.e. from the start of the connection, or
does it only trigger after a certain amount of idle time? The
documentation is not clear on this point, but if the timeout runs from the
start of the connection, it makes perfect sense that users with slow
connections would experience disconnections when downloading large files.

I have a bandwidth throttler here and can do any reasonable amount of
testing on Pound with large downloads and slow connections.

Cheers, Chris.


Re: [Pound Mailing List] Large file time-out
Robert Segall <roseg(at)apsis.ch>
2005-07-18 13:45:34
On Sun, 17 Jul 2005 00:01:04 -0400 Ed R Zahurak
<ezahurak(at)atlanticbb.net> wrote:

> I'm still pretty much of the mind that it's a problem caused by the
> server finishing its write to pound *faster* than pound writes to the
> client, at which point pound improperly kills the client-side
> connection before sending all the data through to the client --
> possibly when/because it doesn't know for certain what the length of
> the content should be.

I'm pretty sure this is not the case. Pound will flush its buffers to
the client when the data-stream from the server is finished, regardless
of reason...

> If I get some time, I'll try compiling pound with a LOT of 
> printf-debugging in it and see what I can see.  I'll keep you posted.

Please do - that would be valuable information.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-44-920 4904

Re: [Pound Mailing List] Large file time-out
Robert Segall <roseg(at)apsis.ch>
2005-07-18 13:52:09
On Sun, 17 Jul 2005 13:11:44 +0200 Markus Kramer <m.kramer(at)synalogic.de>
wrote:

> [...]
> So to conclude: the probability of such a timeout seems to rise when
> the client is much slower than the backend connection. I was never
> able to provoke a single timeout for a long file downloaded over a
> 1 Mbit/s DSL line, while the same file and setup limited to 4 KB/s
> breaks the connection in 70% of my tests. Raising the Client value to
> 900 seems to eliminate these failures, for reasons I do not understand
> yet.

Please try to test for this condition after tuning your network
parameters. My primary suspect is still the internal network buffer
size: I assume the Linux buffers fill up and the Windows TCP stack does
not signal the congestion correctly. See tcp_mem and tcp_wmem in the
tcp(7) man page.

By default Linux allows buffering of up to 128K. Try increasing it
massively (2MB for starters) and see if it has any influence on the
outcome.
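
Before changing anything it may help to record the current setting. The Python sketch below reads tcp_wmem from /proc on a Linux host and splits it into its three fields (minimum, default and maximum send buffer size in bytes, as described in tcp(7)). The helper names are made up for illustration, and the script only reads the value; it does not change it.

```python
from pathlib import Path

TCP_WMEM = Path("/proc/sys/net/ipv4/tcp_wmem")  # exists on Linux only


def parse_tcp_wmem(text):
    """Split the three tcp_wmem fields: min, default and max, in bytes."""
    minimum, default, maximum = (int(field) for field in text.split())
    return {"min": minimum, "default": default, "max": maximum}


def current_tcp_wmem():
    """Return the parsed setting, or None where the /proc file does not exist."""
    if not TCP_WMEM.exists():
        return None
    return parse_tcp_wmem(TCP_WMEM.read_text())


if __name__ == "__main__":
    print(current_tcp_wmem())
```

Comparing the "max" field before and after the change is a quick way to confirm that the new value actually took effect.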
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-44-920 4904

Re: [Pound Mailing List] Large file time-out
Robert Segall <roseg(at)apsis.ch>
2005-07-18 13:54:20
On Mon, 18 Jul 2005 10:32:10 +0100 (BST) chris(at)aidworld.org wrote:

> Is the timeout an absolute one, i.e. from the start of the connection,
> or does it only trigger after a certain amount of idle time? The
> documentation is not clear on this point, but if the timeout runs from
> the start of the connection, it makes perfect sense that users with
> slow connections would experience disconnections when downloading
> large files.

The time-out is PER ACTION: in other words each read() and write(). The
total transaction time plays no role.
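
As an illustration of what a per-action time-out means (this is a Python sketch, not Pound's code), the snippet below writes to a peer that never reads. The time-out is applied to each send() on its own: writes succeed for as long as the kernel buffers have room, and the first write that cannot make progress within the limit fails, however short or long the transfer has been so far. The function name and the values are hypothetical.

```python
import socket


def send_until_stalled(per_write_timeout=0.2, chunk=b"x" * 4096):
    """Write to a peer that never reads; return the bytes accepted before a
    single send() failed to make progress within per_write_timeout seconds."""
    writer, reader = socket.socketpair()
    writer.settimeout(per_write_timeout)  # applies to each send() separately
    sent = 0
    try:
        while True:
            sent += writer.send(chunk)
    except socket.timeout:
        # The buffers are full and the reader is idle: this one write timed out.
        return sent
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    print(send_until_stalled(), "bytes accepted before a write timed out")
```

A reader that drains the socket slowly but steadily would keep every individual write inside the limit, so the total duration alone would not trigger the time-out.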

> I have a bandwidth throttler here and can do any reasonable amount of
> testing on Pound with large downloads and slow connections.

That would be very helpful. Have a look at the tcp(7) page for details
on things you can test (tcp_mem, tcp_wmem).
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-44-920 4904

Re: [Pound Mailing List] Large file time-out
Chris Wilson <chris(at)aidworld.org>
2005-07-18 16:56:54
Hi Robert and all,

> > I have a bandwidth throttler here and can do any reasonable amount of
> > testing on Pound with large downloads and slow connections.
> 
> That would be very helpful. Have a look at the tcp(7) page for details
> on things you can test (tcp_mem, tcp_wmem).

Here's what I've been able to determine so far:

When Internet Exploder starts to download a file that it doesn't have a
handler for, it prompts you what to do with the file, while downloading
it in the background. 

> 15:37:46.399614 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 44387:45847(1460) ack 345 win 6432
> 15:37:46.399633 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 45847:47307(1460) ack 345 win 6432
> 15:37:46.399826 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 47307:48767(1460) ack 345 win 6432
> 15:37:46.999608 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 30923 win 46700
> 15:37:46.999653 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 48767:50227(1460) ack 345 win 6432
> 15:37:46.999670 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 50227:51687(1460) ack 345 win 6432
> 15:37:46.999683 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 51687:53147(1460) ack 345 win 6432
> 15:37:47.589566 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 33843 win 43780
So far so good. But now it stops downloading the file. This seems to be
after it has downloaded around 50 kB.

> 15:37:47.589620 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 53147:54607(1460) ack 345 win 6432
> 15:37:47.589641 IP bad-len 0

I can't explain the IP bad-len 0, it seems to be a kernel or tcpdump
bug, because the tcpdump BPF filter did catch the packet as being on
port 9000.

> 15:37:48.179538 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 36763 win 40860
> 15:37:48.179590 IP bad-len 0
> 15:37:48.769454 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 38547 win 39076
> 15:37:48.769514 IP bad-len 0
> 15:37:49.139616 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 41467 win 36156
> 15:37:49.139665 IP bad-len 0
> 15:37:49.739524 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 44387 win 33236
> 15:37:49.739582 IP bad-len 0
> 15:37:50.329916 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 47307 win 30316
> 15:37:50.329967 IP bad-len 0
> 15:37:50.919883 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 50227 win 27396
> 15:37:50.919940 IP bad-len 0
> 15:37:51.519847 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 53147 win 24476
> 15:37:52.109811 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 56067 win 21556
> 15:37:52.699779 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 58987 win 18636
> 15:37:53.299742 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 61907 win 15716
> 15:37:53.889529 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 64827 win 12796
> 15:37:54.479522 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 67747 win 9876
> 15:37:55.069653 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 70667 win 6956
> 15:37:55.669626 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 73587 win 4036
> 15:37:56.407548 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 75047 win 2576
> 15:38:07.307253 IP 217.14.134.34.9000 > 131.111.156.200.1064: P 75047:76507(1460) ack 345 win 6432
> 15:38:07.423438 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 76507 win 1116
> 15:38:17.773673 IP 217.14.134.34.9000 > 131.111.156.200.1064: P 76507:77623(1116) ack 345 win 6432
> 15:38:17.938676 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0

By now the window size has fallen to 0, and Pound is unable to send any
more data. After the 30 second client timeout is reached, Pound logs the
following:

> Jul 18 15:38:37 www3 pound: error copy server cont: Connection timed out

But it doesn't terminate the connection immediately.

> 15:38:27.693145 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 345 win 6432
> 15:38:27.718011 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0
> 15:38:47.226172 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 345 win 6432
> 15:38:47.233191 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0
> 15:39:26.251237 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 345 win 6432
> 15:39:26.264828 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0
> 15:40:44.300372 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 345 win 6432
> 15:40:44.321642 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0
> 15:41:42.908664 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 64240

At 15:41:42 I clicked on the Save button and chose a destination for the
file. Now the window opens again, and Pound can send data again.

> 15:41:42.908715 IP 217.14.134.34.9000 > 131.111.156.200.1064: P 77623:77967(344) ack 345 win 6432
> 15:41:42.908739 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 77967:79427(1460) ack 345 win 6432
> 15:41:42.908752 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 79427:80887(1460) ack 345 win 6432
> 15:41:42.921699 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 79427 win 64240
[...]
> 15:41:48.602292 IP 217.14.134.34.9000 > 131.111.156.200.1064: P 114467:114911(444) ack 345 win 6432
> 15:41:48.602305 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 114911:116371(1460) ack 345 win 6432
> 15:41:48.602317 IP 217.14.134.34.9000 > 131.111.156.200.1064: FP 116371:116959(588) ack 345 win 6432
> 15:41:49.192178 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 113007 win 64240
> 15:41:49.782143 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 114911 win 64240
> 15:41:50.182119 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 116960 win 64240
> 15:41:50.183116 IP 131.111.156.200.1064 > 217.14.134.34.9000: F 345:345(0) ack 116960 win 64240
> 15:41:50.183152 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 346 win 6432

NOW pound closes the connection. In Internet Explorer, the download
appears complete, but only 114 KB of the 10 MB file were downloaded.

I haven't tried adjusting any /proc settings yet; I suspect they will
have limited value. I think it makes sense to operate with a larger
client timeout in case users are making big downloads over slow
connections, where a few lost packets could cause a very long delay in
transmission.

I hope this helps to explain/understand what's going on. If you want me
to run any more tests, please let me know what would be useful.
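
For anyone repeating this test, the stall can be measured from the capture instead of read off by eye. The Python sketch below parses tcpdump lines in the format shown above, once each packet is on a single line, and reports how long the client advertised a zero window. The regular expression and the function names are made up for illustration and are only matched against this particular output format; the sample lines are taken from the trace in this message.

```python
import re
from datetime import datetime

# Matches e.g. "15:38:17.938676 IP a.b.c.d.port > e.f.g.h.port: . ack 77623 win 0"
ACK_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r".*\back \d+ win (?P<win>\d+)$"
)


def windows_from(lines, sender):
    """Yield (timestamp, advertised window) for each packet sent by `sender`."""
    for line in lines:
        match = ACK_LINE.match(line.strip())
        if match and match["src"] == sender:
            stamp = datetime.strptime(match["time"], "%H:%M:%S.%f")
            yield stamp, int(match["win"])


def zero_window_stall(lines, sender):
    """Seconds from the first zero window to the next non-zero one, or None."""
    closed_at = None
    for stamp, window in windows_from(lines, sender):
        if window == 0 and closed_at is None:
            closed_at = stamp
        elif window > 0 and closed_at is not None:
            return (stamp - closed_at).total_seconds()
    return None


SAMPLE = [
    "15:37:56.407548 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 75047 win 2576",
    "15:38:17.938676 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0",
    "15:38:27.693145 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 345 win 6432",
    "15:38:27.718011 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0",
    "15:41:42.908664 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 64240",
]

if __name__ == "__main__":
    print(zero_window_stall(SAMPLE, "131.111.156.200.1064"), "seconds at window 0")
```

On the sample lines this gives a stall of about 205 seconds, several times the 30 second client timeout.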

Cheers, Chris.
-- 
(aidworld) chris wilson | chief engineer (chris(at)aidworld.org)


Re: [Pound Mailing List] Large file time-out
Robert Segall <roseg(at)apsis.ch>
2005-07-18 18:19:23
On Mon, 18 Jul 2005 15:56:54 +0100 Chris Wilson <chris(at)aidworld.org>
wrote:

> Here's what I've been able to determine so far:
> 
> When Internet Exploder starts to download a file that it doesn't have
> a handler for, it prompts you what to do with the file, while
> downloading it in the background. 
> 
> > 15:37:46.399614 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 44387:45847(1460) ack 345 win 6432
> > 15:37:46.399633 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 45847:47307(1460) ack 345 win 6432
> > 15:37:46.399826 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 47307:48767(1460) ack 345 win 6432
> > 15:37:46.999608 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 30923 win 46700
> > 15:37:46.999653 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 48767:50227(1460) ack 345 win 6432
> > 15:37:46.999670 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 50227:51687(1460) ack 345 win 6432
> > 15:37:46.999683 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 51687:53147(1460) ack 345 win 6432
> > 15:37:47.589566 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 33843 win 43780
> 
> So far so good. But now it stops downloading the file. This seems to
> be after it has downloaded around 50 kB.

So far this is normal for most browsers...

> > 15:37:47.589620 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 53147:54607(1460) ack 345 win 6432
> > 15:37:47.589641 IP bad-len 0
> 
> I can't explain the IP bad-len 0, it seems to be a kernel or tcpdump
> bug, because the tcpdump BPF filter did catch the packet as being on
> port 9000.
> 
> > 15:37:48.179538 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 36763 win 40860
> > 15:37:48.179590 IP bad-len 0
> > 15:37:48.769454 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 38547 win 39076
> > 15:37:48.769514 IP bad-len 0
> > 15:37:49.139616 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 41467 win 36156
> > 15:37:49.139665 IP bad-len 0
> > 15:37:49.739524 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 44387 win 33236
> > 15:37:49.739582 IP bad-len 0
> > 15:37:50.329916 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 47307 win 30316
> > 15:37:50.329967 IP bad-len 0
> > 15:37:50.919883 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 50227 win 27396
> > 15:37:50.919940 IP bad-len 0
> > 15:37:51.519847 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 53147 win 24476
> > 15:37:52.109811 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 56067 win 21556
> > 15:37:52.699779 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 58987 win 18636
> > 15:37:53.299742 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 61907 win 15716
> > 15:37:53.889529 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 64827 win 12796
> > 15:37:54.479522 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 67747 win 9876
> > 15:37:55.069653 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 70667 win 6956
> > 15:37:55.669626 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 73587 win 4036
> > 15:37:56.407548 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 75047 win 2576
> > 15:38:07.307253 IP 217.14.134.34.9000 > 131.111.156.200.1064: P 75047:76507(1460) ack 345 win 6432
> > 15:38:07.423438 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 76507 win 1116
> > 15:38:17.773673 IP 217.14.134.34.9000 > 131.111.156.200.1064: P 76507:77623(1116) ack 345 win 6432
> > 15:38:17.938676 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0
> 
> By now the window size has fallen to 0, and Pound is unable to send
> any more data. After the 30 second client timeout is reached, Pound
> logs the following:
> 
> > Jul 18 15:38:37 www3 pound: error copy server cont: Connection timed out
> 
> But it doesn't terminate the connection immediately.

Yes it does. Look at the code in http.c, about line 1108.

> > 15:38:27.693145 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 345 win 6432
> > 15:38:27.718011 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0
> > 15:38:47.226172 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 345 win 6432
> > 15:38:47.233191 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0
> > 15:39:26.251237 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 345 win 6432
> > 15:39:26.264828 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0
> > 15:40:44.300372 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 345 win 6432
> > 15:40:44.321642 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 0
> > 15:41:42.908664 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 77623 win 64240
> 
> At 15:41:42 I clicked on the Save button and chose a destination for
> the file. Now the window opens again, and Pound can send data again.
> 
> > 15:41:42.908715 IP 217.14.134.34.9000 > 131.111.156.200.1064: P 77623:77967(344) ack 345 win 6432
> > 15:41:42.908739 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 77967:79427(1460) ack 345 win 6432
> > 15:41:42.908752 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 79427:80887(1460) ack 345 win 6432
> > 15:41:42.921699 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 79427 win 64240
> [...]
> > 15:41:48.602292 IP 217.14.134.34.9000 > 131.111.156.200.1064: P 114467:114911(444) ack 345 win 6432
> > 15:41:48.602305 IP 217.14.134.34.9000 > 131.111.156.200.1064: . 114911:116371(1460) ack 345 win 6432
> > 15:41:48.602317 IP 217.14.134.34.9000 > 131.111.156.200.1064: FP 116371:116959(588) ack 345 win 6432
> > 15:41:49.192178 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 113007 win 64240
> > 15:41:49.782143 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 114911 win 64240
> > 15:41:50.182119 IP 131.111.156.200.1064 > 217.14.134.34.9000: . ack 116960 win 64240
> > 15:41:50.183116 IP 131.111.156.200.1064 > 217.14.134.34.9000: F 345:345(0) ack 116960 win 64240
> > 15:41:50.183152 IP 217.14.134.34.9000 > 131.111.156.200.1064: . ack 346 win 6432
> 
> NOW pound closes the connection. In Internet Explorer, the download
> appears complete, but only 114 KB of the 10 MB file were downloaded.

I strongly suspect these packets come from the TCP buffers _after_ Pound
terminated the connection! I guess that if you were to increase your
buffer size (kernel) to 2MB you would see a download of about 2MB rather
than the approx. 128K you get now. You could also reduce the buffers to
32K and see a smaller download.
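
For anyone who wants to try the buffer experiment without touching
system-wide kernel settings, here is a minimal sketch that reads and
changes the send buffer of a single socket via SO_SNDBUF. This is an
assumption on my side, not something Pound does: the helper names
get_send_buffer() and set_send_buffer() are hypothetical, and a
per-socket setting is only a stand-in for the kernel-wide buffer size
mentioned above. The kernel is free to round or cap the requested value.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of Pound): return the kernel send-buffer
 * size in bytes for sock, or -1 on error. */
int get_send_buffer(int sock)
{
    int size = 0;
    socklen_t len = sizeof(size);

    if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &size, &len) < 0)
        return -1;
    return size;
}

/* Hypothetical helper: request a send buffer of `bytes`. The kernel may
 * adjust the value, so report what is actually in effect afterwards.
 * Returns the effective size, or -1 on error. */
int set_send_buffer(int sock, int bytes)
{
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    return get_send_buffer(sock);
}
```

If the truncated download size follows the value reported by
get_send_buffer(), that would support the buffering theory.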

You could also try the following changes in http.c:

line 612 change to

	l.l_onoff = 0;

to remove one potential problem area.
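
As a standalone illustration of what that setting means, the sketch
below switches SO_LINGER off on a socket. It is not the code from
http.c; the helper name disable_linger() is hypothetical and the
surrounding Pound code is not reproduced here. With l_onoff set to 0,
close() returns immediately and the kernel keeps trying to deliver
whatever is still queued.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not from http.c): disable lingering on sock.
 * Returns 0 on success, -1 on error (as setsockopt does). */
int disable_linger(int sock)
{
    struct linger l;

    memset(&l, 0, sizeof(l));
    l.l_onoff = 0;   /* 0 = lingering disabled */
    l.l_linger = 0;  /* timeout in seconds, ignored while l_onoff is 0 */
    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
}
```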

Also comment out lines 1088-1093 (the BIO_flush code). This will have the
effect of eliminating the extra packet that might be generated.

Let us know.
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-44-920 4904

Re: [Pound Mailing List] Large file time-out
Robert Segall <roseg(at)apsis.ch>
2005-07-18 19:04:20 [ SNIP ]
Additional suggestion which _may_ help (or not): add the following code
in http.c, line 619 (after the #endif for TCP_LINGER2)

    /* non-zero turns TCP_NODELAY on, i.e. disables Nagle's algorithm */
    n = 1;
    setsockopt(sock, SOL_TCP, TCP_NODELAY, (void *)&n, sizeof(n));

Some people believe this to help - YMMV.
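
For testing outside of Pound, the same call can be wrapped in a small
self-contained function. This is only a sketch: set_nodelay() is a
hypothetical name, and it uses IPPROTO_TCP instead of SOL_TCP on the
assumption that the code may be built on systems that do not define
SOL_TCP (on Linux the two have the same value).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: enable TCP_NODELAY on sock so small writes are
 * sent at once instead of being coalesced by Nagle's algorithm.
 * Returns 0 on success, -1 on error (as setsockopt does). */
int set_nodelay(int sock)
{
    int n = 1;

    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (void *)&n, sizeof(n));
}
```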
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-44-920 4904

Re: [Pound Mailing List] Large file time-out
Chris Wilson <chris(at)aidworld.org>
2005-07-18 19:13:16 [ SNIP ]
Hi Robert,

> > But it doesn't terminate the connection immediately.
> 
> Yes it does. Look at the code in http.c, about line 1108.

I mean that the TCP connection (network state) is not immediately
closed. I agree that it looks like data is being buffered by the OS.

> I strongly suspect these packets come from the TCP buffers _after_ Pound
> terminated the connection! I guess that if you were to increase your
> buffer size (kernel) to 2MB you would see a download of about 2MB rather
> than the approx. 128K you get now. You could also reduce the buffers to
> 32K and see a smaller download.
> 
> You could also try the following changes in http.c:
> 
> line 612 change to
> 
> 	l.l_onoff = 0;
> 
> to remove one potential problem area.
> 
> Also comment out lines 1088-1093 (the BIO_flush code). This will have the
> effect of eliminating the extra packet that might be generated.

Does it matter? I think we have a convincing reason why you should not
use a short client timeout :-) That should be enough for most people.
Everything else is just playing around with the buffer size and the
location of the buffering.

What I don't understand is why Internet Exploder doesn't detect that the
connection was closed before the content-length was reached, and either
reconnect or tell the user something helpful. It just blithely pretends
that the download was successful. Then again, perhaps I shouldn't expect
more from Mickeysoft.

Cheers, Chris.
-- 
(aidworld) chris wilson | chief engineer (chris(at)aidworld.org)


Re: [Pound Mailing List] Large file time-out
Robert Segall <roseg(at)apsis.ch>
2005-07-19 14:51:02 [ SNIP ]
On Mon, 18 Jul 2005 18:13:16 +0100 Chris Wilson <chris(at)aidworld.org>
wrote:

> Hi Robert,
> 
> > > But it doesn't terminate the connection immediately.
> > 
> > Yes it does. Look at the code in http.c, about line 1108.
> 
> I mean that the TCP connection (network state) is not immediately
> closed. I agree that it looks like data is being buffered by the OS.

"It looks" != "is proved". Some testing may be in order, just to be on
the safe side.

> > I strongly suspect these packets come from the TCP buffers _after_
> > Pound terminated the connection! I guess that if you were to
> > increase your buffer size (kernel) to 2MB you would see a download
> > of about 2MB rather than the approx. 128K you get now. You could
> > also reduce the buffers to 32K and see a smaller download.
> > 
> > You could also try the following changes in http.c:
> > 
> > line 612 change to
> > 
> > 	l.l_onoff = 0;
> > 
> > to remove one potential problem area.
> > 
> > Also comment out lines 1088-1093 (the BIO_flush code). This will have
> > the effect of eliminating the extra packet that might be generated.
> 
> Does it matter? I think we have a convincing reason why you should not
> use a short client timeout :-) That should be enough for most people.
> Everything else is just playing around with the buffer size and the
> location of the buffering.

There may well be caching issues involved here as well - I feel we just
don't know enough to dismiss this out of hand.

> What I don't understand is why Internet Exploder doesn't detect that
> the connection was closed before the content-length was reached, and
> either reconnect or tell the user something helpful. It just blithely
> pretends that the download was successful. Then again, perhaps I
> shouldn't expect more from Mickeysoft.

Just as an example:

1. IE pops up the dialog box, and in parallel downloads a bit of the
file. This gets cached by the browser.

2. The user responds after the timeout.

3. Pound has closed the socket due to the timeout.

4. IE looks at its cache and sees it is not the size announced by the
headers. It tries to get the rest from the server and fails.
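
The size comparison in step 4 could look roughly like the sketch below.
This is purely illustrative: nothing here is known about how IE really
does it, and the names transfer_state and check_transfer() are
hypothetical. A negative content_length stands for a response that
carried no Content-Length header.

```c
/* Possible outcomes when a connection closes. */
enum transfer_state {
    TRANSFER_COMPLETE,   /* got at least the announced number of bytes */
    TRANSFER_TRUNCATED,  /* connection closed before Content-Length */
    TRANSFER_UNKNOWN     /* no Content-Length, cannot tell */
};

/* Hypothetical check: compare the body bytes received so far with the
 * size announced in the Content-Length header. */
enum transfer_state check_transfer(long long content_length,
                                   long long received)
{
    if (content_length < 0)
        return TRANSFER_UNKNOWN;
    if (received >= content_length)
        return TRANSFER_COMPLETE;
    return TRANSFER_TRUNCATED;
}
```

With the numbers from Chris's test (about 114 KB received of a 10 MB
file) such a check would report a truncated transfer.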
-- 
Robert Segall
Apsis GmbH
Postfach, Uetikon am See, CH-8707
Tel: +41-44-920 4904
