[nylug-talk] tcp TIME_WAITs
Johnn
Wed Sep 20 11:37:49 EDT 2006
On Wed, 2006-09-20 at 10:38 -0400, Mordy Ovits wrote:
> On Wednesday 20 September 2006 01:45 am, Johnn Tan wrote:
> > in fact, under heavy load, we would reach about
> > 27000 TIME_WAITs, then there would be no more. I'm guessing it's
> > related to net.ipv4.ip_local_port_range being
> > 32768 61000
>
> Why would it be related to that setting? TCP connections are
> identified by a tuple, not just the source port.
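Mordy's point can be checked on the box itself. Below is a minimal sketch, assuming the Linux `/proc/net/tcp` layout (local address as `HEXIP:HEXPORT` in the second column, state in hex in the fourth, where `06` is TIME_WAIT); the helper name `time_waits_by_local_port` is made up for this example. If nearly all TIME_WAITs sit on local port 80 and differ only in remote address and port, they belong to inbound HTTP connections, and `ip_local_port_range` (which supplies source ports for connections this host opens) is not what limits them.

```python
from collections import Counter

# Hypothetical helper: count TIME_WAIT sockets per local port from text in
# the Linux /proc/net/tcp format. Column 1 is local_address (HEXIP:HEXPORT),
# column 3 is the state in hex; 06 means TIME_WAIT.
TIME_WAIT = "06"


def time_waits_by_local_port(proc_net_tcp_text):
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == TIME_WAIT:
            port = int(local_address.split(":")[1], 16)  # hex port -> int
            counts[port] += 1
    return counts
```

On a live server, call it with `open("/proc/net/tcp").read()` and look at `.most_common(5)`; IPv6 sockets are listed separately in `/proc/net/tcp6`.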
I guess I was trying to figure out some reason why TIME_WAITs max
out around 27000-28000 ... and hey, it was late at night :).
> Personally, I think it's just a change in behavior between 2.4 and
> 2.6. You're serving up very many shortlived TCP connections, so lots
> of TIME_WAITs isn't unexpected. Unless you know it to be causing
> problems, you should probably just ignore it.
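The plateau itself is roughly what steady load would predict. By Little's law, the number of sockets sitting in TIME_WAIT is about the rate at which this host actively closes connections multiplied by how long each one stays in TIME_WAIT. A back-of-envelope sketch, assuming the server does the active close and the Linux default TIME_WAIT length of 60 seconds (`TCP_TIMEWAIT_LEN`); the function names are illustrative:

```python
# Rough Little's-law estimate: sockets in TIME_WAIT ~= close rate x hold time.
# Assumes this host performs the active close and the Linux default
# TIME_WAIT length of 60 seconds.
TIME_WAIT_SECONDS = 60


def steady_state_time_waits(closes_per_second, tw_seconds=TIME_WAIT_SECONDS):
    """Expected TIME_WAIT count for a steady rate of active closes."""
    return closes_per_second * tw_seconds


def implied_close_rate(observed_time_waits, tw_seconds=TIME_WAIT_SECONDS):
    """Close rate (per second) that would sustain an observed TIME_WAIT count."""
    return observed_time_waits / tw_seconds


print(implied_close_rate(27000))     # 450.0
print(steady_state_time_waits(450))  # 27000
```

Read this way, a ceiling near 27000 would correspond to roughly 450 closed connections per second at peak, a reflection of traffic rather than an exhausted resource.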
I guess I was jumping the gun a bit. I started looking into all of this
because of some other problems, and what I presented was where my
investigation ended up rather than the problems that started it.
What initially prompted this was that we were seeing fewer impressions on the
new boxes compared to the old ones across the board. (The highest number
of impressions on any new box is still much lower than the lowest number
on any old box. On average, the difference is about 10%.)
I'm thinking some or most of this is due to DNS propagation. All
machines (new and old) are in the same DNS rotation, but perhaps there
are still lingering DNS caches out there serving out only the old boxes'
IPs. (We will eventually move the old boxes out from the DNS rotation.)
But in looking further into the problem, I also noticed these types of
entries in the apache error_log on the new boxes:
[Tue Sep 12 19:03:04 2006] [info] [client 64.122.12.124] (104)Connection reset by peer: core_output_filter: writing data to the network
[Tue Sep 12 19:06:06 2006] [info] [client 65.244.245.5] (104)Connection reset by peer: core_output_filter: writing data to the network
[Tue Sep 12 19:06:34 2006] [info] [client 64.91.81.34] (104)Connection reset by peer: core_output_filter: writing data to the network
[Tue Sep 12 19:08:21 2006] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 1982 idle, and 2604 total children
[Tue Sep 12 19:08:22 2006] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 1992 idle, and 2612 total children
I realize that some "connection reset by peer" errors are simply
client-side cancellations, but I am seeing tons of these, so that can't
explain them all.
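One way to put a number on "tons" is to count the resets and check whether a handful of clients account for most of them. A minimal sketch, assuming Apache 2.x error_log lines in the format shown above (104 is `ECONNRESET` on Linux); `resets_by_client` is a hypothetical helper name, and the pattern also tolerates a line break inside the message, as in a wrapped excerpt.

```python
import re
from collections import Counter

# Apache 2.x error_log entry for a reset while sending a response, e.g.
# [Tue Sep 12 19:03:04 2006] [info] [client 64.122.12.124] (104)Connection reset by peer: ...
# 104 is ECONNRESET on Linux; \s+ also matches a wrapped line break.
RESET_RE = re.compile(r"\[client ([0-9.]+)\] \(104\)Connection\s+reset by peer")


def resets_by_client(log_text):
    """Count '(104)Connection reset by peer' entries per client IP."""
    return Counter(m.group(1) for m in RESET_RE.finditer(log_text))
```

Feed it the contents of error_log (the path varies by install) and compare `sum(counts.values())` with `len(counts)`, or look at `counts.most_common(10)`.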
So, in digging even further, I began to notice the TIME_WAITs and the
rest of what I presented in my last email, and I probably drew far too
many conclusions as I went further down the rabbit hole, so to speak.
What I was mainly seeking was to account for the 10% difference in
impressions and, secondarily, to reduce the kinds of error entries
listed above in the logs.
To this end, I want to keep each client/server transaction as short as
possible. On the apache end, I assume it's best to turn off KeepAlive to
accomplish this.
On the tcp side, is there anything I can do toward this goal? I thought
TIME_WAITs were a problem, but if they aren't, that's cool. But is there
anything else that can be tweaked, or is it best to leave this side
alone?
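Before changing anything on the TCP side, it may help to record the current values of the knobs usually discussed for servers handling many short connections. A minimal read-only sketch, assuming a Linux 2.6 `/proc/sys` layout; the list of names is an illustrative selection, not a recommendation, and `read_sysctls` is a hypothetical helper. Entries that do not exist on a given kernel come back as `None`.

```python
from pathlib import Path

# Illustrative selection of Linux TCP sysctls (paths under /proc/sys).
# Note: tcp_fin_timeout governs FIN-WAIT-2, not TIME_WAIT.
KNOBS = (
    "net/ipv4/ip_local_port_range",
    "net/ipv4/tcp_fin_timeout",
    "net/ipv4/tcp_max_tw_buckets",
    "net/ipv4/tcp_max_syn_backlog",
    "net/core/somaxconn",
)


def read_sysctls(names=KNOBS, root="/proc/sys"):
    """Return {dotted.sysctl.name: [values...] or None if absent}; read-only."""
    values = {}
    for name in names:
        key = name.replace("/", ".")
        try:
            values[key] = (Path(root) / name).read_text().split()
        except OSError:  # not present on this kernel, or unreadable
            values[key] = None
    return values
```

On the server, `read_sysctls()` with no arguments reads the live values.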
johnn