TCP_NOPUSH and Mac OS X
Gregory Block
gblock at ctoforaday.com
Mon Mar 7 01:13:03 PST 2005
I had a bug open on Radar regarding memcached causing core dumps when
used with poll(), kevent(), or anything other than select(). I've been
informed by the team that there's a fix in place for Tiger, and that
they've tested under Tiger with poll() without seeing the kernel panics.
So...
Until that fix is in place, it's memcached + select() on Mac OS X, or
watch your kernel go tits up.
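
For anyone unsure what "select() only" means in practice, here is a
minimal sketch of waiting on a descriptor with select() and nothing
else. wait_readable() is a made-up helper for illustration, not
anything from the memcached source, and it assumes a single descriptor
below FD_SETSIZE.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not memcached code): wait up to timeout_ms for fd
 * to become readable, using only select().
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    /* select() cannot watch descriptors at or above FD_SETSIZE. */
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

The FD_SETSIZE check is the usual price of select(): it is the
limitation that poll() and kevent() were meant to remove.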
On 5 Mar 2005, at 20:20, Richard Cameron wrote:
>
> There was some discussion on this list last year about some fairly
> serious performance problems on Mac OS X. I was seeing these too, and
> I think I've isolated the problem to the TCP_NOPUSH option, and
> there's a one line hack which seems to solve it.
>
> On OS X 10.3.8, running memcached locally and connecting to it on
> localhost, the symptoms were that there was a latency of about 0.2
> seconds between sending a command down the socket to the server and
> getting a reply. Doing a tcpdump showed that the delay was *exactly*
> 200ms on every request, however running a kdump showed that memcached
> was actually writing its response to the socket pretty much
> instantaneously.
>
> The relevant hack which seemed to get things working again was to
> simply comment out the line in memcached.c which set TCP_NOPUSH:
>
> #ifdef TCP_NOPUSH
> // setsockopt(c->sfd, IPPROTO_TCP, TCP_NOPUSH, &val, sizeof(val));
> #endif
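
Rather than commenting the line out by hand, the same effect could be
had with a build-time switch. This is only a sketch of the idea:
MEMCACHED_SKIP_NOPUSH and maybe_set_nopush() are names invented here,
not anything memcached.c actually defines.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* MEMCACHED_SKIP_NOPUSH is a hypothetical build flag (for example,
 * compile with -DMEMCACHED_SKIP_NOPUSH); it is not part of memcached.
 * Returns 1 if TCP_NOPUSH was set, 0 if it was skipped or is not
 * available on this platform, -1 if setsockopt() failed. */
static int maybe_set_nopush(int sfd)
{
#if defined(TCP_NOPUSH) && !defined(MEMCACHED_SKIP_NOPUSH)
    int val = 1;

    if (setsockopt(sfd, IPPROTO_TCP, TCP_NOPUSH, &val, sizeof(val)) != 0)
        return -1;
    return 1;
#else
    (void)sfd;   /* nothing to do; avoid an unused-parameter warning */
    return 0;
#endif
}
```

On a platform whose headers do not define TCP_NOPUSH at all, the
function compiles down to the "skipped" branch.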
>
> It doesn't seem to be well known (at least, Google doesn't know) that
> TCP_NOPUSH is simply broken on OS X, and there was some evidence on
> the list that some people managed to get memcached running "out of the
> box" without this sort of latency. I'd be interested to know if that's
> still the case as it might shed a little more light on the problem.
>
> However, I'm quite willing to conclude there is some underlying
> problem with the operating system, as things continue to get even
> stranger:
>
> As I couldn't use TCP_NOPUSH, I put a "#undef TCP_NOPUSH" at the top
> of the file, which has the effect of making the code set TCP_NODELAY
> on the socket. This is exactly what I wanted:
>
> #if !defined(TCP_NOPUSH)
> setsockopt(sfd, IPPROTO_TCP, TCP_NODELAY, &flags, sizeof(flags));
> #endif
>
> This worked quite nicely (about a factor of 3 speedup over the lo
> interface), but when I load tested it for an extended period (about 5
> minutes) it seemed to fairly reliably cause a kernel panic (stack
> trace attached for interest below). Dropping the TCP_NODELAY option
> again seemed to "fix" things, but I've got no idea whether this isn't
> simply because it conspires to slow things down enough such that
> whatever race condition in the kernel is causing the panic doesn't
> happen any more. Does anyone else see this, or is it just a (rather
> annoying) quirk of my machine?
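
When testing with and without TCP_NODELAY it helps to confirm that the
option really is set on the socket. A rough sketch, assuming a plain
TCP socket; set_nodelay() is a hypothetical helper, not a function in
memcached.c:

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable TCP_NODELAY on sfd, then read the option
 * back with getsockopt() to confirm the kernel accepted it.
 * Returns 1 if the option reads back as enabled, 0 if it reads back as
 * disabled, -1 if either call failed. */
static int set_nodelay(int sfd)
{
    int flags = 1;
    int current = 0;
    socklen_t len = sizeof(current);

    if (setsockopt(sfd, IPPROTO_TCP, TCP_NODELAY, &flags, sizeof(flags)) != 0)
        return -1;
    if (getsockopt(sfd, IPPROTO_TCP, TCP_NODELAY, &current, &len) != 0)
        return -1;
    return current != 0;
}
```

This only shows that the option took effect. It says nothing about
whether the panic described above is related to it.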
>
> Richard
>
>
>
> *********
>
> Sat Mar 5 19:33:12 2005
>
>
> Unresolved kernel trap(cpu 0): 0x300 - Data access
> DAR=0x0000000000000014 PC=0x000000000020C8F4
> Latest crash info for cpu 0:
> Exception state (sv=0x31747C80)
> PC=0x0020C8F4; MSR=0x00009030; DAR=0x00000014; DSISR=0x40000000;
> LR=0x0020C800; R1=0x12213C20; XCP=0x0000000C (0x300 - Data access)
> Backtrace:
> 0x40471D84 0x0020C330 0x002463E4 0x00094160 0x01C465A0
> Proceeding back via exception chain:
> Exception state (sv=0x31747C80)
> previously dumped as "Latest" state. skipping...
> Exception state (sv=0x28307000)
> PC=0x9002E1CC; MSR=0x0000F030; DAR=0x1C3EB004; DSISR=0x40000000;
> LR=0x00007B38; R1=0xBFFFF910; XCP=0x00000030 (0xC00 - System call)
>
> Kernel version:
> Darwin Kernel Version 7.8.0:
> Wed Dec 22 14:26:17 PST 2004; root:xnu/xnu-517.11.1.obj~1/RELEASE_PPC
>
>
> panic(cpu 0): 0x300 - Data access
> Latest stack backtrace for cpu 0:
> Backtrace:
> 0x000835F8 0x00083ADC 0x0001EDA4 0x00090BD8 0x00093FCC
> Proceeding back via exception chain:
> Exception state (sv=0x31747C80)
> PC=0x0020C8F4; MSR=0x00009030; DAR=0x00000014; DSISR=0x40000000;
> LR=0x0020C800; R1=0x12213C20; XCP=0x0000000C (0x300 - Data access)
> Backtrace:
> 0x40471D84 0x0020C330 0x002463E4 0x00094160 0x01C465A0
> Exception state (sv=0x28307000)
> PC=0x9002E1CC; MSR=0x0000F030; DAR=0x1C3EB004; DSISR=0x40000000;
> LR=0x00007B38; R1=0xBFFFF910; XCP=0x00000030 (0xC00 - System call)
>
> Kernel version:
> Darwin Kernel Version 7.8.0:
> Wed Dec 22 14:26:17 PST 2004; root:xnu/xnu-517.11.1.obj~1/RELEASE_PPC
>
>
> *********
>