TCP_NOPUSH and Mac OS X

Gregory Block gblock at ctoforaday.com
Mon Mar 7 01:13:03 PST 2005


I had a bug open on Radar regarding memcached causing kernel panics when 
used with poll(), kevent(), or anything other than select().  I've been 
informed by the team that there's a fix in place for Tiger, and that 
they've tested under Tiger with poll() without seeing the kernel panics.

So...

Until that fix ships, it's memcached + select() on Mac OS X, or 
watch your kernel go tits up.

On 5 Mar 2005, at 20:20, Richard Cameron wrote:

>
> There was some discussion on this list last year about some fairly 
> serious performance problems on Mac OS X. I was seeing these too, and 
> I think I've isolated the problem to the TCP_NOPUSH option, and 
> there's a one line hack which seems to solve it.
>
> On OS X 10.3.8, running memcached locally and connecting to it on 
> localhost, the symptoms were that there was a latency of about 0.2 
> seconds between sending a command down the socket to the server and 
> getting a reply. Doing a tcpdump showed that the delay was *exactly* 
> 200ms on every request, however running a kdump showed that memcached 
> was actually writing its response to the socket pretty much 
> instantaneously.
>
> The relevant hack which seemed to get things working again was to 
> simply comment out the line in memcached.c which set TCP_NOPUSH:
>
> #ifdef TCP_NOPUSH
> //    setsockopt(c->sfd, IPPROTO_TCP, TCP_NOPUSH, &val, sizeof(val));
> #endif
>
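
Rather than commenting the call out, the option could be made
switchable at run time. This is only a sketch: the helper name
set_nopush_if_enabled() and its enable flag are hypothetical and do
not exist in memcached.c.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of memcached): apply TCP_NOPUSH only
 * when the platform defines it AND the caller asks for it.
 * Returns 0 on success or when the option is skipped, and -1 if
 * setsockopt() fails. */
static int set_nopush_if_enabled(int fd, int enable)
{
#ifdef TCP_NOPUSH
    if (enable) {
        int val = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NOPUSH, &val, sizeof(val));
    }
#endif
    (void)fd;      /* unused when the option is skipped */
    (void)enable;
    return 0;
}
```

Passing enable = 0 on an affected OS X box has the same effect as the
commented-out line, without a rebuild for platforms where TCP_NOPUSH
behaves.
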
> It doesn't seem to be well known (at least, Google doesn't know) that 
> TCP_NOPUSH is simply broken on OS X, and there was some evidence on 
> the list that some people managed to get memcached running "out of the 
> box" without this sort of latency. I'd be interested to know if that's 
> still the case as it might shed a little more light on the problem.
>
> However, I'm quite willing to conclude there is some underlying 
> problem with the operating system, as things continue to get even 
> stranger:
>
> As I couldn't use TCP_NOPUSH, I put a "#undef TCP_NOPUSH" at the top 
> of the file, which has the effect of making the code set TCP_NODELAY 
> on the socket. This is exactly what I wanted:
>
> #if !defined(TCP_NOPUSH)
>     setsockopt(sfd, IPPROTO_TCP, TCP_NODELAY, &flags, sizeof(flags));
> #endif
>
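
To confirm that the #undef route really leaves TCP_NODELAY set on the
socket, the option can be read back with getsockopt(). A minimal
sketch; the helper names enable_nodelay() and nodelay_is_set() are
hypothetical, not memcached functions.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP_NODELAY for a TCP socket.
 * Returns 0 on success, -1 on failure (same as setsockopt()). */
static int enable_nodelay(int fd)
{
    int flags = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flags, sizeof(flags));
}

/* Hypothetical helper: report whether TCP_NODELAY is currently set.
 * Returns 1 if set, 0 if not, -1 if getsockopt() fails. */
static int nodelay_is_set(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) != 0)
        return -1;
    return val != 0;
}
```

The result is normalised to 0 or 1 because the raw value returned for
a set option is not guaranteed to be exactly 1 on every platform.
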
> This worked quite nicely (about a factor of 3 speedup over the lo 
> interface), but when I load tested it for an extended period (about 5 
> minutes) it seemed to fairly reliably cause a kernel panic (stack 
> trace attached for interest below). Dropping the TCP_NODELAY option 
> again seemed to "fix" things, but I have no idea whether that is 
> simply because it slows things down enough that whatever race 
> condition in the kernel is causing the panic no longer triggers. 
> Does anyone else see this, or is it just a (rather annoying) quirk 
> of my machine?
>
> Richard
>
>
>
> *********
>
> Sat Mar  5 19:33:12 2005
>
>
> Unresolved kernel trap(cpu 0): 0x300 - Data access DAR=0x0000000000000014 PC=0x000000000020C8F4
> Latest crash info for cpu 0:
>    Exception state (sv=0x31747C80)
>       PC=0x0020C8F4; MSR=0x00009030; DAR=0x00000014; DSISR=0x40000000; LR=0x0020C800; R1=0x12213C20; XCP=0x0000000C (0x300 - Data access)
>       Backtrace:
>          0x40471D84 0x0020C330 0x002463E4 0x00094160 0x01C465A0
> Proceeding back via exception chain:
>    Exception state (sv=0x31747C80)
>       previously dumped as "Latest" state. skipping...
>    Exception state (sv=0x28307000)
>       PC=0x9002E1CC; MSR=0x0000F030; DAR=0x1C3EB004; DSISR=0x40000000; LR=0x00007B38; R1=0xBFFFF910; XCP=0x00000030 (0xC00 - System call)
>
> Kernel version:
> Darwin Kernel Version 7.8.0:
> Wed Dec 22 14:26:17 PST 2004; root:xnu/xnu-517.11.1.obj~1/RELEASE_PPC
>
>
> panic(cpu 0): 0x300 - Data access
> Latest stack backtrace for cpu 0:
>       Backtrace:
>          0x000835F8 0x00083ADC 0x0001EDA4 0x00090BD8 0x00093FCC
> Proceeding back via exception chain:
>    Exception state (sv=0x31747C80)
>       PC=0x0020C8F4; MSR=0x00009030; DAR=0x00000014; DSISR=0x40000000; LR=0x0020C800; R1=0x12213C20; XCP=0x0000000C (0x300 - Data access)
>       Backtrace:
>          0x40471D84 0x0020C330 0x002463E4 0x00094160 0x01C465A0
>    Exception state (sv=0x28307000)
>       PC=0x9002E1CC; MSR=0x0000F030; DAR=0x1C3EB004; DSISR=0x40000000; LR=0x00007B38; R1=0xBFFFF910; XCP=0x00000030 (0xC00 - System call)
>
> Kernel version:
> Darwin Kernel Version 7.8.0:
> Wed Dec 22 14:26:17 PST 2004; root:xnu/xnu-517.11.1.obj~1/RELEASE_PPC
>
>
> *********
>


