memcached hiccups?

Just Marc marc at corky.net
Wed Apr 11 18:26:35 UTC 2007


You're most likely out of the kernel memory allocated to TCP.  Tweak 
the tcp_mem, tcp_rmem and tcp_wmem settings, i.e. 
/proc/sys/net/ipv4/tcp_*mem*
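
Before changing anything, it may help to compare what TCP is using right 
now against the configured limits.  A rough Python sketch, assuming the 
usual Linux layout: tcp_mem holds three page counts (low, pressure, 
high), and the "TCP:" line of /proc/net/sockstat reports current usage 
in a "mem" field, also in pages.  The function names are made up for 
illustration.

```python
from pathlib import Path


def parse_tcp_mem(text):
    """Parse the contents of /proc/sys/net/ipv4/tcp_mem.

    Assumed format: three whitespace-separated page counts
    (low, pressure, high).
    """
    low, pressure, high = (int(v) for v in text.split())
    return {"low": low, "pressure": pressure, "high": high}


def parse_sockstat_tcp_mem(text):
    """Return the 'mem' value (pages) from the TCP line of /proc/net/sockstat."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]
            # Fields alternate name/value: inuse 27 orphan 0 ... mem 3
            pairs = dict(zip(fields[0::2], fields[1::2]))
            return int(pairs["mem"])
    raise ValueError("no TCP line found in sockstat text")


def tcp_memory_report():
    """Read both files on a live Linux box and compare usage with the limits."""
    limits = parse_tcp_mem(Path("/proc/sys/net/ipv4/tcp_mem").read_text())
    used = parse_sockstat_tcp_mem(Path("/proc/net/sockstat").read_text())
    return {
        "used_pages": used,
        **limits,
        "at_or_over_pressure": used >= limits["pressure"],
    }
```

Calling tcp_memory_report() on the affected box while it is choking 
should show whether usage is anywhere near the pressure or high marks.  
If it is not, the limit being hit is probably a different one.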

I'd also suggest you run netstat -s and look at the TCP-specific area, 
especially at the failure counters.
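
If you want to track those counters over time rather than eyeball them, 
here is a small Python sketch.  It assumes the raw counters live in 
/proc/net/snmp and /proc/net/netstat as pairs of lines (a header line of 
names, then a line of values, both prefixed with the protocol).  The 
list of counters to watch is only my pick of likely suspects, not a 
definitive set, and some may be missing on older kernels.

```python
from pathlib import Path

# Illustrative selection of failure-related counters (an assumption,
# adjust to whatever your kernel actually exposes).
WATCHED = [
    ("Tcp", "AttemptFails"),
    ("Tcp", "EstabResets"),
    ("Tcp", "InErrs"),
    ("TcpExt", "ListenOverflows"),
    ("TcpExt", "ListenDrops"),
    ("TcpExt", "PruneCalled"),
]


def parse_proc_net_counters(text):
    """Parse header/value line pairs into {protocol: {counter: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    stats = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            raise ValueError("header/value lines out of step: %r" % header)
        counters = zip(names.split(), (int(n) for n in numbers.split()))
        stats.setdefault(proto, {}).update(counters)
    return stats


def nonzero_failures(stats, watched=WATCHED):
    """Return the watched counters that are present and non-zero."""
    found = {}
    for proto, name in watched:
        value = stats.get(proto, {}).get(name, 0)
        if value:
            found["%s.%s" % (proto, name)] = value
    return found


def read_live_counters():
    """Merge both proc files on a live Linux box."""
    stats = {}
    for path in ("/proc/net/snmp", "/proc/net/netstat"):
        for proto, counters in parse_proc_net_counters(Path(path).read_text()).items():
            stats.setdefault(proto, {}).update(counters)
    return stats
```

Run nonzero_failures(read_live_counters()) once when the box is healthy 
and again during a hiccup; the counters that jumped are the ones worth 
chasing.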

Enjoy

>
> Managed to tweak the kernel logging settings so that messages are no 
> longer suppressed, and got these:
>
> Apr 11 10:02:43 poseidon kernel: TCP: drop open request from 
> x.x.x.x/56575
>
> Certainly seems like a kernel problem, and one we haven't seen before. 
> I'm afraid this is out of my depth - why would the kernel be dropping 
> TCP open requests?
>
> FYI, this is happening on kernels:
>
> 2.6.9-42.0.3.ELsmp
> 2.6.9-5.ELsmp
>
>
> Are there kernel tunables I should be looking at?
>
> Thanks,
>
> Don
>
>
> Don MacAskill wrote:
>>
>> That's what I've suspected, too, so I've been digging, but haven't 
>> found anything that looks suspicious.  :(
>>
>> Don
>>
>>
>> marc at corky.net wrote:
>>> I'm only commenting on the kernel message.  You should dig up the 
>>> original message in your /var/log files.  It's more likely a 
>>> kernel config issue than a memcached issue.  I'm guessing 
>>> it's some sort of out-of-socket-memory condition.  Check it out and 
>>> let us know.
>>>
>>> Marc
>>>
>>>
>>>>
>>>> I've got something strange going on, and can't seem to figure it 
>>>> out. One of our memcached boxes will periodically choke on memcached.
>>>>
>>>> Periodically usually means ~6-8 days or so, but there's some 
>>>> variation.
>>>>
>>>> Choke usually means it fails sock_to_host from lots of clients, so 
>>>> we take it out of rotation and then check it later and put it back in.
>>>>
>>>> It's memcached-1.2.1 from the rpm in dag's repository, using 
>>>> libevent 1.3b from that same repo.
>>>>
>>>> I'm running 4 instances on this box (4 Opteron cores, haven't had 
>>>> the guts to try the multithreaded one yet), all of which fail 
>>>> simultaneously, and one of the four instances says:  "Failed to 
>>>> write, and not due to blocking: Connection reset by peer".  The 
>>>> others don't say anything (only -v).
>>>>
>>>> Finally, the syslog fills up with lots and lots of these:
>>>>
>>>> "Mar 26 09:50:20 poseidon kernel: printk: 13382 messages suppressed."
>>>>
>>>> But there's no indication of what message it was that was suppressed.
>>>>
>>>> There's nothing else running on the box, there's no swap, it's not 
>>>> running out of RAM, etc.
>>>>
>>>> Has anyone else seen this or anything like it?
>>>>
>>>> Thanks,
>>>>
>>>> Don
>>>>
>>>>
>>>
>>>
>>
>


