<span style="border-collapse:collapse"><div>We&#39;re running memcached 1.2.2 with libevent 1.3 on OpenSolaris 11.</div><div><br></div><div>We recently changed some of the ways our application interacts with memcached, and immediately afterward began seeing core dumps, seemingly at random, across our 32 memcached instances.

</div><div><br></div><div>We captured the core files and found that every one of them was generated when our monitoring processes issued a &#39;stats sizes&#39; request.</div><div><br></div><div>One of the engineers here postulates the following:

</div><blockquote style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:40px;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px">

<br>SEGFAULT on line 341 of items.c:<br><br>&nbsp;&nbsp; &nbsp;/* build the histogram */<br>&nbsp;&nbsp; &nbsp;memset(histogram, 0, (size_t)num_buckets * sizeof(int ));<br>&nbsp;&nbsp; &nbsp;for (i = 0; i &lt; LARGEST_ID; i++) {<br>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;item *iter = heads[i];<br>

&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;while (iter) {<br>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;int ntotal = ITEM_ntotal(iter);&nbsp;<br>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;int bucket = ntotal / 32;<br>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;if ((ntotal % 32) != 0) bucket++;<br>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;if (bucket &lt; num_buckets) histogram[bucket]++;

<br>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;iter = iter-&gt;next;<br>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;}<br>&nbsp;&nbsp; &nbsp;}<br><br>That&#39;s:<br><br>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;int ntotal = ITEM_ntotal(iter);<br></blockquote><div><br></div><blockquote style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:40px;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px">

Given the huge number of transactions we&#39;re doing, we&#39;re probably hitting a race condition: items are being moved from one bucket to another while the &#39;stats sizes&#39; loop is walking the lists. &nbsp;Perhaps a mutex is not being acquired properly.<br></blockquote><div><br>
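<div>A minimal sketch of the locking pattern this hypothesis points at, assuming a single mutex guards every list. It is not memcached&#39;s code: sketch_item, sketch_heads, sketch_lock and SKETCH_LARGEST_ID are hypothetical stand-ins, and the ntotal field stands in for whatever ITEM_ntotal() would report. The point is only that the histogram walk takes the same lock as the code that relinks items, so iter can never point at an item that is being moved.</div>

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an item: only the fields this sketch needs. */
typedef struct sketch_item {
    struct sketch_item *next;
    int ntotal;                 /* total size in bytes (assumed precomputed) */
} sketch_item;

#define SKETCH_LARGEST_ID 4     /* assumed list count, not memcached's value */

static sketch_item *sketch_heads[SKETCH_LARGEST_ID];
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: link an item at the head of list `id` while holding the lock. */
static void sketch_link(int id, sketch_item *it) {
    pthread_mutex_lock(&sketch_lock);
    it->next = sketch_heads[id];
    sketch_heads[id] = it;
    pthread_mutex_unlock(&sketch_lock);
}

/* Reader: build the 32-byte-bucket histogram under the same lock,
 * so no writer can relink an item in the middle of the walk. */
static void sketch_stats_sizes(int *histogram, int num_buckets) {
    assert(histogram != NULL && num_buckets > 0);
    memset(histogram, 0, (size_t)num_buckets * sizeof(int));

    pthread_mutex_lock(&sketch_lock);
    for (int i = 0; i < SKETCH_LARGEST_ID; i++) {
        for (sketch_item *iter = sketch_heads[i]; iter != NULL; iter = iter->next) {
            int bucket = (iter->ntotal + 31) / 32;   /* round up to a 32-byte bucket */
            if (bucket < num_buckets) histogram[bucket]++;
        }
    }
    pthread_mutex_unlock(&sketch_lock);
}
```

<div>Holding one lock for the whole walk is the simplest option, but it stalls every writer for as long as the walk runs, which may matter at a high transaction rate.</div>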

</div><div>For the time being we&#39;ve removed the &#39;stats sizes&#39; request from our monitoring processes to avoid the crash.</div><div><br></div><div>I could not find this reported as a known issue in previous messages on this list, but I am certain that someone else will run into it.

</div><div><br></div><div>I can send the core files or gdb output to anyone interested in addressing this.</div><font color="#888888"><div><br></div><div>Jeremy LaTrasse</div><div>Operations</div><div>Twitter</div></font>

</span>