20 November 2012

Tracking a fatal kernel bug

Slax uses ZRAM to compress RAM on the fly. Lately I had been noticing fatal errors (kernel oopses) whenever a lot of RAM was filled very quickly. So I spent almost a day tracking down the issue: recompiling 6 different kernel versions, applying various patches, and so on.

Finally I was able to pin down which particular patch caused the problem. Everything is fine with kernel 3.6.4 and older, but a zram patch in 3.6.5 makes it unstable in certain situations. After reverting this particular patch, the problems no longer occur.
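For readers curious how such a hunt can be organized, the search boils down to a bisection between a known-good and a known-bad version. The following is only a minimal sketch in Python, not the procedure actually used for Slax: the version list is illustrative, and `oopses_under_load` is a hypothetical stand-in for the real test, which means building the kernel, booting it and filling RAM quickly to see whether it oopses.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first version for which is_bad() is True.

    Assumes versions are ordered oldest to newest, the first one is
    known good, the last one is known bad, and every version after the
    first bad one is bad as well.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return versions[hi]


# Illustrative version list (an assumption, not the exact set that was tested).
versions = ["3.6.2", "3.6.3", "3.6.4", "3.6.5", "3.6.6", "3.6.7"]


def oopses_under_load(version: str) -> bool:
    # Hypothetical stand-in for "build, boot, fill RAM quickly, watch for an oops".
    # It simply encodes the outcome reported in this post: 3.6.5 and newer fail.
    return tuple(int(part) for part in version.split(".")) >= (3, 6, 5)


culprit = first_bad(versions, oopses_under_load)
print(culprit)
```

With six candidates this needs only two or three test boots instead of six, which matters when each test means a full kernel rebuild. The same idea applies to the individual patches between two neighbouring releases.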

If you're interested, the offending kernel change is here. I've already notified all the guys who signed off on that change; hopefully they can fix it. Maybe their code is even correct and it just exposed some other hidden bug in the kernel... who knows. In the meantime, I'll simply revert this particular change in Slax kernels, for better stability with no oopses.

User comments
dimitrij 2012-11-20 21:24

Great job!

Seems the last sane kernel version was 3.4. Along with Zram, there is the EXT4 corruption in >3.5.

Also CPU/GPU re-clocking has regressed, my idle consumption is 25W, up from 6.5W! Burning a hole in my desk. Bisecting seems to be quite a pain here, so a fix is not expected for 3.7.

Hope this will get sorted out (and backported) before slax7 goes gold.