A real-world debugging problem.
- Koji - the build system used inside Red Hat and by Fedora.
- DB (postgresql) for build metadata; Hub for the XMLRPC interface; Web GUI; workers (mock, rpmbuild) to do the builds.
Background
- Memory is important!
- NFS, disk, mmap, swap are all great, but not good enough.
- And then the OOM killer comes to town!
- Back in the 2.2 and 2.4 days the OOM killer was appalling.
- It’s improved enough over time in its process selection that Anthony’s thinking about running without swap and relying on the OOM killer.
The Problem
- Koji server running out of memory; it alerts because it’s ground to a halt on low memory, so let’s reboot!
- Everything works again, it’s not the DB box, so there’s no data loss. What’s the problem?
- Well, the workers need to be restarted, and nobody likes getting paged at 3 a.m. every other day.
- So fix the bug!
Fixing the Bug
- Some OOM problems previously, linked to being able to run crazy queries like “List all history for everything forever”, which gets bundled up as a big glob of query, turned into a big glob of XML for XMLRPC, and bad things happened.
- Used setrlimit() to kill processes that grow too big as a workaround.
- Loss of trust in the code.
- Could be another bug in koji - lots of debug logs, trawl through the usage logs.
- Throttle incoming requests for overuse? But the “overuse” was long-term and hadn’t been causing problems.
- Maybe it’s mod_python memory leaks in RHEL 5’s version of Python? Upgrading to RHEL 6 or mod_wsgi seems like a bad idea when there’s already a problem.
- Reduce Apache’s MaxRequestsPerChild? Didn’t help.
- setrlimit() prevents huge processes, but not many big ones. Reducing the number of clients with MaxClients helps, but impacts the amount of concurrent usage.
- Move to testing - a crash script that opens a number of sockets, does a little work, and reports on success/failure.
- Couldn’t reproduce the problem.
- Give it more memory!
- Still crashes.
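The talk doesn’t show the setrlimit() workaround itself, but in Python it might look roughly like this (the function name and the 2 GiB figure are illustrative, not from the talk):

```python
import resource

def cap_address_space(max_bytes):
    """Cap this process's virtual address space. Allocations past the
    limit fail (Python raises MemoryError) instead of dragging the
    whole box into swap until the OOM killer shows up."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))

# e.g. cap a worker at 2 GiB before it starts handling requests:
# cap_address_space(2 * 1024**3)
```

As the notes say, this only catches the single runaway process; a crowd of merely-big processes still exhausts memory without any one of them tripping the limit.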
Back to First Principles
- Use a soft toy.
- Are you running out of memory?
- Are you sure?
- How does the kernel track memory usage to make that decision?
- How about kernel memory structures - slabinfo
- Lo and behold - koji was using 1 GB, but 4 - 5 GB was showing as used.
- Oh dear. nfs_inode_cache was caching 2.5 GB of data.
- Oh dear oh dear. It’s a regression in the RHEL 5 kernel. 5.7 specifically.
- Capturing slabinfo while running, but, oh dear, the shell script doesn’t work when you’re swapping heavily. Bugger.
- You want to use Python, Perl, etc., so the monitor isn’t forking or execing and can’t get swapped out.
- Demonstrated that it’s an NFS caching problem.