Thursday, February 23, 2012

U limit to extend

Sitting in the year 2012, it goes without saying that multithreaded software applications are not an option; they are the norm. Systems today are even more complex, with distributed architectures underneath, clustered services, and multiple system and user threads. Recently, three members of Tachyon's development team spent many hours scratching their heads over a problem none of us had ever cared to look at before. After the Aha moment came a shared smile, which made it worthwhile to blog about and share with the rest of the gang.
We were testing Tachyon on a system that we got as part of a system rotation policy. Tachyon is a pure Java application that uses a few of Oracle's middle-tier products, and we have created a system-agnostic deployable artifact and a rich console to manage our processes and nodes. With 1 terabyte of memory on these machines, we were already smiling like a kid with an ice cream in hand in winter. Then we started seeing an OutOfMemoryError when the fourth node was started. This just couldn't be possible, we thought: each node has just a 1 GB heap allocated, so there was no way it could exhaust 1 TB of memory. "free -m" confirmed our assumption: no swapping and plenty of memory available to grow. And then we happened to run 'ulimit' (the command I derived my blog title from):
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 28138
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

It turned out that the three running nodes had already started more than 900 threads between them, so the fourth node did not have enough user processes left under the "max user processes" limit of 1024 to start its own threads. What sent us down a totally wrong path was the "OutOfMemory" error. Maybe it's an overloaded term in this scenario, since the JVM also reports a failure to create a new native thread as an OutOfMemoryError, but eventually we did find that under Linux threads are counted as processes, so any limit on the number of processes also applies to threads. In a heavily threaded app we can quickly run out of threads.
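To make the "threads are counted as processes" point concrete, here is a minimal sketch, assuming a Linux system with /proc mounted. The helper name count_threads is hypothetical and not part of any tool mentioned in this post; it simply counts the entries under /proc/<pid>/task, where the kernel lists one entry per thread of a process.

```shell
# Sketch only: assumes Linux with /proc mounted.
# count_threads is a hypothetical helper name.
# Each entry in /proc/<pid>/task is one thread of that process.
count_threads() {
    ct_pid="$1"
    if [ -d "/proc/$ct_pid/task" ]; then
        # One directory entry per thread; strip padding some wc versions add.
        ls "/proc/$ct_pid/task" | wc -l | tr -d ' '
    else
        # Unknown PID, or /proc is not available.
        echo 0
    fi
}

# Example: thread count of the current shell.
count_threads "$$"
```

Pointing it at the PID of a running JVM node instead of "$$" would show how many of the available user-process slots that single node is holding.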
There is a security implication too. As we researched the topic more, we came across the term "Bash fork() bomb": :(){ :|:& :};:. This is a bash function that calls itself recursively, spawning new processes until the process limit is reached, and it is sometimes used by Unix administrators to test process limits. An "unlimited" max user processes setting could therefore be misused to carry out a Denial of Service attack: by exhausting the total number of threads, an attacker can prevent applications running on that system from starting any new threads.
The following command can be used to find out how many threads a user has already started (replace <username> with the account name):
$ ps -L -u <username> | wc -l
If all processes are started by a single user account, this command is a useful tool, along with "ulimit -a", for figuring out how many more processes can still be started, which is a key input for system provisioning. Note that the count includes the one header line printed by ps.
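The headroom calculation described above can be sketched as follows. This is an illustration under assumptions, not a definitive tool: it assumes bash and the procps version of ps (for the --no-headers option), and remaining_slots is a hypothetical helper name. The ps selection is by effective user, so the result is an approximation of what the kernel counts against the limit.

```shell
# Sketch only: assumes bash and procps ps.
# remaining_slots is a hypothetical helper name.
# remaining_slots LIMIT USED: print how many more threads/processes fit.
remaining_slots() {
    rs_limit="$1"
    rs_used="$2"
    case "$rs_limit" in
        ''|*[!0-9]*)
            # Not a plain number (for example "unlimited"): echo it back.
            echo "$rs_limit"
            ;;
        *)
            echo $(( $rs_limit - $rs_used ))
            ;;
    esac
}

# Soft limit on user processes for the current shell.
limit=$(ulimit -u 2>/dev/null || echo unknown)
# Threads currently owned by this user, one per output line, no header.
used=$(ps -L -u "$(id -un)" --no-headers 2>/dev/null | wc -l | tr -d ' ')
echo "limit=$limit used=$used remaining=$(remaining_slots "$limit" "$used")"
```

With the numbers from our case, remaining_slots 1024 900 prints 124, which is fewer slots than the fourth node needed.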
In the end, for the time being, "ulimit -u unlimited" was good enough for us to continue our testing.
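Given the security concern above, one alternative to lifting the limit for the whole session is to change it only for a single launch. The following is a rough sketch, assuming bash; run_with_nproc is a hypothetical helper name, and the value passed must not exceed the hard limit (ulimit -H -u) unless the caller is privileged.

```shell
# Sketch only: assumes bash. run_with_nproc is a hypothetical helper name.
# run_with_nproc N CMD [ARGS...]: run CMD with the soft "max user processes"
# limit set to N. The subshell keeps the change away from the calling shell.
run_with_nproc() {
    rn_limit="$1"
    shift
    ( ulimit -S -u "$rn_limit" && exec "$@" )
}

# Example: re-apply the current soft limit to a harmless command.
run_with_nproc "$(ulimit -S -u)" true
```

Because the ulimit call happens inside the parentheses, the limit seen by the rest of the session stays as it was.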