Debugging by profiling

Over the past month on a webserver cluster running Apache and a custom Catalyst mod_perl application, there has been an increasing creep in sys% time. The problem went unbothered for a long time, until it hit the point where sys% matched user% percent for percent. This led to massive problems during the peak hours as the servers could no longer cope with load. In addition to reactive ordering of more servers, an investigation was initiated to determine the cause of this increase in sys%. This post will hopefully deliver some insight into a multi-disciplinarian (from a sysadmin and programmer's perspective) approach to problem-solving.

Two tools were used to determine the execution hotspots, oprofile and Apache::DProf.

A summary was first obtained with opreport with no command-line arguments (Table 1). The sys% reported by top is clearly confirmed. Before proceeding to inspect the kernelspace down to function granularity however, the debuginfo RPMs had to be installed. As RHEL4 does not provide these, the SRPM was downloaded and RPMs built from it, a fairly straightforward chore. Then, opreport -l -g was run and produced the output in Table 2. It can be seen that a significant portion of the time was spent with primitive concurrency control operations. Also listed are the files dcache.c and namei.c, of which an inspection shows calls to the said concurrency control ops.

Samples %
25820379 44.8969 vmlinux
24351098 42.3421 mod_perl.so
4576766 7.9582 libc-2.3.4.so
1173028 2.0397 ext3
540725 0.9402 ld-2.3.4.so

Table 1 - Excerpt from opreport

Samples % Linenr info App name Symbol name
5084047 8.8404 spinlock.c:92 vmlinux _spin_lock
3538924 6.1536 dcache.c:1036 vmlinux __d_lookup
3073282 5.3440 dec_and_lock.c:14 vmlinux atomic_dec_and_lock
2629103 4.5716 (no location information) mod_perl.so S_hv_fetch_common
1997009 3.4725 (no location information) libc-2.3.4.so memmove

1778185 3.0920 namei.c:1013 vmlinux path_lookup
1722640 2.9954 namei.c:695 vmlinux __link_path_walk
1672855 2.9088 dcache.c:149 vmlinux dput
1297612 2.2563 namei.c:915 vmlinux link_path_walk
1267104 2.2033 (no location information) mod_perl.so Perl_sv_setsv_flags
1173028 2.0397 (no location information) ext3 (no symbols)

Table 2 - Excerpt from opreport -l -g

Given this lead, it was postulated that the problem was caused by something performing a lot of filesystem operations, such as stat(). A possibility was that a directory used by the app grew in number of files, slowing down directory enumeration. The filesystem was then checked for unusually large directories, but none were found.

I was contemplating using gdb and strace to trace through mod_perl, but at this point the application programmer found something better - a high level profiler called Apache::DProf. The necessary code was added into the source, Apache left to run for a short while, then dprofpp -F run on the resulting profile dumps (Table 3). The top spot was taken by File::Find::find - which fits the kernel. Further investigation showed that the root problem was caused by the use of a Catalyst date formatting filter for each comment element in the generated page. The date formatter then recursively searches /usr/share/zoneinfo to obtain timezone information. This operation became unnecessarily expensive due to the large number of comments in the page.

%Time ExclSec CumulS #Calls sec/call Csec/s Name
73.3 11.20 11.293 619 0.0181 0.0182 File::Find::find
10.6 1.619 15.352 77 0.0210 0.1994 Catalyst::__ANON__
4.99 0.762 0.762 3302 0.0002 0.0002 Catalyst::Plugin::I18N::localize
3.03 0.463 11.899 619 0.0007 0.0192 Catalyst::Plugin::DateTime::Constructor::from_epoch
1.84 0.281 13.097 30156 0.0000 0.0004 Template::Stash::ForceUTF8::get
1.45 0.222 0.222 3298 0.0001 0.0001 Template::Context::filter

Table 3 - Excerpt from dprofpp -F

Finally, a one-line solution was found which was to pass a timezone as an argument to the date formatter. This removed the need for the directory search, and completely solved the problem (Figure 1). Consequently, with the elimination of sys%, headroom was created for double the current peak demand and extra servers unnecessary for now.


Figure 1 - CPU utilization before and after changes

?

Oh I never understand you.. -hr-

Anonymous 01 Jan 1970

But aren't we soulmates..

But aren't we soulmates..

Anonymous 01 Jan 1970

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

More information about formatting options