X25-E performance with write-cache turned off

Following this article about the X25-E SSD and performance when the SSD's write-cache turned off, I made my own observations.

Setup:

  • Adaptec 5805 + 512MB battery-backed cache
  • 4x X25-E 64GB (SSDSA2SH064G1GC) in RAID10, 256KB stripe)

With an active MySQL server with InnoDB tables, I selectively set to write-back/write-through the cache of the the controller (arcconf setcache logicaldrive wt/wbb) and drives ((arcconf setcache device wt/wb) and observed iostat over a 15-second interval.

Controller Disks rrqm/s wrqm/s r/s w/s rKB/s wkB/s avgrq-sz avgqu-sz await svctm %util
WB WT 0.00 70.67 2.53 1158.73 40.80 5434.13 9.43 0.08 0.07 0.07 7.71
WB WB 0.00 62.93 9.40 1107.67 172.80 4976.27 9.22 0.08 0.07 0.07 7.64
WT WB 0.00 62.53 4.53 1126.33 72.53 6157.07 11.02 0.29 0.26 0.24 27.46
WT WT 0.00 63.53 2.60 1169.00 42.13 5757.07 9.90 0.83 0.71 0.68 80.01

Fig. 1 - iostat -k 15 -x across four different cache settings

Performance clearly suffers with write cache turned off, with a 3x increase in (write) latency.

What's interesting though is that the controller's cache resolves this problem and improves latency by another 3x over the SSD's write-back performance.

So a RAID controller is a good complement to SSDs when data integrity is important, improving performance by 10x in our case. With the associated risk of the controller fubar'ing and wiping the cache of course :)

I gotta..

.. start posting here again, for real this time ;D

NAT timeouts, reliable push IMAP email, and Nokia phones

For the past year, I've been struggling with getting email to work reliably on the S60 3rd edition-based Nokia phones (in particular, the E60). Often, after connecting and IDLEing for awhile, the IMAP connection drops. This can be managed by enabling "auto-retrieve", but that in itself has its own problems, such as auto-retrieve disabling itself upon any failure.

I never thought that it could be fixed, but surprisingly, after I migrated my mails from Courier-IMAP to Dovecot, the phone kept its connection for the whole day. What could be different about it?

It turns out that Dovecot implements its own application-layer keepalives, sending a NOOP every 2 minutes (Fig. 1).

static void keepalive_timeout(void *context)
{
        struct cmd_idle_context *ctx = context;

        if (ctx->client->output_pending) {
                /* it's busy sending output */
                return;
        }

        client_send_line(ctx->client, "* OK Still here");
}

Fig. 1 - src/imap/cmd-idle.c

I had long suspected that the connection dropouts were due to the IMAP TCP session timing out by my 3G provider's (Virgin) NAT router. Although TCP keepalives are enabled by Courier (the IMAP server used), keepalives are either too infrequent or not honored by the router.

This workaround creates extra network traffic in both directions, as well as increase the phone's battery consumption. At a keepalive interval of 2 minutes and assuming both the keepalive and acknowledgement total 100 bytes, a day should result in a 72KB worth of keepalive chatter. As an individual, this has negligible impact me, however it may pose problems on mass deployments. Eronen showed in his controlled trials that battery consumption by keepalives is inversely proportional to keepalive interval[1]. However, even at a keepalive interval of 2.5 minutes, battery consumption was 2.5 times of the idle state. This is due to the exceptionally high power consumed when the phone radio is used, as compared when it idles (Eronen determined 29mW at idle, 9200mJ per keepalive).

The keepalive problem is further compounded if several flows or applications keepalive asynchronously. Given that a 3G channel is open for several seconds, it provides a power-conserving opportunity where several flows share and piggyback the channel while it is open.

As far as I'm concerned, all I can in is to make the best compromise out of the situation by increasing the keepalive interval as long as I can (10 minutes?) on my personal Dovecot installation without exceeding Virgin's NAT timeouts.

1. Eronen, P. 2008. TCP Wake-up: Reducing Keep-Alive Traffic in Mobile IPv4 and IPsec NAT Traversal.

General Elections 2008 interest graphed

There has been quite a buzz at the Lowyat.NET forum over the recent Malaysian General Elections. I graphed out the posts in absolute and relative figures for 48 hours from March 08, 3am (Fig. 1, Fig. 2). Traffic picked up quickly from 7pm when ballots were counted, and peaked at 1am when it was clear that the incumbent party lost its 2/3rd majority. Following that, there continued to be latent interest throughout the day.

We hit an unprecedented number of visitors that evening, and I still find it amazing that a single topic can attract almost half of the entire forum's traffic.


Fig. 1 - Posts in the General Election topics (absolute)

Fig. 2 - Posts in the General Election topics versus other topics

Quick takes - Chinese New Year's effect on Malaysian web traffic

A bag of five popular Malaysian websites were selected, all which have visitors originating largely from Malaysia. Their pageviews as estimated by Alexa were then graphed out (Fig. 1). There is clearly a dip in pageviews in the Chinese New Year week, which begins on 7 Feb this year.


Figure 1 - Pageviews of 5 popular Malaysian websites in Feb '08

In addition, public MRTG graphs from TWGate, a public peering point, were obtained (Fig. 2). A reduction in traffic of roughly 50% can be seen in week 5, i.e. the Chinese New Year week.


Figure 2 - MRTG graphs of TWGate-TMNet for Feb '08

It would be interesting to see whether the other major Malaysian festivity, Hari Raya Aidilfitri, will have the same profound effect on Malaysian web traffic when it happens in October.

Quick takes - A tale of two neighbors and a wifi link

I connected my network with my neighbor living 6 floors above, to share Internet access as well as downloaded files. There was fortunately an equally tall building about 30m across us, which we used to reflect the signal off. An 8dBi yagi was used with a PCI wireless card on my neighbor's end together with Vista's layer-3 bridging, and a Linksys WAG54G on my end. The setup (as can be seen in Fig. 1 and Fig. 2) is quite ghetto, with the antenna "mount" being made out of styrofoam and gaffer tape.

Speed tests were initially 50KB/s, but after a little fiddling a one sweet spot (+- 5 degrees?) was found where 1MB/s was achieved. Pretty shocking given that we were going to write it off already!


Figure 1 - 8dBi yagi at neighbor's

Figure 1 - ?dBi rubber ducky at my place

Debugging by profiling

Over the past month on a webserver cluster running Apache and a custom Catalyst mod_perl application, there has been an increasing creep in sys% time. The problem went unbothered for a long time, until it hit the point where sys% matched user% percent for percent. This led to massive problems during the peak hours as the servers could no longer cope with load. In addition to reactive ordering of more servers, an investigation was initiated to determine the cause of this increase in sys%. This post will hopefully deliver some insight into a multi-disciplinarian (from a sysadmin and programmer's perspective) approach to problem-solving.

Two tools were used to determine the execution hotspots, oprofile and Apache::DProf.

A summary was first obtained with opreport with no command-line arguments (Table 1). The sys% reported by top is clearly confirmed. Before proceeding to inspect the kernelspace down to function granularity however, the debuginfo RPMs had to be installed. As RHEL4 does not provide these, the SRPM was downloaded and RPMs built from it, a fairly straightforward chore. Then, opreport -l -g was run and produced the output in Table 2. It can be seen that a significant portion of the time was spent with primitive concurrency control operations. Also listed are the files dcache.c and namei.c, of which an inspection shows calls to the said concurrency control ops.

Samples %
25820379 44.8969 vmlinux
24351098 42.3421 mod_perl.so
4576766 7.9582 libc-2.3.4.so
1173028 2.0397 ext3
540725 0.9402 ld-2.3.4.so

Table 1 - Excerpt from opreport

Samples % Linenr info App name Symbol name
5084047 8.8404 spinlock.c:92 vmlinux _spin_lock
3538924 6.1536 dcache.c:1036 vmlinux __d_lookup
3073282 5.3440 dec_and_lock.c:14 vmlinux atomic_dec_and_lock
2629103 4.5716 (no location information) mod_perl.so S_hv_fetch_common
1997009 3.4725 (no location information) libc-2.3.4.so memmove

1778185 3.0920 namei.c:1013 vmlinux path_lookup
1722640 2.9954 namei.c:695 vmlinux __link_path_walk
1672855 2.9088 dcache.c:149 vmlinux dput
1297612 2.2563 namei.c:915 vmlinux link_path_walk
1267104 2.2033 (no location information) mod_perl.so Perl_sv_setsv_flags
1173028 2.0397 (no location information) ext3 (no symbols)

Table 2 - Excerpt from opreport -l -g

Given this lead, it was postulated that the problem was caused by something performing a lot of filesystem operations, such as stat(). A possibility was that a directory used by the app grew in number of files, slowing down directory enumeration. The filesystem was then checked for unusually large directories, but none were found.

I was contemplating using gdb and strace to trace through mod_perl, but at this point the application programmer found something better - a high level profiler called Apache::DProf. The necessary code was added into the source, Apache left to run for a short while, then dprofpp -F run on the resulting profile dumps (Table 3). The top spot was taken by File::Find::find - which fits the kernel. Further investigation showed that the root problem was caused by the use of a Catalyst date formatting filter for each comment element in the generated page. The date formatter then recursively searches /usr/share/zoneinfo to obtain timezone information. This operation became unnecessarily expensive due to the large number of comments in the page.

%Time ExclSec CumulS #Calls sec/call Csec/s Name
73.3 11.20 11.293 619 0.0181 0.0182 File::Find::find
10.6 1.619 15.352 77 0.0210 0.1994 Catalyst::__ANON__
4.99 0.762 0.762 3302 0.0002 0.0002 Catalyst::Plugin::I18N::localize
3.03 0.463 11.899 619 0.0007 0.0192 Catalyst::Plugin::DateTime::Constructor::from_epoch
1.84 0.281 13.097 30156 0.0000 0.0004 Template::Stash::ForceUTF8::get
1.45 0.222 0.222 3298 0.0001 0.0001 Template::Context::filter

Table 3 - Excerpt from dprofpp -F

Finally, a one-line solution was found which was to pass a timezone as an argument to the date formatter. This removed the need for the directory search, and completely solved the problem (Figure 1). Consequently, with the elimination of sys%, headroom was created for double the current peak demand and extra servers unnecessary for now.


Figure 1 - CPU utilization before and after changes

Quick takes - syslog-ng

1) Not mentioned in the syslog-ng documentation on performance is the log_msg_size() setting. When follow_freq() is used on a logfile to emulate tail -f behavior, it is necessary for that log_msg_size() is large enough to accomodate all log messages for the reading interval.

2) I've been toying around with the idea of using syslog-ng as a plumbing tool to route logs as I wish. The flexibility it provides is excellent while retaining a simple configuration file, reminding me of the Exim MTA. Currently, I'm using syslog-ng to aggregate 1600 lines/s worth of squid logs. With this workload it consumes ~15% of one Xeon L5335 core.

It is currently quite cumbersome to pass the logs to external programs, requiring a named socket to be created for IPC. Perhaps a process spawner/manager could be hacked in to facilitate this?

Some possibilities to consider with the aggregated log:

a) Visualization, by piping it to a stream visualization tool, such as glTail.rb or Ganglia (with an appropriate wrapper to gmetric).

b) Out-of-band IDS, by piping it to a PHPIDS loop. The client headers can be extracted from the log line and exploded into the superglobals. An external trigger can be called to firewall off malicious clients.

c) Out-of-band DOS mitigator, by piping it to a script that watches for unusually high number of requests. Action can be taken as with (b).

Test story?

"It works!"


Image (C) Cryogenics International

I've finally made an effort at thawing this site. Put in a copy of Drupal and found a theme that has exactly the number of colors I want - none. Hopefully this "explosive" start (ooh.. the sarcasm burns!) will persist long enough for me to write about my tech interests and commentaries. We'll see how far this site goes before I let it rot again..