Performance problems with -threaded?

Consider the following two implementations of a very simple ping/pong server and client, one written in C and one written in Haskell:

Neither of them has been particularly optimized; that is not the point of this post. The point will become evident when we measure the latency of the following three programs:

  1. The C program compiled with -O2
  2. The Haskell program compiled with -O2
  3. The Haskell program compiled with -O2 and -threaded

Here is a CDF of the latency for all three programs (ghc 7.6.1, Mac OS X):


And again, but with ghc 7.4.2 and on Linux (thanks Mikolaj):


As you can see, programs (1) and (2) are relatively close together (and can probably be brought closer together still), but the latency of (3) is much worse.

(Graph constructed with gnuplot. If you want to reconstruct this graph, run "make" in the benchmarks/ directory of the network-transport-tcp GitHub repository.)