Sunday, April 24, 2011

Grand Central Dispatch for Win32: why I want it

When Apple introduced Grand Central Dispatch in Mac OS X Snow Leopard, the response from the fanboy crowd was predictably lunatic: many of Apple’s fans proclaimed that Snow Leopard would somehow enable software to exploit the full potential of parallel processors with little or no developer effort, a belief that Apple did little or nothing to disabuse them of.

In reality, of course, it was nothing of the sort. Though it does not live up to these lofty goals—and, unsurprisingly, does not even attempt to live up to them—it is nonetheless a very interesting library indeed.

At first glance, it doesn’t look too special. It looks a bit like thread pooling, and there’s nothing fancy or innovative about thread pooling. Thread pooling is how Wikipedia describes GCD, and even Apple’s description is almost wholly fixated on threads and thread management.

It’s true that GCD does use thread pools in the background, but to concentrate on this is to miss the point. GCD’s value lies not in thread pooling, but in queuing.

Queues are at the heart of GCD. Everything happens with queues. Tasks/callbacks are placed on queues and, when dequeued, get executed. There are both parallel queues—which is where the thread pool nature of GCD becomes apparent—and serial queues—which execute strictly in order. Queues can target other queues allowing complex queue graphs to be constructed, in turn allowing simple construction of software pipelines and other complex code flows.
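
To make that concrete, here’s a rough sketch using libdispatch’s C API (the queue label and write_log_entry are placeholders of my own): a serial queue is created and then retargeted at one of the global concurrent queues, so its blocks run on the shared pool but never overlap and never run out of order.

    #include <dispatch/dispatch.h>

    void write_log_entry(const char* msg);   /* hypothetical helper */

    void setup_logging(void)
    {
       /* NULL attributes give a serial queue: blocks run one at a time, in order. */
       dispatch_queue_t log_queue = dispatch_queue_create("com.example.logger", NULL);

       /* Retarget the serial queue at a global concurrent queue, so its blocks
          ultimately execute on the shared thread pool, still strictly in order. */
       dispatch_set_target_queue(log_queue,
          dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0));

       /* These never overlap, whatever else the pool happens to be running. */
       dispatch_async(log_queue, ^{ write_log_entry("first"); });
       dispatch_async(log_queue, ^{ write_log_entry("second"); });
    }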

In tandem with GCD, Apple also introduced a new C extension that they have called “blocks”. Blocks are more or less lambdas (anonymous functions) grafted onto C, and there is also a __block storage qualifier, which allows a captured variable to be modified from inside a block. Unfortunately, they have syntax that’s incompatible with C++ lambdas. Blocks and GCD are separate—you can have either one without the other—but mesh nicely together.
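
For those who haven’t seen the syntax, a minimal sketch (the names are mine; compile with clang and -fblocks):

    #include <stdio.h>

    void block_demo(void)
    {
       __block int calls = 0;   /* __block: the block may modify this variable */

       void (^counted_hello)(void) = ^{
          ++calls;              /* an ordinary captured variable would be a read-only copy */
          printf("hello, call %d\n", calls);
       };

       counted_hello();
       counted_hello();         /* prints "hello, call 2" */
    }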

Don’t block the main thread

Grand Central Dispatch is appealing for many reasons. As in many graphical/windowing toolkits, Cocoa windows are essentially single-threaded in nature. Traditionally, they were tightly restricted: any command that updated a window in any way had to be executed on the thread owning that window. Those restrictions have been relaxed a little in 10.6, but to avoid problems it’s still a sound model to follow. However, it causes an issue in an all-too-common scenario:

  1. User clicks a button to trigger an action
  2. Action is executed on the window’s thread
  3. The action takes a long time to execute
  4. Because the window is no longer processing messages, it beachballs (Mac OS X) or ghosts (Windows)

This situation is notoriously common, and it makes applications horrible to use: their interfaces become slow and unresponsive. The reason it’s so common is that, generally, there’s just no good alternative. Naively, performing the action in a separate thread seems to be the way to go—you might even use a thread pool to do it—and for some actions that’s even true, but in general there’s a problem with this approach.

Typically, when the action is finished, you want to update the window somehow. And to safely update a window, the updates need to be made on the window’s thread. But if your action was spun off into its own thread, it has no good way of getting back onto the window thread. Awkward.

In Windows Presentation Foundation, which has a similar single-threaded approach, you would use Dispatcher.Invoke to make an update back on the main thread. WinForms, the older .NET toolkit that wraps Win32, had a similar mechanism in Control.Invoke. But pure Win32 and Cocoa alike lacked any effective way to do this.

Enter Grand Central Dispatch. Grand Central Dispatch on Cocoa provides a special queue that is plumbed into the main thread (that is, the thread on which windowing operations should be performed). With this special queue, it’s all of a sudden very easy to perform our slow action on a background thread:

    #include <dispatch/dispatch.h>

    void main_loop(int event, void* data)
    {
       switch(event)
       {
       case OMG_SOMEONE_CLICKED_THE_BUTTON:
          /* Hand the slow work to one of the global concurrent queues... */
          dispatch_async(dispatch_get_global_queue(0, 0), ^{
             doSomethingLengthyToTheData(data);
             /* ...then hop back onto the main queue to touch the window. */
             dispatch_async(dispatch_get_main_queue(), ^{
                updateMainWindow();
             });
          });
          break;
       }
    }

Notice the use of blocks. Blocks are introduced with a caret, ^, followed by an optional argument list (omitted here, because neither block has parameters anyway), followed by the block body inside braces. ^ is also used as a declarator (equivalent to * for pointers), to allow block types to be declared, using syntax equivalent to that of function pointers. This alone is a killer feature.
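
To show the declarator in action, a short hypothetical example, with the equivalent function-pointer declaration alongside for comparison:

    /* A block typedef, exactly like a function-pointer typedef but with ^. */
    typedef void (^completion_t)(int error_code);

    void declarator_demo(void)
    {
       int (*add_fn)(int, int) = 0;                                     /* function pointer */
       int (^add_block)(int, int) = ^(int a, int b) { return a + b; };  /* block            */

       int sum = add_block(2, 3);   /* called exactly like a function: sum == 5 */
       (void)add_fn; (void)sum;
    }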

Asynchronous programming made easy

The second highly desirable capability is provided by what GCD calls “sources”. These are objects that add tasks to a queue autonomously. One obvious kind is the timer source. Create a timer source, specify a block/callback, and set its schedule. Then, whenever the timer fires, it’ll put the callback onto whichever queue the source is targeting, and the callback will get run as soon as it is dequeued.
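
As a sketch of what that looks like (the once-a-second interval and the printf body are arbitrary choices of mine):

    #include <dispatch/dispatch.h>
    #include <stdio.h>

    dispatch_source_t start_ticker(void)
    {
       /* Deliver the timer's callbacks via a global concurrent queue. */
       dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
       dispatch_source_t timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, q);

       /* Fire once a second, starting now, with 10ms of leeway. */
       dispatch_source_set_timer(timer, dispatch_time(DISPATCH_TIME_NOW, 0),
                                 1 * NSEC_PER_SEC, 10 * NSEC_PER_MSEC);

       dispatch_source_set_event_handler(timer, ^{
          printf("tick\n");      /* enqueued on q each time the timer fires */
       });

       dispatch_resume(timer);   /* sources are created suspended; this starts it */
       return timer;
    }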

The most exciting source types are the read and write sources. These sources are bound to file descriptors, and trigger their callbacks whenever the descriptor becomes ready for reading or writing, respectively. This provides a very nice foundation for applications built around non-blocking I/O. Instead of dealing with the whole mess of select() (or other, better-performing alternatives), you simply create a source and tell it which code to execute whenever the descriptor is readable or writeable. This is so much nicer it’s not funny.
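
For example, something along these lines watches a descriptor for readability; watch_fd and the buffer handling are illustrative rather than prescriptive:

    #include <dispatch/dispatch.h>
    #include <stdio.h>
    #include <unistd.h>

    dispatch_source_t watch_fd(int fd)
    {
       dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
       dispatch_source_t src = dispatch_source_create(DISPATCH_SOURCE_TYPE_READ, fd, 0, q);

       dispatch_source_set_event_handler(src, ^{
          /* dispatch_source_get_data() estimates how many bytes are waiting. */
          unsigned long pending = dispatch_source_get_data(src);
          char buf[4096];
          ssize_t n = read(fd, buf, sizeof buf);
          if (n > 0)
             printf("read %zd of ~%lu pending bytes\n", n, pending);
          else
             dispatch_source_cancel(src);   /* EOF or error: stop watching */
       });

       dispatch_resume(src);
       return src;
    }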

GCD can, of course, be used as a parallel programming library. It has a basic data-parallel operation, dispatch_apply, that will queue the specified callback n times. This can be used to process arrays in parallel quite easily. It’s nice enough for simple tasks—“encode every file in this array as an MP3”, say—but it’s quite rudimentary. I wouldn’t want to use it for array-based number crunching, for example: OpenMP provides a better set of primitives for this kind of task.
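
Still, for the simple cases it handles well, the MP3 example might look roughly like this, with encode_file_as_mp3 standing in for the real work:

    #include <dispatch/dispatch.h>

    void encode_file_as_mp3(const char* path);   /* hypothetical worker */

    void encode_all(const char* paths[], size_t count)
    {
       dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

       /* Queues the block `count` times and waits for every iteration to finish. */
       dispatch_apply(count, q, ^(size_t i) {
          encode_file_as_mp3(paths[i]);
       });
    }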

But for asynchronous development and software pipelining, Grand Central Dispatch is really hard to beat. The API is quite small and simple, making it easy to learn, its programming model is natural, making it easy to integrate into existing codebases, and the functionality it provides is highly desirable.

The big problem is that it’s only available under Mac OS X, with a port also available on FreeBSD. Most of my time these days is spent using and programming Windows—but I want Grand Central Dispatch, and want it badly.

So that’s why I’ve ported it.

  • zaaq

    Several GCD-like mechanisms are available to modern Win32 apps:

    Reactive Extensions: http://rxcpp.codeplex.com/
    If on WinRT: http://msdn.microsoft.com/en-us/library/windows/apps/hh780559.aspx
    Or roll your own with C++11: http://msdn.microsoft.com/en-us/library/vstudio/hh920535.aspx

    • zaaq

      My apologies — didn’t note the age of the article. These technologies weren’t available at the time of writing.

  • SemtexCharlie

    I’d check out ZeroMQ, which also has a lot of potential in terms of in-proc queuing, and if you dig into the documentation, the primitives it offers can be assembled into a very flexible framework that enables scaling components beyond the local box with very little additional effort. Bit of a learning curve, but it lives up to its perf promises, I’ve found…