Monday, April 25, 2011

Grand Central Dispatch for Win32: things still to do

So, the libdispatch port I’ve been working on is currently quite rough and ready. The major parts all seem to work, though I need to migrate all the tests, but there’s one significant piece missing: the main queue.

Cocoa is, for the most part, single-threaded; updates to a window must be performed on the thread that owns that window. The same is true of WPF, Win32, and others. However, Cocoa takes things a little further than Win32. In Win32 all threads are essentially created equal, and any thread is allowed to create windows and pump messages. It’s an M:N system. Windows still have thread affinity (any given window must only be updated by the thread running its message loop), but there can be multiple loops on multiple threads, each with its own set of windows.

Not so in Cocoa. The main thread, that is, the one that literally runs main(), is special. All windows must have their message loops run on this thread, and all updates must be funnelled through this thread.

As I wrote in the post outlining why I want Grand Central Dispatch, the ability for secondary (worker) threads to run code from the window’s owning thread is highly desirable. In Cocoa, that means running code on the main thread, and so that’s what libdispatch enables.

Corresponding to the main thread, libdispatch creates a main queue. Since there’s only ever one main thread (used for every window), there’s only ever one main queue. Any callbacks placed on the main queue will eventually be executed by the main thread.

Creating a serial queue and enqueuing messages is easy enough; the hard part is responding to those messages, and it’s here that libdispatch gets a bit tricky. In truth, I’m a little hazy on some of the details, because not all the plumbing is found in libdispatch; there’s also a Cocoa-side integration that I don’t think is public (if it is, I don’t know where the source is).

libdispatch has two different ways of draining the main queue: a last-ditch automatic mechanism, which ensures that the queue still gets drained even when Cocoa never calls into it, and a better way that integrates properly with Cocoa. The automatic mechanism leverages pthreads’ TLS destructors. A particular TLS key has a destructor that, when invoked on the main thread, will drain the main queue. Drop off the end of the main thread, either just by returning from main() or by calling dispatch_main() (which in turn calls pthread_exit()), and the destructor will be called, draining the queue.
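To make the destructor trick concrete, here is a minimal sketch of the idea using plain pthreads. It is not libdispatch's code: the Queue type, drain_on_exit(), this_thread_queue() and run_tls_drain_demo() are all names invented for illustration. It also uses a worker thread rather than the real main thread, purely so that the function can return and report what happened; the same destructor would fire for the main thread if main() called pthread_exit().

```cpp
#include <pthread.h>
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a thread's queue of pending callbacks.
using Callback = std::function<void()>;
using Queue = std::deque<Callback>;

static pthread_key_t queue_key;
static pthread_once_t queue_key_once = PTHREAD_ONCE_INIT;

// TLS destructor: pthreads calls this during thread exit, passing the
// thread's non-null value for the key. Here it runs whatever is still queued.
static void drain_on_exit(void* value) {
    Queue* q = static_cast<Queue*>(value);
    while (!q->empty()) {
        Callback cb = std::move(q->front());
        q->pop_front();
        cb();
    }
    delete q;
}

static void make_key() { pthread_key_create(&queue_key, drain_on_exit); }

// Returns the calling thread's queue, creating it on first use.
static Queue* this_thread_queue() {
    pthread_once(&queue_key_once, make_key);
    Queue* q = static_cast<Queue*>(pthread_getspecific(queue_key));
    if (!q) {
        q = new Queue;
        pthread_setspecific(queue_key, q);
    }
    return q;
}

// Queues three callbacks and returns without running any of them.
static void* worker(void* arg) {
    int* counter = static_cast<int*>(arg);
    for (int i = 0; i < 3; ++i)
        this_thread_queue()->push_back([counter] { ++*counter; });
    return nullptr;  // the destructor drains the queue during teardown
}

// Runs the worker to completion and reports how many callbacks ran.
int run_tls_drain_demo() {
    int counter = 0;
    pthread_t t;
    int rc = pthread_create(&t, nullptr, worker, &counter);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, nullptr);
    return counter;
}
```

Nothing in worker() ever runs a callback; all three run only because the thread went away, which is exactly what makes the mechanism automatic and also what makes its timing so important.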

It’s to replicate this mechanism that I investigated the feasibility of implementing TLS destructors in Win32. The implementation kinda works, but annoyingly the TLS destructor is called so late in the thread’s tear-down that it’s basically not safe to do anything, especially not make arbitrary function calls in user-supplied callbacks. Unless I can find some way of resolving this, I’ll need to find some other approach. I think the DLL notifications happen at a better time, but I really want the convenience of a static library.

This is a little annoying. Though a single main thread/main queue isn’t a natural fit for Windows, I could have created a queue per thread and used the same “drain this thread’s queue when the thread is torn down” approach. One workaround that may be effective is to give up on automatic queue draining when returning from main() and instead require dispatch_main() to be called explicitly. This would probably be good enough.

The second mechanism, which is much better as it doesn’t require ending the main thread, is the one I’m a bit less clear about. The key function here is _dispatch_main_queue_callback_4CF(). This function gets called from Cocoa’s message loop, and it drains any messages placed on the main queue, before returning control back to the message loop.

This approach should be much easier to integrate, since it doesn’t depend on any special behaviour of threads or destructors or anything; it’s just a regular function call. Every time something is put onto the main queue it alerts Cocoa (_dispatch_queue_wakeup_main()), and Cocoa then drains the queue. All easy enough to translate into Win32.
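The shape of that wake-then-drain handshake can be sketched in portable C++. This is only a model, not the libdispatch or Cocoa implementation: LoopQueue and run_loop_demo() are hypothetical names, and a condition variable stands in for the message loop. On Win32 I would expect the wakeup hook to be something like PostThreadMessage() with a private message, but that is an assumption rather than something the port does yet.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical queue that is drained by its owner's message loop.
class LoopQueue {
public:
    // wakeup is whatever nudges the owning loop into calling drain().
    explicit LoopQueue(std::function<void()> wakeup) : wakeup_(std::move(wakeup)) {}

    // Callable from any thread: enqueue, then alert the owner
    // (the role _dispatch_queue_wakeup_main() plays above).
    void async(std::function<void()> cb) {
        { std::lock_guard<std::mutex> lock(m_); pending_.push_back(std::move(cb)); }
        wakeup_();
    }

    // Called by the owning loop (the role of _dispatch_main_queue_callback_4CF()):
    // runs everything queued so far, then returns control to the loop.
    void drain() {
        std::deque<std::function<void()>> batch;
        { std::lock_guard<std::mutex> lock(m_); batch.swap(pending_); }
        for (auto& cb : batch) cb();
    }

private:
    std::mutex m_;
    std::deque<std::function<void()>> pending_;
    std::function<void()> wakeup_;
};

// Toy stand-in for a message loop: sleep until woken, drain, repeat.
std::vector<int> run_loop_demo() {
    std::mutex m;
    std::condition_variable cv;
    bool woken = false;
    bool quit = false;       // only touched on the loop thread
    std::vector<int> order;  // only touched on the loop thread

    LoopQueue q([&] {
        { std::lock_guard<std::mutex> lock(m); woken = true; }
        cv.notify_one();
    });

    std::thread worker([&] {
        for (int i = 1; i <= 3; ++i)
            q.async([&order, i] { order.push_back(i); });
        q.async([&quit] { quit = true; });  // executes on the loop thread
    });

    while (!quit) {
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return woken; });
            woken = false;
        }
        q.drain();
    }
    worker.join();
    return order;
}
```

The callbacks are queued by the worker but only ever execute inside drain(), on the thread running the loop, which is the property the main queue exists to provide.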

However, it’s not quite as simple as that, because of the threading model Windows uses. There isn’t any longer a single “main” queue. Any thread with a message loop will have to have its own queue, and the special alerting behaviour will need to take this into account. It will also have to ensure that it alerts the right thread. This will mean altering the queue objects to include an indication of whether they’re a “special” thread queue (that is, one drained from a user thread rather than a pthread_workqueue thread) and, if so, which thread they actually belong to, so that the right thread can be alerted.

There will also need to be some way of accessing these special queues (so that callbacks can be placed on them), so some kind of HWND-to-queue and possibly thread ID or HANDLE-to-queue lookup functions will be necessary.
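A rough sketch of what a queue that knows its owner, plus a lookup by thread, might look like follows. Everything here is hypothetical (ThreadQueue, QueueRegistry, queue_for(), run_registry_demo()), and std::thread::id stands in for a Win32 thread ID so the code stays portable. An HWND-to-queue lookup could presumably be layered on top by first mapping the window to its owning thread, for example with GetWindowThreadProcessId(), though I haven't settled on that.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical per-thread queue; owner records which thread must drain it.
struct ThreadQueue {
    std::thread::id owner;
    std::mutex m;
    std::deque<std::function<void()>> pending;

    // Callable from any thread.
    void async(std::function<void()> cb) {
        std::lock_guard<std::mutex> lock(m);
        pending.push_back(std::move(cb));
        // A real port would also alert the owner thread here.
    }

    // Meant to be called on the owning thread; returns how many callbacks ran.
    int drain() {
        std::deque<std::function<void()>> batch;
        { std::lock_guard<std::mutex> lock(m); batch.swap(pending); }
        for (auto& cb : batch) cb();
        return static_cast<int>(batch.size());
    }
};

// Hypothetical registry mapping a thread ID to that thread's queue.
class QueueRegistry {
public:
    // Returns the queue for id, creating it on first request.
    std::shared_ptr<ThreadQueue> queue_for(std::thread::id id) {
        std::lock_guard<std::mutex> lock(m_);
        auto& slot = queues_[id];
        if (!slot) {
            slot = std::make_shared<ThreadQueue>();
            slot->owner = id;
        }
        return slot;
    }

    // Convenience: the calling thread's own queue.
    std::shared_ptr<ThreadQueue> current() {
        return queue_for(std::this_thread::get_id());
    }

private:
    std::mutex m_;
    std::unordered_map<std::thread::id, std::shared_ptr<ThreadQueue>> queues_;
};

// A worker targets the calling thread's queue by ID; the caller then drains it.
int run_registry_demo() {
    QueueRegistry registry;
    const std::thread::id ui_thread = std::this_thread::get_id();
    int sum = 0;

    std::thread worker([&] {
        auto q = registry.queue_for(ui_thread);  // lookup by thread ID
        q->async([&sum] { sum += 1; });
        q->async([&sum] { sum += 2; });
    });
    worker.join();

    registry.current()->drain();  // both callbacks run on the owning thread
    return sum;
}
```

Handing out shared_ptr keeps a queue alive while a worker still holds it, even if the owning thread has gone; whether the real port should work that way is an open question.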

As luck would have it, the libdispatch test cases all depend on the main queue anyway, so before I can readily port the tests, I’m going to have to put something together to address this need.