Monday, September 3, 2012

David Chartier HD mini lite (Pro): Since I may have created some confusion over how I feel about Tumblr’s…

Tuesday, August 28, 2012

Fisking hell.

I love a good fisking. Sadly, this post by Nadim Kobeissi was not a good fisking of my article.

Now, much to my distaste, I shall fisk through Mr. Bright’s article, a true labyrinth of misinterpretations, inaccuracies and horrible, headache-inducing journalism. First, regarding above claim that the data is anonymized, the whole point of my research was to show you that it isn’t. It is not. IP addresses are communicated to Microsoft in the clear.

That IP addresses are known to both endpoints of an IP connection is not noteworthy. SmartScreen is one of many, many Windows services that require IP connectivity to Redmond’s servers. Windows Update, Windows Activation, the Microsoft Store, crash reporting, CEIP and many more all require occasional connections to servers controlled by the software giant.

Kobeissi’s articles both equivocate somewhat between “users” and “IP addresses”, which is not entirely accurate, but perhaps close enough. The value of an IP address varies from person to person. In the olden days of widespread dial-up, end-user IP addresses could be expected to vary daily, if not more often, due to the vagaries of dial-up access. Thanks to the broadband revolution, this is less often the case; IP addresses can identify single households or offices for long periods of time. For those with static addresses, they can do so more or less indefinitely. This is not quite a per-user identification, but in practice it can come close.

If this information leakage offends you, SmartScreen is neither the only offender, nor is it the first—and if it offends you, then you are probably not using Windows in any case, due to the many ways in which Microsoft can see your IP address.

That this communication is “in the clear” is similarly unexceptional. Every IP address is communicated “in the clear”; to do otherwise requires some kind of IP-in-IP encapsulation (e.g. IPsec ESP tunnel mode), though even here, the “outer” IP address is still in the clear, for obvious reasons.

Kobeissi says that the server Microsoft sends the information to supports the SSLv2 protocol, which is known to be insecure.

Yes. However, far before Mr. Bright wrote his article, I updated my research with the following new finding:

Update 3: Approximately 14 hours after this article was published, another scan of Microsoft’s SmartScreen servers reveals that they have been reconfigured to no longer support SSLv2. The servers now only support SSLv3 connections.

This is unfortunate. I wrote my article before Kobeissi updated his post. It wasn’t published, however, until the following day. These things happen.

This update went completely ignored by Mr. Bright, who brings up SSLv2 again and again in his article, much to my sadness. He later goes on to state that my research was about security risks (No, it’s about privacy risks with only a few notes on security) and that focusing on SSLv2 is unwarranted.

  1. I mention SSLv2 on four occasions. First in a brief description of Kobeissi’s findings; second in a description of the alleged privacy problem; third in explaining why server-side support is not a concern; fourth in Microsoft’s statement. Is this truly “again and again”?

  2. I use the term “security risk” on one occasion. I stand by that usage, as Kobeissi positioned his article as a “security” piece at least as much as he positioned it as a privacy one. The SSL issues, in particular, are security problems as much as they are privacy problems. The original article complains that Microsoft transmits the data “Not Very Securely”, and Kobeissi says that he was “tinkering around from a security/privacy perspective”. He repeats security concerns twice: “The Microsoft server is configured to support SSLv2 which is known to be insecure and susceptible to interception,” and again, “Windows 8 appears to send this information to Microsoft to a server that relies on Certificate Authorities for authentication and supports an outdated and insecure method of encrypted communication.”

  3. I do not “focus” on SSLv2. The largest part of the article is about SmartScreen; its history, its purpose, a few pieces about its implementation, and its configuration. I do, however, address the matter of SSLv2, because Kobeissi’s initial research raised it as a concern.

My research was about how Microsoft is making itself an omniscient and single point of data collection regarding what every Windows 8 computer is downloading and installing, and that this is very dangerous from a privacy perspective. In actuality, I’ve found Microsoft’s SmartScreen servers to be vulnerable to the BEAST attack (In retrospect, I don’t think this is the case.)

This is a remarkable statement. Although Kobeissi added the parenthetical after a comment was posted, he initially claimed that SmartScreen was vulnerable to the BEAST attack. The BEAST attack against SSLv3 and TLS 1.0 requires the use of a CBC algorithm on the server side, and Microsoft’s servers do support this cipher suite (indeed, they use the AES-CBC algorithm in preference to the BEAST-proof RC4 algorithm, even when the client supports RC4). But BEAST also requires that a hostile party be able to inject adaptive chosen plaintext, in order to recover secrets from the encrypted stream. This is possible with, for example, Java applets (and WebSockets code written against older versions of the WebSockets specification), but it isn’t possible with the essentially hard-coded SmartScreen check.

On one level, that’s OK. Kobeissi made a mistake (or perhaps had not studied BEAST or previous literature in any great detail) and when the error was pointed out he updated his post accordingly.

On a deeper level, however, it’s problematic. At the very least, this indicates poor attention to detail on Kobeissi’s part—the fact that he has heard of BEAST and knows that SSLv3 and TLS 1.0 are susceptible is enough for him to mention it, regardless of its relevance. But it goes further than that. He does not simply say that there may be a possibility of a BEAST attack. No, he states unambiguously: “In actuality, I’ve found Microsoft’s SmartScreen servers to be vulnerable to the BEAST attack”. In actuality, he has found no such thing.

But lo and behold, Mr. Bright brings up SSLv2 yet again:

There are some technical problems with Kobeissi’s complaint. Although he says that the server supports SSLv2, that is only part of the story.

Here my state of frustration at Peter Bright’s lacking in the journalistic faculties morphs into a state of boyish wonder; Could it be? Has Peter Bright exceeded the natural limits of misinformed, under-researched journalism? Has he set a new standard? For the entire Internet, perhaps? Am I witnessing history?

Or perhaps, he is witnessing an article written before he updated his post.

This still means that Microsoft could determine which programs individual IP addresses are using.

But lo, a ray of light! A sign of redemption; Peter Bright might actually focus on what I’m trying to say after all!

Here Kobeissi attempts to change history. His original post goes far beyond saying “Microsoft could log IP addresses”. He claims, for example, that “The user is not informed [of SmartScreen’s behaviour] while installing and setting up Windows 8”. This is false. He claims—still, even after his later SSLv2 update—“Windows 8 appears to send this information to Microsoft to a server that […] supports an outdated and insecure method of encrypted communication.” He suggests that “SmartScreen is not easy to disable”, which is farcical. He might wish his original claim had been as narrow as “Microsoft can correlate executable hashes to IP addresses”, but it was not.

When asked for comment, a Microsoft spokeswoman told us:

“We can confirm that we are not building a historical database of program and user IP data. Like all online services, IP addresses are necessary to connect to our service, but we periodically delete them from our logs. As our privacy statements indicate, we take steps to protect our users’ privacy on the backend. We don’t use this data to identify, contact or target advertising to our users and we don’t share it with third parties.”

The company has also talked in the past about the privacy implications of earlier iterations of SmartScreen. Although Microsoft does collect some data (for example, it distinguishes between popular downloads and unpopular downloads, as part of its application reputation feature), that data is anonymized.

As such, the privacy risk here is minimal.

Peter Bright, a tech journalist, just came to the conclusion that a privacy risk was minimal because the corporation being accused of the privacy risk asked him to please trust them; they swear it’s minimal.

My conclusion is that it’s minimal because Microsoft does not, contrary to common belief, set out to deliberately open itself up to legal liability. The company has clear privacy policies in place, and operates in a regulatory climate that is hostile to privacy breaches, whether perceived or real (especially in the EU). Creating such a database would plainly violate the terms of Microsoft’s own privacy policy and as such open up the company to considerable legal liability. The downsides are obvious.

The upsides, however, are not. No advantage to Microsoft of building a persistent database cross-referencing IP addresses to applications is immediately apparent; nor is any such advantage described in Kobeissi’s post.

My reaction to just how ready Mr. Bright is to dismiss my body of evidence that Microsoft could, at any time, record what every default Windows 8 configuration is installing because a Microsoft spokesperson told him to was first this, this, then this, then realizing how painfully backwards parts of the Internet can be, coming to peace with it, and writing this article where I nicely explain to Peter Bright why he sucks at being a tech journalist.

My heart, it is breaking.

It might be unfashionable, but although I am happy to regard corporations as essentially amoral entities, it’s simply not enough to say “Well, they could do something bad” and regard that as reason enough to condemn their actions. There needs to be some evidence of some degree of wrong-doing first. Such evidence is entirely lacking from his analysis.


Tuesday, January 31, 2012

icacls gripes and moaning

icacls is quite useful for setting permissions. Its save and restore feature is almost very useful. I say “almost” because it is annoyingly narrow; it is not possible to directly transfer permissions from one file to another differently-named file. The saved permissions specify the exact file name in addition to the permission string.

This makes sense for making bulk permission copies from one source to another. However, it’s annoying if your goal is to use one file as a “template” for the permissions of another file.

ICACLS name /save aclfile [/T] [/C] [/L] [/Q]
    stores the DACLs for the files and folders that match the name
    into aclfile for later use with /restore. Note that SACLs,
    owner, or integrity labels are not saved.

The documentation is also erroneous. It claims that saved permissions do not include integrity level information. This is false.

C:\Temp\IntegrityTest>copy con low
        1 file(s) copied.

C:\Temp\IntegrityTest>dir
 Volume in drive C has no label.
 Volume Serial Number is 3A16-AD98

 Directory of C:\Temp\IntegrityTest

31/01/2012  04:00    <DIR>          .
31/01/2012  04:00    <DIR>          ..
31/01/2012  04:00                 0 low
               1 File(s)              0 bytes
               2 Dir(s)  534,558,851,072 bytes free

C:\Temp\IntegrityTest>icacls low
low BUILTIN\Administrators:(I)(F)
    NT AUTHORITY\Authenticated Users:(I)(M)

Successfully processed 1 files; Failed processing 0 files

C:\Temp\IntegrityTest>icacls low /setintegritylevel l
processed file: low
Successfully processed 1 files; Failed processing 0 files

C:\Temp\IntegrityTest>icacls low
low BUILTIN\Administrators:(I)(F)
    NT AUTHORITY\Authenticated Users:(I)(M)
    Mandatory Label\Low Mandatory Level:(NW)

Successfully processed 1 files; Failed processing 0 files

C:\Temp\IntegrityTest>copy con destination
        1 file(s) copied.

C:\Temp\IntegrityTest>icacls destination
destination BUILTIN\Administrators:(I)(F)
            NT AUTHORITY\Authenticated Users:(I)(M)

Successfully processed 1 files; Failed processing 0 files

C:\Temp\IntegrityTest>icacls low /save acl.txt
processed file: low
Successfully processed 1 files; Failed processing 0 files

C:\Temp\IntegrityTest>type acl.txt
l o w
 D : A I ( A ; I D ; F A ; ; ; B A ) ( A ; I D ; F A ; ; ; S Y ) ( A ; I D ; 0 x 1 2 0 0 a 9 ; ; ; B U ) ( A ; I D ; 0 x 1 3 0 1 b f
 ; ; ; A U ) S : ( M L ; ; N W ; ; ; L W )

C:\Temp\IntegrityTest>notepad acl.txt

C:\Temp\IntegrityTest>type acl.txt

C:\Temp\IntegrityTest>icacls . /restore acl.txt
processed file: .\destination
Successfully processed 1 files; Failed processing 0 files

C:\Temp\IntegrityTest>icacls destination
destination BUILTIN\Administrators:(I)(F)
            NT AUTHORITY\Authenticated Users:(I)(M)
            Mandatory Label\Low Mandatory Level:(NW)

Successfully processed 1 files; Failed processing 0 files

Monday, April 25, 2011

Grand Central Dispatch for Win32: things still to do

So, the libdispatch port I’ve been working on is currently quite rough and ready. The major parts all seem to work, though I need to migrate all the tests, but there’s one significant piece missing: the main queue.

Cocoa is, for the most part, single-threaded; updates to a window must be performed on the thread that owns that window. The same is true of WPF, Win32, and others. However, Cocoa takes things a little further than Win32. In Win32 all threads are essentially created equal, and any thread is allowed to create windows and pump messages. It’s an M:N system: windows still have thread affinity—any given window must only be updated by the thread running its message loop—but there can be multiple loops on multiple threads, each with their own set of windows.

Not so in Cocoa. The main thread, that is, the one that literally runs main(), is special. All windows must have their message loops run on this thread, and all updates must be funnelled through this thread.

As I wrote in the post outlining why I want Grand Central Dispatch, the ability for secondary (worker) threads to run code from the window’s owning thread is highly desirable. In Cocoa, that means running code on the main thread, and so that’s what libdispatch enables.

Corresponding to the main thread, libdispatch creates a main queue. Since there’s only ever one main thread (used for every window), there’s only ever one main queue. Any callbacks placed on the main queue will eventually be executed by the main thread.

Creating a serial queue and enqueuing messages is easy enough; the tricky part is responding to those messages, and it’s here that libdispatch gets a bit murky. In truth, I’m a little hazy on some of the details, because not all the plumbing is found in libdispatch; there’s also a Cocoa-side integration that I don’t think is public (if it is, I don’t know where the source is).

libdispatch has two different ways of draining the main queue: a last-ditch automatic mechanism that ensures the right thing happens even when Cocoa isn’t actually involved, and a cleaner path that integrates properly with Cocoa. The automatic mechanism leverages pthreads’ TLS destructors. A particular TLS property has a destructor that, when invoked from the main thread, will drain the main queue. Drop off the end of the main thread, either just by returning from main() or by calling dispatch_main() (which in turn calls pthread_exit()), and the destructor will be called, draining the queue.

It’s to replicate this mechanism that I investigated the feasibility of implementing TLS destructors in Win32. The implementation kinda works, but annoyingly the TLS destructor is called so late in the thread’s tear-down that it’s basically not safe to do anything, especially not make arbitrary function calls in user-supplied callbacks. Unless I can find some way of resolving this, I’ll need to find some other approach. I think the DLL notifications happen at a better time, but I really want the convenience of a static library.

This is a little annoying. Though a single main thread/main queue isn’t a natural fit for Windows, I could have created a queue per thread and used the same “drain this thread’s queue when the thread is torn down” approach. One workaround that may be effective is to give up on automatic queue draining when returning from main() and instead require dispatch_main() to be called explicitly. This would probably be good enough.

The second mechanism, which is much better as it doesn’t require ending the main thread, is the one I’m a bit less clear about. The key function here is _dispatch_main_queue_callback_4CF(). This function gets called from Cocoa’s message loop, and it drains any messages placed on the main queue, before returning control back to the message loop.

This approach should be much easier to integrate, since it doesn’t depend on any special behaviour of threads or destructors or anything; it’s just a regular function call. Every time something is put onto the main queue it alerts Cocoa (_dispatch_queue_wakeup_main()), and Cocoa then drains the queue. All easy enough to translate into Win32.

However, it’s not quite as simple as that, because of the threading model Windows uses. There is no longer a single “main” queue. Any thread with a message loop will have to have its own queue, and the special alerting behaviour will need to take this into account. It will also have to ensure that it alerts the right thread. This will mean altering the queue objects to include an indication of whether they’re a “special” thread queue—that is, one drained from a user thread rather than a pthread_workqueue thread—and, if so, which thread they actually belong to, so that the right thread can be alerted.

There will also need to be some way of accessing these special queues (so that callbacks can be placed on them), so some kind of HWND-to-queue and possibly thread ID or HANDLE-to-queue lookup functions will be necessary.

As luck would have it, the libdispatch test cases all depend on the main queue anyway, so before I can readily port the tests, I’m going to have to put something together to address this need.

Grand Central Dispatch for Win32: the source code

Having explained why I want to port Grand Central Dispatch to Windows and outlined some of the issues in doing so, it’s probably a good time to show some source code!

I’ve put the code on GitHub. I’m not sure I’m entirely enamoured of GitHub, or git in general, and I’m not even sure that I’ve pulled Apple’s source in the best possible way (I’m using the subtree merge approach instead of submodules, but am vague on the pros and cons of each mechanism).

GitHub will tell you about all the modifications if you’re interested, but it’s probably worth mentioning a few things explicitly. I’ve attempted to change as little as possible, with the proviso that the thing has gotta build and at least give the appearance of working. The only file with wholesale changes is queue_kevent.c, which effectively has two wholly independent implementations, one using kevent(), the other using I/O completion ports.

The most disruptive modification to the source tree was the creation of the /platform hierarchy. This is where I put the Win32 stubs for UNIX headers, including the pthread_workqueue implementation. The implementation is fairly straightforward. I’ve implemented more than is strictly necessary for libdispatch—but not everything. Some concepts, such as workqueue suspension and resumption, have no obvious parallels in Win32.

I should note that I used new-style Win32 threadpools, available in Windows Vista and up. This means the code won’t work on Windows XP. The reasons for picking the new API are multiple:

  1. It can be used robustly, whereas the old one cannot; the old one provides no way of properly handling out-of-resource situations.
  2. It allows multiple pools per process, which allows libdispatch’s pools to be relatively isolated from any others that the application might create. This seems to reduce the possibility of surprises.
  3. The old threadpool API lacks any effective way of tidying up; in particular, callbacks cannot safely perform tasks such as unloading the DLL they are running from, and there is no way to ensure that every callback is safely executed or deallocated.
  4. There did not seem any obvious way to implement e.g. pthread_workqueue_removeitem_np using old-style threadpools.
  5. Timer queue timers have no leeway facility.

Honestly, in this day and age, nobody should be using Windows XP. Windows Vista and Windows 7, which support the new API, are both substantial improvements on that operating system.

One of the most pervasive changes (annoyingly so, it’ll clutter up any diffs) was the insertion of the function as_do() (as in, interpret this object “as a DO” (dispatch object)). This is because Visual C++ doesn’t support gcc’s transparent_union attribute. transparent_union seems to allow a pointer to any of a union’s member types to be implicitly converted to a pointer to the union type itself, when the union type is used as a function parameter. In C++, of course, the solution would be to make the members publicly inherit a base class and use that to allow implicit upcasting.

Also in the source tree is the imaginatively-named libdispatch++. This is a very thin C++ wrapper around the C API. Normally I wouldn’t bother, except for one thing: C++0x includes lambdas. Here’s the thing: I can’t add block support to Visual C++; at best it would require a custom source-source translator to preprocess the code, at worst it’d require intrusive compiler changes. The former is more work than it’s worth; the latter I simply can’t do. With C++0x’s lambdas, the case for blocks is considerably less compelling anyway. The lambdas support essentially the same range of things that blocks do, and they’re standard to boot. They should allow usage something along the lines of:

    void main_loop(int event, void* data)
    {
        gcd::dispatch_queue::get_global_queue(0, 0).async([]() -> void {
            // ... do background work ...
            gcd::dispatch_queue::get_main_queue().async([]() -> void {
                // ... update the window on the main thread ...
            });
        });
    }
(though get_main_queue() is not actually implemented yet for technical reasons). I think this compares reasonably well with the block versions. I might rename the classes to get rid of the wordy “dispatch_”.

Still to be ported are the test cases. This is obviously important, but they’re currently all written using blocks, and all are designed to be standalone executables (i.e. each test has its own main()). I don’t want to create one Visual C++ project per test case, but can’t immediately think of any good way to aggregate them without breaking anything.

Grand Central Dispatch for Win32: the port

Having established that Grand Central Dispatch would be a good thing to have on Windows, the task was to begin porting it.

The good news is that the actual implementation of Grand Central Dispatch, named libdispatch, is open source, released under the Apache license.

The bad news is that it basically won’t work with anything that isn’t Mac OS X. libdispatch depends on two particular technologies that aren’t widely available: pthread_workqueues and kevent().

pthread_workqueues are an Apple invention; they’re kernel-supported thread pools. Although some of the Mac OS X source code is open source—including that of pthread_workqueues—there’s no real documentation available.

kevent() is found on FreeBSD; in fact, that’s where Mac OS X gets it from. It’s designed as a high-performance alternative to select(), addressing in particular two major flaws with that API. One, select() has no memory; on every single call, the entire set of descriptors of interest must be passed in to the function, even if they’re the same every time. Busy servers can waste a considerable amount of time just copying the array of descriptors into and back from kernel mode.

Two, the function requires O(n) scans. When select() returns to the caller, it indicates only whether any descriptors became ready or not. The caller then has to scan the entire descriptor array looking for any that are ready to operate on. On a busy server with thousands of concurrent connections, this scanning takes a prohibitive amount of time.

kevent() fixes this. Instead of passing the descriptors each time, a persistent kernel queue object is created by calling kqueue(). File descriptors of interest are then registered in this queue with the kevent() call. The application can then wait for the queue to signal activity, again by calling kevent(). When the function returns, it provides an array of results with activity, ending the need to perform performance-sapping O(n) scans.

A third, less significant, issue is the use of Apple’s lambda-like blocks. Fortunately, libdispatch does not use these internally, and every block-using function has an equivalent that uses a regular function pointer/void* context pointer pair. Behind the scenes, these are called by the block versions of the functions.

Other ports

The first group to work on porting libdispatch to a non-Apple platform was FreeBSD. FreeBSD was in the strongest position, of course; Mac OS X is already partly based on FreeBSD, which is why libdispatch uses kevent() in the first place, and similarly the kernel modifications needed for pthread_workqueues were made to a FreeBSD-derived system. The port was completed relatively quickly, and is now claimed to be stable.

Less easy are ports to other platforms. There are efforts to bring libdispatch to Linux and Solaris. User-mode implementations of pthread_workqueues are feasible (if not optimal), and effort has been put into creating mimics for kevent() that leverage the alternative facilities within those operating systems.

Still, even on those platforms, the work was reasonably simple. They already use pthreads, and their I/O models are similar in capability, just gratuitously different.


For good or ill, Windows is like nothing else on earth. It doesn’t use pthreads, and it has a very different I/O model. Neither of these are bad things as such—in fact, the reasons behind both decisions are very sound—but it means that porting Unix-oriented software like libdispatch is more work than might otherwise be the case.

While it wasn’t until Snow Leopard’s release that Mac OS X had a thread pool API, pthread_workqueues, Windows has had one since Windows 2000. In Windows Vista, a new, rather more robust thread pool API was added. For the most part, this thread pool API maps 1:1 with Apple’s pthread_workqueues. Wrapping the former to provide an API equivalent to the latter is not a major undertaking, and it works pretty well.

kevent() and I/O in general are a more difficult issue. The preferred model in Unix is that of readiness notifications. select() and kevent() both return when a file descriptor can be operated on without blocking; that is, when it has data available to read (or buffer space to accept a write). You call the API, wait for readiness, and then do the actual read or write operation. The actual read or write in these situations is a regular synchronous blocking operation—it’s just that you know it won’t block because of the readiness notification.

Windows works on a model of completion notifications. You tell Windows to read or write to a file or socket, and it wakes up your program when that action has actually taken place. In this model, the read or write operations are non-blocking and asynchronous—the API calls return immediately. Microsoft calls it overlapped I/O.

There are a couple of ways to use overlapped I/O. The most basic is to wait for an event to be triggered when the operation has completed; do the read or write operation in one thread, wait for the event in a second thread, and use that second thread to actually respond to the operation. This works, but as might be expected, is lousy for scalability.

The better way is to use I/O completion ports. With these, one or more HANDLEs are bound to an object called a completion port. Whenever an overlapped I/O operation on any of those HANDLEs is completed, the completion port is signalled and passed the results of the operation. This means that instead of having to wait on a whole bunch of events, it’s possible to create just a few threads to respond to completion notifications, and they can process completion events for hundreds or thousands of HANDLEs.

In practice, this is a great model, although quite confusingly documented. In fact, it’s the model you probably want. Readiness notifications combined with a mix of blocking and non-blocking calls are really kind of hokey, especially given a particularly annoying Unix trait: disk files are always deemed readable and writeable. Even though operations on disk files will actually block, select() will claim they’re ready at all times.

The downside to this is that the dispatch sources in libdispatch are designed for readiness notifications. Without using select(), with all its problems, there’s no good way to do that on Windows. This leaves two options: cobble something together that (a) won’t be as good and (b) doesn’t fit in with the natural Win32 I/O paradigm, or say screw it and, instead of aiming for exact 1:1 compatibility with the Mac OS X dispatch sources, create a new overlapped I/O source that enqueues callbacks whenever an overlapped operation has completed.

It is this second route that I have taken.

Minor issues

The libdispatch source code is more or less C99 with various gcc extensions. I want to be able to use Visual C++, which only supports C89 and essentially no gcc extensions (though there are one or two non-standard features common to both compilers). After all, there’s little point in producing a Win32-native version of the library if it’s going to force people to use the MinGW or Cygwin toolchains.

To that end, C99-style named initializers need to be replaced with C89-style aggregate initializers, some weird gcc implicit casts need to be replaced with explicit casts, and a few other bits and pieces need to be changed around.

libdispatch also, unsurprisingly, depends on the existence of various POSIX APIs. Stub libraries fill out the missing API calls; mainly simple stuff like clock_gettime().

One area that needed a little more attention is pthreads. Or rather, pthreads’ TLS. pthreads’ TLS includes a destructor feature that ensures it tidies up TLS values if a thread exits. Win32’s TLS has no equivalent capability. There is a way to work around this, but it has some issues that I have not yet resolved, and might not be able to resolve.

The good news is that this capability might be unnecessary. The use of pthread TLS destructors is to handle the special “main” thread in Cocoa. As a special compatibility feature, if the main thread exits whilst callbacks are queued on the main queue, the main thread will execute those callbacks. The preferred way to handle this, even on Mac OS X, is to explicitly call dispatch_main() to drain the main queue. Using this approach, the pthread TLS destructors aren’t needed anyway, and this style is probably more amenable to Win32, which has no main thread.

Sunday, April 24, 2011

Grand Central Dispatch for Win32: why I want it

When Apple introduced Grand Central Dispatch in Mac OS X Snow Leopard the response from the fanboy crowd was predictably lunatic, with many of Apple’s fans proclaiming that Snow Leopard would, somehow, enable software to exploit the full potential of parallel processors, with little or no developer effort necessary; a belief that Apple did little or nothing to disabuse them of.

In reality, of course, it was nothing of the sort. Though it does not live up to these lofty goals—and, unsurprisingly, does not even attempt to live up to them—it is nonetheless a very interesting library indeed.

At first glance, it doesn't look too special. It looks a bit like thread pooling, and there's nothing fancy or innovative about thread pooling. Thread pooling is how Wikipedia describes GCD, and even Apple's description is almost wholly fixated on threads and thread management.

It’s true that GCD does use thread pools in the background, but to concentrate on this is to miss the point. GCD’s value lies not in thread pooling, but in queuing.

Queues are at the heart of GCD. Everything happens through queues. Tasks (callbacks) are placed on queues and, when dequeued, get executed. There are both concurrent queues, which is where the thread-pool nature of GCD becomes apparent, and serial queues, which execute their tasks strictly in order. Queues can target other queues, allowing complex queue graphs to be constructed, which in turn allows simple construction of software pipelines and other complex code flows.
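To make the serial-queue semantics concrete, here is a toy model: a single worker thread draining a FIFO list. This illustrates the ordering guarantee only; it bears no resemblance to libdispatch's actual implementation, and all the names are made up.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One task in the queue's FIFO list. */
struct task { void (*fn)(void *); void *ctx; struct task *next; };

struct serial_queue {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    struct task *head, *tail;
    int done;               /* set when no more tasks will arrive */
    pthread_t worker;
};

/* The single worker thread: pop and run tasks strictly in FIFO order. */
static void *drain(void *arg)
{
    struct serial_queue *q = arg;
    for (;;) {
        struct task *t;
        pthread_mutex_lock(&q->lock);
        while (!q->head && !q->done)
            pthread_cond_wait(&q->cond, &q->lock);
        t = q->head;
        if (t) {
            q->head = t->next;
            if (!q->head)
                q->tail = NULL;
        }
        pthread_mutex_unlock(&q->lock);
        if (!t)
            return NULL;    /* queue closed and empty */
        t->fn(t->ctx);
        free(t);
    }
}

void queue_init(struct serial_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = q->tail = NULL;
    q->done = 0;
    pthread_create(&q->worker, NULL, drain, q);
}

/* The moral equivalent of dispatch_async on a serial queue. */
void queue_async(struct serial_queue *q, void (*fn)(void *), void *ctx)
{
    struct task *t = malloc(sizeof *t);
    t->fn = fn;
    t->ctx = ctx;
    t->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/* Close the queue and wait for the worker to finish the backlog. */
void queue_join(struct serial_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->done = 1;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);
}

/* Demo: three tasks appended to a buffer come out in enqueue order. */
static char demo_buf[4];
static int  demo_n;
static void demo_put(void *c) { demo_buf[demo_n++] = *(char *)c; }

const char *serial_demo(void)
{
    static char abc[3] = { 'a', 'b', 'c' };
    struct serial_queue q;
    queue_init(&q);
    queue_async(&q, demo_put, &abc[0]);
    queue_async(&q, demo_put, &abc[1]);
    queue_async(&q, demo_put, &abc[2]);
    queue_join(&q);
    demo_buf[demo_n] = '\0';
    return demo_buf;
}
```

The point of the toy is the guarantee, not the mechanism: whatever threads are involved underneath, tasks on a serial queue never overlap and never reorder.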

In tandem with GCD, Apple also introduced a new C extension that they have called "blocks". Blocks are more or less lambdas (anonymous functions) grafted onto C; there is also a new __block storage qualifier, which allows captured variables to be modified from inside a block. Unfortunately, blocks have syntax that's incompatible with C++ lambdas. Blocks and GCD are separate (you can have either one without the other) but mesh nicely together.

Don’t block the main thread

Grand Central Dispatch is appealing for many reasons. In common with many graphical/windowing toolkits, Cocoa windows are essentially single-threaded in nature. Traditionally, they were tightly restricted: any command that updated a window in any way had to be executed on the thread owning that window. Those restrictions were relaxed a little in 10.6, but to avoid problems it's still a sound model to follow. However, it causes an issue in an all-too-common scenario:

  1. User clicks a button to trigger an action
  2. Action is executed on the window’s thread
  3. The action takes a long time to execute
  4. Because the window is no longer processing messages, it beachballs (Mac OS X) or ghosts (Windows)

This situation is notoriously common, and it makes applications horrible to use, as it makes their interfaces slow and unresponsive. The reason it’s so common is that generally, there’s just no good alternative. Naively, performing the action in a separate thread seems to be the way to go—you might even use a thread pool to do it—and for some actions that’s even true, but in general there’s a problem with this approach.

Typically, when the action is finished, you want to update the window somehow. And to safely update a window, the updates need to be made on the window’s thread. But if your action was spun off into its own thread, it has no good way of getting back onto the window thread. Awkward.

In Windows Presentation Foundation, which has a similar single-threaded approach, you would use Dispatcher.Invoke to make an update back on the main thread. WinForms, the older .NET toolkit that wraps Win32, had a similar mechanism in Control.Invoke. But pure Win32 and Cocoa alike lacked any effective way to do this.

Enter Grand Central Dispatch. Grand Central Dispatch on Cocoa provides a special queue that is plumbed into the main thread (that is, the thread on which windowing operations should be performed). With this special queue, it's suddenly very easy to perform our slow action on a background thread:

    /* do_slow_action and update_window stand in for application code */
    void main_loop(int event, void *data) {
        dispatch_async(dispatch_get_global_queue(0, 0), ^{
            do_slow_action(data);    /* runs on a background thread */
            dispatch_async(dispatch_get_main_queue(), ^{
                update_window(data); /* back on the main thread */
            });
        });
    }
Notice the use of blocks. Blocks are introduced with a caret, ^, followed by an optional argument list (omitted here, because neither block takes parameters), followed by the block body inside braces. ^ is also used as a declarator (the counterpart of * for pointers), allowing block types to be declared with syntax equivalent to that of function pointers. Blocks alone are a killer feature.

Asynchronous programming made easy

The second highly desirable capability is provided by what GCD calls "sources". These are objects that add tasks to a queue autonomously. One obvious type is the timer source. Create a timer source, specify a block/callback, and set its schedule. Then, whenever the timer fires, it'll put the callback onto whichever queue the source targets, and the callback will get run as soon as it is dequeued.

The most exciting source types are the read and write sources. These sources are bound to file descriptors, and trigger their callbacks whenever the descriptors become ready for reading or writing, respectively. This provides a very nice building block for applications built around non-blocking I/O. Instead of dealing with the whole mess of select() (or other, better-performing alternatives), you simply create a source, and tell it which code you want it to execute whenever it's readable or writeable. This is so much nicer it's not funny.
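For contrast, this is the sort of manual bookkeeping a read source hides (wait_readable and select_demo are illustrative helpers, not part of any real API):

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Manual bookkeeping of the kind a GCD read source hides: wait until
 * the descriptor is readable, or until the timeout expires. */
int wait_readable(int fd, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    return select(fd + 1, &rfds, NULL, NULL, &tv) > 0 && FD_ISSET(fd, &rfds);
}

/* Demo: a pipe only becomes readable once something is written to it. */
int select_demo(void)
{
    int fds[2], ready;
    if (pipe(fds) != 0)
        return -1;
    ready = wait_readable(fds[0], 0);     /* nothing written yet: 0 */
    if (!ready && write(fds[1], "x", 1) == 1)
        ready = wait_readable(fds[0], 1); /* now readable: 1 */
    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

And this is the simple case: a real select() loop also has to rebuild the fd_sets on every iteration, track the highest descriptor, and dispatch to the right handler for each ready descriptor, which is precisely the plumbing a read source replaces.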

GCD can, of course, be used as a parallel programming library. It has a basic data-parallel operation, dispatch_apply, that will queue the specified callback n times. This can be used to process arrays in parallel quite easily. This is nice enough for simple tasks—"encode every file in this array as an MP3", say—but it's quite rudimentary. I wouldn't want to use it for array-based number crunching, for example: OpenMP provides a better set of primitives for that kind of task.
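The contract is easy to model. Here is a crude portable sketch of it, one thread per index where libdispatch would use its thread pool; apply_model and the demo names are made up for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

struct apply_arg { void (*fn)(void *, size_t); void *ctx; size_t i; };

static void *apply_thunk(void *p)
{
    struct apply_arg *a = p;
    a->fn(a->ctx, a->i);
    return NULL;
}

/* Crude model of dispatch_apply's contract: invoke fn(ctx, i) for every
 * i in [0, n), possibly in parallel, and return only when all are done.
 * (One thread per index here; libdispatch uses its thread pool.) */
void apply_model(size_t n, void *ctx, void (*fn)(void *, size_t))
{
    pthread_t *ts = malloc(n * sizeof *ts);
    struct apply_arg *as = malloc(n * sizeof *as);
    size_t i;
    for (i = 0; i < n; i++) {
        as[i].fn = fn;
        as[i].ctx = ctx;
        as[i].i = i;
        pthread_create(&ts[i], NULL, apply_thunk, &as[i]);
    }
    for (i = 0; i < n; i++)
        pthread_join(ts[i], NULL);
    free(ts);
    free(as);
}

/* Demo: square each slot of an array independently. */
static void square_slot(void *ctx, size_t i)
{
    int *a = ctx;
    a[i] *= a[i];
}

int apply_demo(void)
{
    int a[4] = { 1, 2, 3, 4 };
    apply_model(4, a, square_slot);
    return a[0] + a[1] + a[2] + a[3]; /* 1 + 4 + 9 + 16 */
}
```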

But for asynchronous development and software pipelining, Grand Central Dispatch is really hard to beat. The API is quite small and simple, making it easy to learn, its programming model is natural, making it easy to integrate into existing codebases, and the functionality it provides is highly desirable.

The big problem is that it’s only available under Mac OS X, with a port also available on FreeBSD. Most of my time these days is spent using and programming Windows—but I want Grand Central Dispatch, and want it badly.

So that’s why I’ve ported it.

Sunday, March 13, 2011
Friday, December 17, 2010

“What the tablet does, for the first time, is let us hit the reset button on the presentation of content to readers.”

Mike McCue, CEO of Flipboard, in Web format has ‘contaminated’ online journalism | Technology | Los Angeles Times

Does it, though? It's not the web that contaminated journalism. It's the measurable performance of advertising and the need to pay journalists. It's perfectly possible to create attractive web pages that are clean and uncluttered. It's just that they don't pay the bills. Users have shown a continued reluctance to pay for copy, leaving few viable alternatives to advertising.

Yes, the advertising is often ugly and intrusive. Of course—give advertisers the ability to accurately gauge just how effective their adverts are, and this will inevitably be the result. Tasteful, inoffensive, and ignorable doesn’t actually work.

Maybe users will change their minds on tablets, but it's not something I would count on; such a change of heart certainly isn't apparent from early tablet experiments. The same pressures—a desire to monetize, an inability to charge for content, a strong ability to track advertising efficacy—will come to bear on tablet software, given time.

Recently in the message thread for your article, “Lies, damned lies, and benchmarks: is IE9 cheating at SunSpider?”, you made this statement (in response to someone else): “And bugs should be rare, since all they do is get the answer wrong for no purpose. Guess what? They’re not rare.” This struck me as a particularly true statement. Bugs should be rare, but they aren’t. It seems like there might be a full article here about why bugs aren’t more rare. I thought that might be an interesting read. So I guess my question is, have you thought about writing an article on why software bugs aren’t more rare?

This would be an interesting article to write. I’m not entirely sure if it can be written, however. I think if people—developers—had a clear grasp of why they created bugs then they would be in a position to do something about it. And yet, in spite of countless methodologies, millions of books, endless hours of seminars, we’re frankly no better today than we were decades ago.

People may point at NASA, but NASA’s solution, such as it is, is to scale back problems until they’re tiny, spec the hell out of them, and then write code. It’s extremely expensive, extremely slow, totally non-scalable—and there are still bugs at the end of it.

Friday, September 24, 2010


This is comical, but the actual likely intention is less fun than killing hung apps: it’s probably to get through the Windows NT-style “Press Ctrl + Alt + Delete to log on” screen, a relic from 1993, which is necessary on tablets presumably because Microsoft’s internal structure, politics, and fragmentation precluded the Tablet PC team from getting the Windows Account Security Or Whatever team to make an exception to this procedure for this edition of Windows 7.

Um, really? OK, there’s so much wrong with the software on the Slate that it’s hard not to laugh and/or weep at the stupidity of it.

But this complaint is idiotic. ctrl-alt-del to log in is optional—but on by default in domains. And since this is a machine aimed at enterprise markets, guess what? It's gonna be domain-joined. ctrl-alt-del is a legitimate security feature; instead of bitching about it or claiming it to be a "relic" you should be wondering why there are still OSes on the market that don't demand a secure attention key (SAK).

Wednesday, August 4, 2010

Vegetarian for a month: the truths and untruths


Tuesday, July 20, 2010
Monday, July 19, 2010