C++11 thread-safe static object initialization

C++11 has this neat feature: function-local static objects are initialized in a thread-safe manner. This is yet another post on how it is done.

First things first: The Standard (well, at least the freely available draft). Here’s what it says about the initialization of block-scope static variables (section 6.7, paragraph 4):

If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
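To see the guarantee in action before digging into the implementation, here is a minimal sketch. The names Counted, get_value() and run_concurrently() are made up for this illustration and are not part of the program analyzed below. Several threads race to the same static declaration, and the counter records how many times the constructor actually ran.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how many times the constructor body runs (hypothetical helper).
static std::atomic<int> g_constructions{0};

class Counted
{
public:
	explicit Counted(int x) : x_(x) { ++g_constructions; }
	int x() const { return x_; }

private:
	int x_;
};

// Every caller may reach this declaration concurrently; the language
// guarantees that the initialization itself happens exactly once.
int get_value()
{
	static Counted instance(42);
	return instance.x();
}

// Calls get_value() from thread_count threads and returns the total
// number of constructor runs observed afterwards.
int run_concurrently(int thread_count)
{
	assert(thread_count > 0);
	std::vector<std::thread> threads;
	for (int i = 0; i < thread_count; ++i)
		threads.emplace_back([] { get_value(); });
	for (auto& t : threads)
		t.join();
	return g_constructions.load();
}
```

With thread-safe statics enabled, run_concurrently() should report a single construction no matter how many threads are started.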

Fine, let’s see how it is implemented. I’m using macOS and its tools.

We’ll use this simple program (it does not actually have to output anything, but whatever):

#include <iostream>

class Foo
{
public:
	Foo(int x) : x_(x) {}
	int x() { return x_; }

private:
	int x_;
};

void func()
{
	static Foo foo(42);
	std::cout << "x = " << foo.x() << std::endl;
}

int main()
{
	func();
	return 0;
}

So far so good. Compile it:

$ g++ main.cpp -o main -O0 -g -std=c++11

And disassemble:

$ objdump -disassemble -macho -no-show-raw-insn main

We need the func() function:

__Z4funcv:
100000e00:      pushq   %rbp
100000e01:      movq    %rsp, %rbp
100000e04:      subq    $48, %rsp
100000e08:      cmpb    $0, 4857(%rip)
100000e0f:      jne     0x100000e4c
100000e15:      leaq    __ZGVZ4funcvE3foo(%rip), %rdi ## guard variable for func()::foo
100000e1c:      callq   0x100001cb6 ## symbol stub for: ___cxa_guard_acquire
100000e21:      cmpl    $0, %eax
100000e24:      je      0x100000e4c
100000e2a:      leaq    __ZZ4funcvE3foo(%rip), %rdi ## func()::foo
100000e31:      movl    $42, %esi
100000e36:      callq   __ZN3FooC1Ei ## Foo::Foo(int)
100000e3b:      jmp     0x100000e40
100000e40:      leaq    __ZGVZ4funcvE3foo(%rip), %rdi ## guard variable for func()::foo
100000e47:      callq   0x100001cbc ## symbol stub for: ___cxa_guard_release

Following the prologue there is code that checks the initialization flag and jumps over the initialization entirely if it is non-zero (cmpb $0, 4857(%rip), jne 0x100000e4c). The ___cxa_guard_acquire()/___cxa_guard_release() pair guards the object initialization. The former call acquires the guard variable and returns non-zero if the caller has to run the initializer (hence the cmpl $0, %eax, je 0x100000e4c after it), and the latter releases the guard. nm reports both as imported from libc++:

$ nm -m main
[...]
(undefined) external ___cxa_guard_acquire (from libc++)
(undefined) external ___cxa_guard_release (from libc++)
[...]
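The control flow of the disassembly can be written back as C++, roughly like this. It is only a sketch: toy_guard_acquire() and toy_guard_release() are hypothetical stand-ins for the real ___cxa_guard_acquire()/___cxa_guard_release(), built on a plain std::mutex, and foo_storage/foo_guard play the roles of func()::foo and its guard variable.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

class Foo
{
public:
	explicit Foo(int x) : x_(x) {}
	int x() const { return x_; }

private:
	int x_;
};

// Hypothetical stand-ins for the runtime's guard functions.
static std::mutex toy_mutex;

// Returns 1 (with the mutex held) if the caller must run the
// initializer, 0 if another thread has already done so.
int toy_guard_acquire(std::atomic<unsigned char>* guard)
{
	toy_mutex.lock();
	if (guard->load(std::memory_order_acquire) != 0)
	{
		toy_mutex.unlock();
		return 0;
	}
	return 1;
}

// Marks the object as initialized and lets waiting threads proceed.
void toy_guard_release(std::atomic<unsigned char>* guard)
{
	guard->store(1, std::memory_order_release);
	toy_mutex.unlock();
}

// Raw storage for the object and its guard flag.
alignas(Foo) static unsigned char foo_storage[sizeof(Foo)];
static std::atomic<unsigned char> foo_guard{0};

int lowered_func()
{
	if (foo_guard.load(std::memory_order_acquire) == 0)   // cmpb $0, ...; jne
	{
		if (toy_guard_acquire(&foo_guard) != 0)           // callq acquire; cmpl $0, %eax; je
		{
			new (foo_storage) Foo(42);                    // callq Foo::Foo(int)
			toy_guard_release(&foo_guard);                // callq release
		}
	}
	// By now the object must have been constructed by some thread.
	assert(foo_guard.load(std::memory_order_acquire) != 0);
	return reinterpret_cast<Foo*>(foo_storage)->x();
}
```

The first check is the cheap fast path: once the flag is set, later calls never touch the mutex.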

Now, let’s compile the application without the -std=c++11 switch and see what happens:

$ g++ main.cpp -o main -O0 -g
$ objdump -disassemble -macho -no-show-raw-insn main

The disassembly:

__Z4funcv:
[...]
100000e15:      leaq    __ZGVZ4funcvE3foo(%rip), %rdi ## guard variable for func()::foo
100000e1c:      callq   0x100001cb6 ## symbol stub for: ___cxa_guard_acquire
[...]
100000e47:      callq   0x100001cbc ## symbol stub for: ___cxa_guard_release

Bummer, the guard calls are still there. Turns out, thread-safe initialization has been enabled by default in both GCC and Clang for a while. If you have solid reasons to disable it (or fancy a little shooting-yourself-in-the-foot), there’s the -fno-threadsafe-statics compiler switch, which does the trick:

$ g++ main.cpp -o main -O0 -g -fno-threadsafe-statics
$ objdump -disassemble -macho -no-show-raw-insn main

__Z4funcv:
100000ea0:      pushq   %rbp
100000ea1:      movq    %rsp, %rbp
100000ea4:      subq    $32, %rsp
100000ea8:      cmpb    $0, 4669(%rip)
100000eaf:      jne     0x100000ecd
100000eb5:      leaq    __ZZ4funcvE3foo(%rip), %rdi ## func()::foo
100000ebc:      movl    $42, %esi
100000ec1:      callq   __ZN3FooC1Ei ## Foo::Foo(int)

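Notice that the __cxa_guard_acquire()/__cxa_guard_release() calls are gone. The plain flag check (cmpb $0, 4669(%rip), jne 0x100000ecd) is still there, so the object is still constructed only on the first call, but nothing stops two threads from racing past that check and running the Foo::Foo(int) constructor at the same time.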

Well, what are these __cxa_guard_acquire()/__cxa_guard_release() functions, anyway? We can find the answer in Apple’s open source repository, in the file aptly named cxa_guard.cxx. The answer is fairly boring, really (the code samples below have been stripped of all comments and error handling, see the source for the full picture!):

int __cxxabiv1::__cxa_guard_acquire(uint64_t* guard_object)
{
    if ( initializerHasRun(guard_object) )
        return 0;
    ::pthread_mutex_lock(guard_mutex());
    if ( initializerHasRun(guard_object) ) {
        ::pthread_mutex_unlock(guard_mutex());
        return 0;
    }

    if ( inUse(guard_object) ) {
        abort_message(...);
    }

    setInUse(guard_object);
    return 1;
}

void __cxxabiv1::__cxa_guard_release(uint64_t* guard_object)
{
    setInitializerHasRun(guard_object);
    ::pthread_mutex_unlock(guard_mutex());
}

The setInUse(), initializerHasRun(), setInitializerHasRun() and guard_mutex() functions are defined in the same file. What’s worth noting here is:

all static object initializations are protected by the single, global, recursive mutex returned by the guard_mutex() call (a recursive mutex is apparently needed so that initializing one static object can trigger the initialization of another one on the same thread);
a guard object is a 64-bit integer, whose first byte contains the “initializer has run” flag, and the second one contains the “in use” flag;
upon guard acquire we check the “initializer has run” flag and bail out (return 0) if it is set;
after successfully locking the global mutex, we check it again (the initialization could have been performed by another thread while we were waiting on the mutex);
we also check the “in use” flag to make sure that the guard object is not already being used by the same thread at the same time (remember that the global mutex is recursive, and thus such a situation is possible in case of a bug);
upon guard release we set the “initializer has run” flag;
finally, the mutex is unlocked (no surprise), but the guard object is still left with the “in use” flag set forevermore.
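The points above can be put together into a small portable model. This is a sketch under assumptions, not Apple’s code: ModelGuard, model_guard_acquire(), model_guard_release() and count_initializers() are names invented here, std::recursive_mutex replaces the pthread mutex, and the two flags are separate struct members rather than bytes of a uint64_t.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Model of the guard object: one "initializer has run" flag
// and one "in use" flag.
struct ModelGuard
{
	std::atomic<std::uint8_t> initializer_has_run{0};
	std::uint8_t in_use{0};   // only touched while the mutex is held
};

// The single global recursive mutex shared by all guards.
static std::recursive_mutex model_guard_mutex;

int model_guard_acquire(ModelGuard* guard)
{
	// Fast path: already initialized, no locking needed.
	if (guard->initializer_has_run.load(std::memory_order_acquire))
		return 0;
	model_guard_mutex.lock();
	// Check again: another thread may have finished while we waited.
	if (guard->initializer_has_run.load(std::memory_order_acquire))
	{
		model_guard_mutex.unlock();
		return 0;
	}
	// The same thread re-entering the initializer of the same object.
	if (guard->in_use)
		std::abort();
	guard->in_use = 1;
	return 1;   // caller runs the initializer with the mutex held
}

void model_guard_release(ModelGuard* guard)
{
	guard->initializer_has_run.store(1, std::memory_order_release);
	model_guard_mutex.unlock();   // "in use" stays set
}

// Lets thread_count threads compete for one guard and returns how
// many of them were told to run the initializer.
int count_initializers(int thread_count)
{
	assert(thread_count > 0);
	ModelGuard guard;
	int runs = 0;
	std::vector<std::thread> threads;
	for (int i = 0; i < thread_count; ++i)
		threads.emplace_back([&guard, &runs] {
			if (model_guard_acquire(&guard))
			{
				++runs;   // safe: the global mutex is held here
				model_guard_release(&guard);
			}
		});
	for (auto& t : threads)
		t.join();
	return runs;
}
```

In this model only one thread ever gets a non-zero result from model_guard_acquire(), and every later call returns through the lock-free fast path.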