
Singleton in Multi-Threaded Environment

02 October, 2013 - 12 min read

The code posted here should be considered pseudo-code; no particular language is assumed. It is also assumed that the reader is already familiar with the Singleton pattern, i.e. I am not going to discuss in detail what a Singleton is, when we should use it, etc. Rather, I am going to discuss the difficulties of implementing a Singleton in a multi-threaded environment. There are also cases where the Singleton is considered evil or an anti-pattern, but again, that is a separate discussion.

If people are asked to name a design pattern, the most commonly uttered name is the Singleton. It is one of the most widely used patterns (and most abused, too). The idea behind the Singleton is simple. It ensures that the Singleton class has only one instance and provides a global access point for it. We can use a Singleton when the presence of multiple instances can potentially damage the system, and we need global access to the single instance. Let’s first see the classical implementation of Singleton:

public class Singleton {
    private static Singleton _instance = null;
    
    private Singleton() {}
    
    public static Singleton getInstance() {
        if (_instance == null) {
            _instance = new Singleton();
        }
        
        return _instance;
    }
}

The implementation is straightforward. The constructor of Singleton is private, so it is not possible to instantiate it from outside. The only way to get an instance is to call the static getInstance() method. getInstance() first checks whether an instance has already been created. If not, it creates one, stores the reference in the private static member _instance and returns it. If an instance already exists, it simply returns the previously created _instance. Thus only the first call to getInstance() instantiates a Singleton object and any further call returns the same object. Also note that the object is not instantiated until getInstance() is called, i.e. we only create it when it is actually required. This is called lazy initialization and is helpful if the object is resource hungry.
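
In single-threaded use this behaves exactly as described. A quick usage sketch (Java-flavoured, matching the pseudo-code above; the Demo class is just for illustration):

public class Demo {
    public static void main(String[] args) {
        // Nothing is constructed yet; the instance is created lazily,
        // inside the first call to getInstance().
        Singleton first = Singleton.getInstance();  // creates the instance
        Singleton second = Singleton.getInstance(); // returns the same instance
        System.out.println(first == second);        // prints: true
    }
}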

This looks like a very simple and straightforward implementation, but there is a slight problem with it: this classical implementation is not thread safe. Let's see what may happen in the presence of two threads.

  1. Thread-1 enters getInstance() for the first time, sees that _instance is null, and thus the condition is true.
  2. Before it instantiates the object, a thread switch occurs.
  3. Thread-2 enters getInstance() and also sees _instance as null, because the instantiation by Thread-1 has not happened yet.
  4. Thread-2 instantiates a new object and returns it.
  5. Thread-1 knows nothing about Thread-2, so when it gets its turn again it instantiates another object and returns that. At this point we have two instances of Singleton, which violates the fundamental purpose of the pattern (a small harness sketching this race follows below).
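
To make this race concrete, here is a small hypothetical Java harness (not part of the original code) that runs two threads against the non-thread-safe getInstance() above. The interleaving is timing-dependent, so the race may not reproduce on every run:

public class RaceDemo {
    public static void main(String[] args) throws InterruptedException {
        final Singleton[] seen = new Singleton[2];

        // Both threads call the unsynchronized getInstance() concurrently.
        Thread t1 = new Thread(() -> seen[0] = Singleton.getInstance());
        Thread t2 = new Thread(() -> seen[1] = Singleton.getInstance());

        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // With the classical implementation this can print "false"
        // if the threads interleave as in steps 1-5 above.
        System.out.println("Same instance? " + (seen[0] == seen[1]));
    }
}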

So what can we do to solve this problem? The easiest solution comes up if we don't need lazy initialization. Something like this:

public class Singleton {
    private static Singleton _instance = new Singleton();
    
    private Singleton() {}
    
    public static Singleton getInstance() {
        return _instance;
    }
}

This is guaranteed to be thread safe: the instance is created up front, during static initialization, which the runtime performs exactly once, so getInstance() itself never constructs anything. The only cost we pay is that we lose lazy initialization. If the singleton object gets created in every run anyway, or the cost of creating it is not high, then this is probably the best approach to implementing a Singleton in a multi-threaded environment.

But what if we want the lazy initialization? The general approach to writing thread-safe code is to acquire a lock before accessing the shared resource. We can acquire a lock right after entering getInstance(). Something like this:

public static Singleton getInstance() {
    acquire_lock();
    if (_instance == null) {
        _instance = new Singleton();
    }
    release_lock();
       
    return _instance;
}

Again, this is guaranteed to be thread safe. Thread-2 cannot proceed until Thread-1 completes the instantiation and releases the lock. Obviously, after acquiring the lock Thread-2 will see _instance as non-null and won't create a separate instance. But unfortunately we have a slight problem with this method, namely performance. Acquiring a lock is a costly operation, and acquiring one on every call to getInstance() may hurt overall performance severely, especially when the calls are frequent. To make matters worse, we only need the lock the first time. Once _instance is set up with a valid value, there is no need for locking at all. But we are acquiring the lock every time and thus wasting resources. The whole idea of lazy initialization is to use resources efficiently, so this amount of locking looks like overkill.
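
For reference, in Java the acquire_lock()/release_lock() pair of the pseudo-code would usually be expressed with a synchronized method; a minimal sketch of this fully locked version:

public class Singleton {
    private static Singleton _instance = null;

    private Singleton() {}

    // synchronized turns the whole method into a critical section:
    // only one thread at a time can run the null check and the construction.
    public static synchronized Singleton getInstance() {
        if (_instance == null) {
            _instance = new Singleton();
        }
        return _instance;
    }
}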

Is there any way to get around this problem? We have already realized that the lock is only needed the first time; there is no need to acquire it once _instance is initialized. So why don't we acquire the lock only if _instance is null, like this:

public static Singleton getInstance() {
    if (_instance == null) {
        acquire_lock();
        _instance = new Singleton();
        release_lock();
    }
            
    return _instance;
}

A clever thought, but it will not work. Why? Thread-1 may find the condition true and enter the if block, but a thread switch may occur before it acquires the lock. Thread-2 will then also find the condition true, and we are back in the same situation as before.

Can we fix this? Indeed we can, with a slight modification: we only need to check _instance against null again after acquiring the lock. Like this:

public static Singleton getInstance() {
    if (_instance == null) {
        acquire_lock();
        if (_instance == null) {
            _instance = new Singleton();
        }
        release_lock();
    }
            
    return _instance;
}

Let's see what happens now in the earlier situation:

  1. Thread-1 enters the condition, but a thread switch occurs before it acquires the lock.
  2. Thread-2 enters the condition, acquires the lock, instantiates, releases the lock and returns as before.
  3. Thread-1 then acquires the lock. But now it sees that _instance is non-null, so it does not create a new object. It simply releases the lock and returns the object created by Thread-2.

It looks like our problem with multiple threads is finally solved. This approach of checking twice is a design pattern in its own right, called the Double-Checked Locking pattern.

Unfortunately, this is NOT guaranteed to work either. The reason is that modern compilers are very aggressive about optimizing things. An optimizing compiler (which all modern compilers are) can reorder instructions for various reasons: it can reorder reads and writes to improve cache performance, and it may try to run as many instructions as possible in parallel when multiple execution units are present (all modern CPUs have multiple execution units). A concrete example might clarify.

int a = 10;
int b = 20;
int c = a + b;

Here the compiler guarantees that the value of c will be 30, but it guarantees nothing else about a and b. It can eliminate the first two instructions entirely as constant expressions, it can execute the first one and then the second, or the second one and then the first, or it can even execute them in parallel if multiple execution units are present. The summary is: we cannot depend on the order.

Let's get back to our double-checked locking code. When _instance = new Singleton() is executed, three things mainly happen: a portion of memory is allocated for the Singleton object, that memory is initialized with the object's data, and finally _instance is made to point to that memory location. _instance is valid only when the initialization is complete. The catch is that the compiler may change the order of these steps, i.e. it may first allocate the memory and make _instance point to it, and only then initialize the object. It is also possible that the initialization takes place in another execution unit. Why a compiler would do so is a matter of compiler optimization theory, but the fact is that _instance may point to uninitialized memory. Let's see what may happen to our double-checked locking code in this case:

  1. Thread-1 enters getInstance(), acquires the lock and starts the instantiation process. It allocates the memory and makes _instance point to it.
  2. Before the initialization of the created object is complete, a thread switch occurs.
  3. Thread-2 enters getInstance() and finds _instance non-null, since it already points to the memory allocated by Thread-1.
  4. Because _instance is non-null, Thread-2 assumes the instantiation is complete and returns it. It has no way of knowing that the initialization has not finished yet.
  5. As a result, the caller of getInstance() in Thread-2 receives an object that is not properly initialized. If it uses that object before the initialization actually completes, it will cause serious problems and may even crash the program.

To make matters worse, if this occurs it will be very difficult to figure out the exact reason for the unexpected behavior or crash, because it happens at random. How can we deal with this? Well... this might be heartbreaking, but there is NO single way to solve this problem on ALL compilers/hardware/platforms. We do have a few workarounds depending on the platform.

  1. Marking a memory location volatile tells the compiler that the memory can be changed in ways not known to the compiler, and thus prevents such optimizations. It was originally introduced to handle memory-mapped I/O and had nothing to do with threads, but J2SE 5.0 and Visual C++ 2005 guarantee that volatile works correctly with multiple threads. So making _instance volatile solves the problem for them (a Java sketch of this variant follows this list). But that is not guaranteed to work with versions of J2SE lower than 5.0, with older versions of Visual C++, or with other C++ compilers. Using volatile may also affect performance.
  2. A memory barrier instruction forces all read/write operations issued before the barrier to complete before any read/write operation issued after the barrier is executed. On systems that support memory barriers, we can solve this problem by issuing a memory barrier after the instantiation of the Singleton object and by introducing a flag as the test condition. Something like this:
    public class Singleton {
        private static Singleton _instance = null;
        private static bool flag = false;
        
        private Singleton() {}
        
        public static Singleton getInstance() {
            if (!flag) {
                acquire_lock();
                if (!flag) {
                    _instance = new Singleton();
                    memory_barrier();
                    flag = true;
                }
                release_lock();
            }
            
            return _instance;
        }
    }

    Note that a new flag is now used as the test condition. _instance may still point to uninitialized memory, but the memory barrier instruction before flag = true ensures that flag remains false until the object is fully initialized. The combination of the memory barrier and using the new flag instead of _instance as the test condition solves the problem. The downside of this solution is that if the platform does not expose a memory barrier instruction natively (.NET does), it might be difficult to implement the barrier correctly, and it may even require assembly coding.

  3. In addition to the double-checked locking pattern there is another approach to implementing a Singleton correctly, known as the Initialization-on-demand holder idiom. This method works on all versions of Java.
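
As an illustration of the first workaround, a double-checked locking variant for Java 5.0 and later that relies on volatile might look like the following sketch:

public class Singleton {
    // Under the Java 5+ memory model, volatile prevents the reordered
    // publication described above: no thread can see a non-null _instance
    // before the constructor has finished.
    private static volatile Singleton _instance = null;

    private Singleton() {}

    public static Singleton getInstance() {
        if (_instance == null) {                  // first check, without locking
            synchronized (Singleton.class) {
                if (_instance == null) {          // second check, under the lock
                    _instance = new Singleton();
                }
            }
        }
        return _instance;
    }
}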

None of these solutions works on all platforms; all of them exploit platform/compiler-specific features. Maybe the best solution is to simply give up lazy initialization. And if we want to stick with lazy initialization, the performance of the fully locked version of getInstance() (where the lock is acquired right after entering getInstance()) can be improved by minimizing the calls to getInstance(), like this:

// requires three calls and thus acquires the lock three times
Singleton.getInstance().method1();
Singleton.getInstance().method2();
Singleton.getInstance().method3();

// requires only one call and thus acquires the lock only once
Singleton instance = Singleton.getInstance();
instance.method1();
instance.method2();
instance.method3();

This is not a real solution to the problem, but it can improve performance significantly in practice.

JAVA:

One of the better ways of writing a Singleton in Java is to use a static inner helper class (the Bill Pugh method):

public class Singleton {
 
    private Singleton(){}
     
    private static class SingletonHelper{
        private static final Singleton m_instance = new Singleton();
    }
     
    public static Singleton getInstance(){
        return SingletonHelper.m_instance;
    }
}

When the Singleton class is loaded, the SingletonHelper class is not loaded into memory; only when someone calls getInstance() does it get loaded and create the Singleton instance. This is thread safe because the JVM guarantees that a class's static initialization runs exactly once, even when multiple threads trigger it at the same time.
This is the most widely used approach for a Singleton class, as it gives lazy initialization without requiring explicit synchronization.
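
As a quick illustration, a hypothetical variant with trace output shows that the nested holder class (and hence the instance) is created only on the first call to getInstance(), and that repeated calls return the same object:

public class Singleton {

    private Singleton() {
        System.out.println("Singleton constructed"); // hypothetical trace output
    }

    private static class SingletonHelper {
        // Runs exactly once, when SingletonHelper is first referenced.
        private static final Singleton m_instance = new Singleton();
    }

    public static Singleton getInstance() {
        return SingletonHelper.m_instance;
    }

    public static void main(String[] args) {
        System.out.println("before first call");           // nothing constructed yet
        Singleton a = Singleton.getInstance();              // "Singleton constructed" prints here
        Singleton b = Singleton.getInstance();              // no second construction
        System.out.println("same instance? " + (a == b));   // true
    }
}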
