Mutual Exclusion with Locks – an Introduction
Do you know the difference between a mutex and a spinlock? Do you know exactly what kinds of locks are employed in e.g. OpenMP? Do hierarchical locks bore you, and do recursive spinlocks only make you yawn? Then you will most likely NOT enjoy reading this article, and I encourage you to stop reading now, as it is way too basic for you – don’t worry, though, as I have a way more advanced article in the pipeline :-). Seriously, stop!
Now wouldn’t that be an interesting statistic: how many of my readers just stopped reading at this point, because they already know everything I have to offer in this article :-). Unfortunately, I will have to cope without this knowledge for now. Anyways, since you are still reading, I will assume that I can tell you something interesting today (or so I hope) – and if not, you would let me know in the comments, right? (Now, that’s an innovation! A call to action right at the beginning of my post! We will see how this turns out 😉 )
I will start this article with a quick introduction to why mutual exclusion is necessary and how it is done with locks, and will then turn to some interesting variations. I will not cover semaphores, monitors, or all the other techniques to guarantee mutual exclusion, but will concentrate on locks and their variations (which include mutex variables and spinlocks).
Why Mutual Exclusion is Needed for Shared Memory
Let’s assume for a second that we do not use mutual exclusion, and two of our threads want to increment a shared variable, conveniently called i, with a nice and simple command such as i++. Now, what really happens is that (depending on your architecture) not one instruction is carried out by your processor, but most likely three:
- load the value of i into a register (let’s call it r1)
- increment r1 by 1
- store the value in r1 back into i
And we did not even need assembly for that (have I mentioned that I do not particularly like assembly?)! Anyways, let’s see what could happen in the worst case with our example and two threads (assuming i had a value of 1 beforehand):
- Thread 1: load the value of i into a register on processor 1 (let’s call it r1) [i=1, r1=1]
- Thread 2: load the value of i into a register on processor 2 (let’s call it r2) [i=1, r1=1, r2=1]
- Thread 1: increment r1 by 1 [i=1, r1=2, r2=1]
- Thread 2: increment r2 by 1 [i=1, r1=2, r2=2]
- Thread 1: store the value in r1 back into i [i=2, r1=2, r2=2]
- Thread 2: store the value in r2 back into i [i=2, r1=2, r2=2]
The values in brackets are there to remind you what exactly is stored in the variable and registers at each point in time. Remember, steps 1 and 2, 3 and 4, and 5 and 6 are happening in parallel, on two different CPUs! i should have been incremented twice, so the final result should be i=3. Yet it is not, as can be seen at the end of step 6: thread 2 simply overwrites i with the same value thread 1 just stored, and one increment is lost. And that’s why we need mutual exclusion, as two threads working on the same data at the same time is a recipe for disaster! By the way, you do not even need two processors for this to happen: when you have two threads running on a single processor and the scheduler decides to stop thread 1 right after it has loaded the value of i into its register, you have the same dilemma.
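If you would like to watch this lost update happen on your own machine, here is a minimal sketch (the file name, thread count and iteration count are of course arbitrary). Compile it with an OpenMP-capable compiler, e.g. gcc -fopenmp race.c, and run it a few times:

    #include <stdio.h>

    int main(void)
    {
        int i = 0;

        /* two threads hammer on the same shared variable without any protection */
        #pragma omp parallel num_threads(2)
        {
            for (int n = 0; n < 1000000; n++)
                i++;   /* unprotected read-modify-write: a data race! */
        }

        /* because of the lost updates, the result is usually well below 2000000 */
        printf("i = %d (expected 2000000)\n", i);
        return 0;
    }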
How Mutual Exclusion is Done
Now that we have seen the problem, let’s take a look at the solution. Simply put, we need to stop the two threads from working on the same data at the same time (that’s why it’s called mutual exclusion, after all). The most common way to do this today is by using locks. A lock can be either locked or unlocked (sometimes also referred to as set or unset). And it really behaves like a lock on an ordinary door in the real world: you enter a room, you lock the door because you do not want to be disturbed, do whatever you need to do in there, unlock the door, and only then can somebody else enter. If anybody tries to enter the room while you are still in there, he has to wait. As long as you do not forget to lock or unlock the door, this algorithm guarantees mutual exclusion and protects the so-called critical region. And this is what the whole thing looks like using a parallel programming system (OpenMP in this case):
    omp_set_lock (&my_lock);
    i++;
    omp_unset_lock (&my_lock);
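For completeness, here is a minimal sketch of the whole lifecycle around that snippet, using the standard OpenMP lock API (the thread and iteration counts are again arbitrary):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int i = 0;
        omp_lock_t my_lock;

        omp_init_lock (&my_lock);            /* once, before first use */

        #pragma omp parallel num_threads(2)
        {
            for (int n = 0; n < 1000000; n++) {
                omp_set_lock (&my_lock);     /* enter the critical region */
                i++;
                omp_unset_lock (&my_lock);   /* leave it again */
            }
        }

        omp_destroy_lock (&my_lock);         /* once, after last use */
        printf("i = %d\n", i);               /* now reliably 2000000 */
        return 0;
    }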
As the sketch shows, the lock needs to be initialized beforehand and destroyed sometime afterwards as well, but that’s not too difficult either. Furthermore, in OpenMP there is an even easier way to do this:
    #pragma omp critical
    {
        i++;
    }
Or even simpler, like this:
    #pragma omp atomic
    i++;
This basically does the same thing, except that you do not need to worry about initialization and destruction of the lock, and you cannot accidentally forget to unlock the mutex. But this is not what this article is about, so I will leave it at that and return to our regularly scheduled locks. Actually, that is all I wanted to say about how locks work in general: if you stick to the algorithm (lock, change, unlock) every time the shared resource is worked on, you are fine. If you don’t, your code may still break ;-). In my experience, the most difficult thing is not to understand why mutual exclusion is necessary or how it is done, but to spot all the places in the code where it is needed. But then again, even that is not that difficult, and my students usually get it right the second or third time at the latest. OK, now that you know the basics, on to the more interesting variations I promised earlier.
Mutex
The mutex is the most well-known way to enforce mutual exclusion (it better be, with a name like that 😉 ). It basically behaves like the lock sketched above: when a thread tries to lock a mutex, it either acquires it (if no other thread presently owns the mutex) or is put to sleep until the lock becomes available again (in case another thread presently owns it). When there are multiple threads waiting on a single lock, the order in which they are awoken is usually not determined. So much for the mutex, let’s now take a look at spinlocks.
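For reference, this is roughly what our counter example looks like with a raw POSIX mutex (the names my_mutex and increment are mine; the pthread_mutex_* calls are the real API):

    #include <pthread.h>

    pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;
    int i = 0;

    void increment (void)
    {
        pthread_mutex_lock (&my_mutex);    /* may put the caller to sleep */
        i++;
        pthread_mutex_unlock (&my_mutex);  /* wakes up a waiting thread, if any */
    }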
Spinlocks
The functions associated with spinlocks are syntactically equivalent to the ones mentioned for mutexes; the subtle difference lies in the way waiting for the lock is handled: contrary to mutexes, threads are not put to sleep on spinlocks, but instead continue to spin (which in this context means continuously trying to acquire the lock). Therefore, they usually have a quicker response time (as no thread needs to be woken up as soon as the lock is unlocked), but they also waste processor cycles, a practice commonly known as busy waiting. Spinlocks are very common in High Performance Computing, because in this field each thread is usually scheduled on its own processor anyways, and therefore there is not much to gain by putting threads to sleep (which is a quite time-consuming process after all).
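To make the busy waiting concrete, here is a toy spinlock built on a C11 atomic flag. This is only an illustration of the principle; real implementations (e.g. POSIX’s pthread_spin_lock) are more refined, and all the names here are made up:

    #include <stdatomic.h>

    typedef struct { atomic_flag flag; } my_spinlock_t;

    /* initialize with: my_spinlock_t lock = { ATOMIC_FLAG_INIT }; */

    void my_spin_lock (my_spinlock_t *l)
    {
        /* keep trying to atomically set the flag; as long as another
         * thread holds the lock, this loop just burns processor cycles */
        while (atomic_flag_test_and_set_explicit (&l->flag, memory_order_acquire))
            ;   /* busy waiting */
    }

    void my_spin_unlock (my_spinlock_t *l)
    {
        atomic_flag_clear_explicit (&l->flag, memory_order_release);
    }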
Which lock variant is provided to you depends on your threading system. Pthreads, for example, used to have only mutexes, but now provides both. The OpenMP specification is silent about the issue, therefore the compiler vendors are free to use whatever variety they wish (as far as I remember, many use a mixture of both, where a thread spins for a predefined amount of time and is put to sleep if the lock could not be acquired by then; see the sketch below). Check the documentation of your threading system for details about your locks; with the information provided above it should be pretty easy to find out.
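Just to illustrate that spin-then-sleep idea, here is a hypothetical sketch on top of a plain pthread mutex (SPIN_TRIES and hybrid_lock are names I made up, and real vendor implementations will differ):

    #include <pthread.h>

    #define SPIN_TRIES 1000   /* arbitrary tuning parameter */

    void hybrid_lock (pthread_mutex_t *m)
    {
        /* first, busy-wait for a bounded number of attempts... */
        for (int n = 0; n < SPIN_TRIES; n++)
            if (pthread_mutex_trylock (m) == 0)
                return;   /* got the lock while spinning */

        /* ...then fall back to a blocking lock, which puts us to sleep */
        pthread_mutex_lock (m);
    }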
Recursive Locks
There is another, orthogonal dimension to locking, though: what happens when a thread that already owns a lock tries to lock it again? The answer is usually a deadlock; the thread will hang. Why would you want to do a thing like that? The common answer is: inside recursive functions, or functions calling each other from inside the critical region (while both set the lock). But honestly, I have not seen this used in practice a whole lot, and it is usually a sign that something is wrong with your synchronisation patterns anyways – remember, you want your critical regions to be as small and fast as possible, so it is usually not the hottest idea to be calling any functions in there!
But just in case you still need them, or encounter code that already employs recursive locks, I will say a few words about them: in addition to the normal locking functionality described above, recursive locks employ an internal counter to keep track of how many times they have been locked by the same thread (oh, did I forget to mention that? With recursive locks, it is possible to set the same lock multiple times from the same thread, and that is the main difference between them and normal locks). The counter is also decreased every time the thread unsets the lock, and only when the counter reaches zero is the lock made available to the other threads again. Recursive locks can be combined with the mutex or the spinlock variety, so e.g. recursive spinlocks are possible as well.
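Pthreads, for instance, offers recursive mutexes through a mutex attribute. The sketch below (the function names are mine) locks the same mutex twice from the same thread without deadlocking:

    #include <pthread.h>

    pthread_mutex_t rec_lock;

    void setup (void)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init (&attr);
        pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_RECURSIVE);
        pthread_mutex_init (&rec_lock, &attr);
        pthread_mutexattr_destroy (&attr);
    }

    void inner (void)
    {
        pthread_mutex_lock (&rec_lock);    /* same thread: counter goes to 2 */
        /* ... work on the shared resource ... */
        pthread_mutex_unlock (&rec_lock);  /* counter back down to 1 */
    }

    void outer (void)
    {
        pthread_mutex_lock (&rec_lock);    /* counter: 1 */
        inner ();                          /* would deadlock with a normal mutex */
        pthread_mutex_unlock (&rec_lock);  /* counter: 0, lock is released */
    }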
Timed Locks
There are times when a thread wants to enter a critical region, but has other things to do if the region is presently locked. A very common way to deal with this situation is to test the lock beforehand. A very simple example (you guessed it, in OpenMP 😉 ) will make this clearer:
    while (!omp_test_lock (&my_lock)) {
        /* we did not get the lock!
         * -> therefore, do some other work,
         *    then try again... */
    }
    /* at this point in the code, omp_test_lock returned
     * successfully and has therefore acquired the lock
     * -> do something with the locked resource ...
     * ... and finally unlock it again */
    omp_unset_lock (&my_lock);
But actually, this is not what timed locking is about. An alternative to the approach described above is to give the normal set_lock function an additional timing value (such as e.g. 200 ms). The function blocks normally, but if the lock has not been acquired by that time, it returns anyways and somehow indicates that it failed to acquire the lock. This can be useful e.g. in embedded systems, where certain timing constraints must be obeyed (such as: no thread shall wait more than 200 ms on a lock, or something has gone wrong – although you would probably phrase that constraint a little differently in your specification 😛 ). And in that case, a watchdog thread (or process) could be notified that makes sure the system is working properly.
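POSIX offers exactly this in the form of pthread_mutex_timedlock, which takes an absolute deadline. A minimal sketch (the 200 ms value and the function name are just for illustration):

    #include <pthread.h>
    #include <time.h>
    #include <errno.h>

    pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;

    int work_or_give_up (void)
    {
        struct timespec deadline;
        clock_gettime (CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 200 * 1000 * 1000;   /* now + 200 ms */
        if (deadline.tv_nsec >= 1000000000L) {   /* normalize the timespec */
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        if (pthread_mutex_timedlock (&my_mutex, &deadline) == ETIMEDOUT)
            return -1;   /* e.g. notify the watchdog mentioned above */

        /* ... work on the locked resource ... */
        pthread_mutex_unlock (&my_mutex);
        return 0;
    }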
Timed locks are not very common (although POSIX has them nowadays, as shown above), but your threading system may still have them, and they are an important building block in certain niches (e.g. embedded systems). Theoretically, they can be combined with all the other variations suggested above, and therefore a timed, recursive mutex is possible.
Hierarchical Locks
I first heard about hierarchical locks at Euro-Par last week. This does not necessarily mean they are the newest invention in town (the first paper I could find on them dates back to 2002 – the locks are called RH locks in there, and it is a very good read if you want to know more implementation details of locks). It is way more likely that there is much more to locking than I know about or can describe in one article, especially since this is not my field of research :-). Anyways, hierarchical locks look exactly like normal locks from the outside; their difference is in the implementation. I apologize for mixing in an implementation detail here, after all I have not told you about other such details, such as queue locks vs. backoff locks – because this is an article for programmers, not for compiler writers.
Nevertheless, I found the concept interesting enough to mention it here, especially after my last article about locality. Hierarchical locks operate under the premise that it is a good idea to let threads with high memory locality (e.g. threads scheduled on the same node, working on the same data or on data that are close together in memory) acquire the lock consecutively, as this will reduce cache misses. This sounds logical, although I do not have any first-hand experience with it. The idea was invented for CC-NUMA machines, and that’s where its strengths are. And the best thing about it is: the programmer does not need to care, as the locality improvements are managed automatically by the runtime system. I do not see why hierarchical locks could not be combined with any of the variations described above, so maybe you are in the market for a hierarchical, timed, recursive spinlock 😀 ?
Further Reading
If you want to know more about locking and thread programming in general, I wholeheartedly recommend what I like to call the POSIX Threads Bible by David Butenhof – Programming with POSIX Threads. Or if you are more into Java, I hear Java Concurrency in Practice is a very good read. I have not really found a great book on thread programming with C# – the best one appears to be Programming C#, although it contains only a single chapter on threaded programming. If you find a better one, please let me know.
Locking Summary
This article has grown so big that I need a summary for it ;-). I am inclined to promise anyone who has managed to read this far a beer, but since my blog audience has grown considerably during the last month (to about 150 regular subscribers and about 600 unique visitors a day) and this article will be on the net for a long time, this may not be such a great idea. Especially since I don’t even drink beer. Therefore, I will promise to answer any on-topic comments you leave here instead :P.
Anyways, the topic of this article was mutual exclusion with locks. After a not-so-short introduction to locking and why it is needed, I have shown some of the variations you may encounter when dealing with locking. First of all, I looked at spinlocks and mutex variables. The major difference between them is that mutex variables put waiting threads to sleep, while with spinlocks waiting threads actively keep trying to acquire the lock.
The second variety I pointed out were recursive locks, which make it possible for the same thread to set the same lock multiple times. Next were timed locks, which do exactly what their name suggests: they time out after a certain waiting period, returning control to the calling thread. And finally, I looked at hierarchical locks, an implementation detail that promises better performance by exploiting locality, especially on bigger machines.
What I have not covered in this article are more implementation details (not that I know many more; as I said, this field is not my research area), as well as more exotic concepts for mutual exclusion (lockless data structures, anyone?). If there is sufficient interest, I might do that later.
Have I forgotten anything? Or did I get a detail wrong? Feel free to leave a comment!