Please Don’t Rely on Memory Barriers for Synchronization!
I was innocently browsing the net, when I found this on reddit. It’s an article about synchronization using memory barriers. And it made me sad (I was going to write: it almost made me cry, but I figured that would be a little exaggerated). This article is a warning and it tells you the reasons why you please should not use the techniques described there, unless you really, really know what you are doing!
I usually don’t do this, but I would like to step through this article a bit with you to make it clear why it bugs me so much. But before I do, let me make one thing perfectly clear: I don’t know the author of the article. I hear he makes a fine hex editor for Mac OSX. This is not in any way meant to be a personal attack on him or his views, I just think this particular post is in dire need of warnings, clarifications and editorial changes, because the conclusions he draws are dangerous. It is an interesting article for people with experience in multithreading. I believe it to be dangerous for beginners to the topic.
It starts with the definition of terms:
Threads are just preemptively scheduled contexts of execution that share an address space.
[snip]
Multithreading is the physically simultaneous execution of multiple threads for increased performance, which requires a dualie or more.
Threads are not necessarily preemptively scheduled. Multithreading has nothing to do with simultaneous execution. There is a nice and easy to find article in the Wikipedia that explains the terms correctly, so I don’t have to. Or in any good book on any parallel programming system using threads.
Yeah, I know. “Multithreading is hard” is a cliché, and it bugs me, because it is not some truism describing a fundamental property of nature, but it’s something we did. We made multithreading hard because we optimized so heavily for the single threaded case.
This may be one of the reasons. Multithreading is considered hard, because it requires the programmer to keep multiple execution paths in his head simultaneously. Multithreading is considered hard, because threads share state and therefore can potentially change each others values in memory at any time. Multithreading is considered hard, because the debuggers and profilers are not as mature as the ones for the single-threaded case. And there are a multitude of other reasons. None of them are show-stoppers for me. None of them keep my students from writing correct programs after they have had some practice and guidance (but thats a topic for another post). But please, please, don’t make it look like the only thing to be aware of when dealing with threads is instruction reordering by the compiler!
Let’s face it: until here, everything I have said can be considered hair-splitting. Nothing that a few well-placed edits in the article cannot fix. I am not going to go through each of his implementations, but will jump a little further down instead into one of his explanatory boxes, where it starts to get interesting again:
What’s the usual response to cross-processor synchronization issues like this? A mutex! Let’s try it.
fish ) ./a.out
0 failures (0.0 percent of the time) in 479.5 secondsIt made the tests pass, all right – but it was 130 times slower! A spinlock does substantially better, at 20 seconds, but that’s still 440% worse than no locking – and spinlocks won’t scale. Surely we can do better.
Ouch. Until this part of the article, everything can be considered toying around with threads. But now it is getting dangerous. As people are reading this, this point will stick: locks are slow and should be avoided. And this is so wrong! Locks are the only safe and portable solution to the toy-problem described in this article. Let me repeat: there is no other way when programming with POSIX Threads to solve this problem, except with locks. Unfortunately, for this solution there is not even code in the article. Every other solution relies on architecture specifics (that can change anytime the processor vendor feels like it!) and should not be employed. I am sure the original author wanted to improve on that solution, but to the reader, this leaves a very wrong impression (more on that later). Is the solution with locks slower? Sure it is, but in a real program, that does real work, the percentage of time spent in the locks will go down and even though locks may be slower than toying with memory barriers, you may not even notice the difference then. And even if you do: there are ways to reduce the need for locking and if that’s not enough, there are lock-free libraries available, but please don’t start to play with memory barriers and the like until you are sure you actually have a problem with synchronization speed!
I have really gotten into preaching mode here, I am sorry about that, but this is really, really touching me. The methodology shown here is so wrong as well. It is obvious that the author has not really dug deep into POSIX threads, the system he is using (if he did, he would have know pthread_join() or pthread_exit() – and would not have needed the hack he used to keep the main thread alive). When he saw a problem with synchronization, he chose to dig into the architecture-description of his Mac instead to rely on unportable, unsafe hacks to fulfill his one testcase. Why not read a good book on POSIX threads instead? In Programming with POSIX Threads (the bible when it comes to POSIX Threads) he would not have found advice like his. Instead, he would have read to always make sure to always create a correct program first, then optimize later when the need arises. A point that is very important to understand for beginners.
Messing around with memory barriers is hard. It is so hard that it is e.g. considered a mistake by many members of the OpenMP language-committee to have exposed them in the OpenMP-API via the flush()-directive. I found the article I am talking about here on the frontpage of programming.reddit.com, where it was read and voted up by hundreds of programmers, many of whom have only a vague idea about what programming with threads means. What will they learn from this article? Locks are slow and should be avoided? Memory barriers are a good and fast way to guarantee synchronization? When in doubt, use volatile? Double-checked locking is a good idea (it’s not. never. ever.)? Each of which are fatal misconceptions, if you ask me – at least for beginners to threads-programming. When these programmers go out and do their first project with threads and try to program using the advice given, they will undoubtedly produce a Mount Everest of errors. And they will try to debug them and are most likely not going to make it. The project is then canceled and the next time somebody asks them, they will happily tell everyone that Threading is just too hard to get right.
Am I exaggerating a little? Sure I am. But I am also sick of hearing this over and over again. It is possible to program with threads, if you stick to the tried and known rules. People have been doing it for ages in the field of High Performance Computing. Sure it is harder than normal programming, but it’s no rocket science either.
This post sounds harsh on the original author of the ridiculousfish-article. It’s not meant to be so. Unfortunately, what began as an interesting experiment on his side (and thats what it is – an interesting experiment) – has ended with all the wrong conclusions. Conclusions that are dangerous and cannot be drawn from it, if you ask me. And thats why I feel the need to speak up here. If nothing else, please, at least add big, fat warning signs to your post that it is not intended for beginners! Right now, it sounds like a nice introduction to threading. But it is far from that, it’s material that should only be considered by advanced programmers in the field of concurrent programming who need every bit of performance they can get on their specific architecture. Memory Barriers are no toys for inexperienced programmers and should be used as a last resort only! Sorry for the rant, but I just had to get this off my chest…