Recursive locks – a blessing or a curse?
Christopher Smith has tried very hard to come up with a case where a recursive lock is a Good Thing™. And while the article is very well thought out and presented, David Butenhof (the author of my favorite book on POSIX threads) apparently disagrees. Actually, it was the other way around, as Butenhof's article is far older. Anyway, since I have already voiced my opinion on the subject in this article (executive summary: I have not seen recursive locks used in practice a whole lot, and they are usually a sign that something is wrong with your synchronisation patterns anyway), I want to comment on some of the points Christopher makes.
But first of all, a word of warning: Christopher uses databases for his examples, and I don't work with those on a daily basis. I know the basic concepts and have worked with databases on some web-based projects, but that's about it. And since I have more of an HPC background, some of the things he suggests strike me as odd, though that may just be because locks are used differently in the database world. Anyway, enough with the preface, let's get on with the article:
First of all, he constructs a use case for recursive locks involving encapsulation, a really smart logging framework, and a rarely used lock. I thought about quoting his whole example here, but decided against it because it is long. Really long. For the rest of this article I am therefore going to assume you have read and understood it. Here are the points that bother me about it:
- Is it really a smart idea to log into the same database where the actual transactions are carried out? What happens if the database fails? Aren't the logs there to safeguard against losing data in exactly that case?
- Does the logging really need to happen inside the transaction? Christopher discusses this point himself and reasons that it is the only way to capture the context of the transaction. I don't know enough about databases to discuss this knowledgeably, but I know I would normally try to avoid it and would most likely write the log entry before or after the transaction, even if that means copying some data. Of course, Christopher counters this argument by saying that there is little contention on the lock, but still, the whole example looks a little contrived.
- Is it a smart move to pay the performance penalty of recursive locks all the time, just because logging might be enabled and the logging framework might use the same database as its back-end? Wouldn't it be smarter to guard against this case only when it is actually needed? Sure, this may make the logger's interface uglier, because you have to provide non-locking logging operations as well, and that is an implementation detail you wanted to hide. But at least you can structure the code so that the locking operations are merely thin wrappers around the non-locking ones, so hardly any code is duplicated (see the sketch after this list).
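To illustrate the wrapper pattern from the last point, here is a minimal sketch in C with POSIX threads. The logger and all names in it are made up for illustration; the point is only that the public, locking entry point delegates to a non-locking core, which code that already holds the lock can call directly instead of relying on a recursive mutex:

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical logger guarded by a plain, non-recursive mutex. */
static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Core operation: does NOT take the lock.  Code that already
 * holds log_mutex calls this variant directly. */
static void log_message_unlocked(const char *msg)
{
    fprintf(stderr, "log: %s\n", msg);
}

/* Public operation: a thin locking wrapper around the core,
 * so hardly any code is duplicated. */
void log_message(const char *msg)
{
    pthread_mutex_lock(&log_mutex);
    log_message_unlocked(msg);
    pthread_mutex_unlock(&log_mutex);
}
```

The price is exactly the one mentioned above: the `_unlocked` variant leaks a locking detail into the interface, but only code that deliberately holds the mutex ever needs to see it.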
These are just some of the points that came to mind when thinking about the problem. I also tried to find a nice solution without recursive locks, but decided that it was not trivial, and that with my limited knowledge of how databases are used in practice I was unlikely to come up with a really great one. Especially since it is pretty clear from the length of the problem description that it is cleverly constructed and well presented, but still an edge case.
I do want to comment on the other examples he uses to justify recursive locking, though:
> the probability that the lock will be used recursively is small (say you have to lock two records in a table that has page-level locking)
With this example I disagree strongly. In my humble opinion, an algorithm to lock two records in a table that has page-level locking should go something like this:
- lock the record with the smaller primary key first; always taking the smaller one first avoids deadlocks when another thread tries to lock the same two records in the opposite order
- only lock the second record if it does not lie on the same page
Not an overly complicated algorithm; it solves the problem nicely, and no recursive locking is needed. It also guarantees that once the unlock function has been called, the lock is actually available again, which removes a potential source of errors. A sketch of the algorithm follows below.
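To make this concrete, here is a minimal sketch in C with POSIX threads. The page layout, the key-to-page mapping, and all names are invented for illustration, and it assumes records are stored in key order, so that the page index grows with the primary key:

```c
#include <pthread.h>

#define RECORDS_PER_PAGE 64   /* invented layout */
#define NPAGES           16   /* keys must stay below NPAGES * RECORDS_PER_PAGE */

/* One plain, non-recursive mutex per page; call once at startup. */
static pthread_mutex_t page_mutex[NPAGES];

void init_page_locks(void)
{
    for (int i = 0; i < NPAGES; i++)
        pthread_mutex_init(&page_mutex[i], NULL);
}

/* Records are stored in key order, so the page index grows with
 * the key and locking the smaller key first yields a consistent
 * lock order across all threads. */
static int page_of(long key) { return (int)(key / RECORDS_PER_PAGE); }

void lock_two_records(long a, long b)
{
    long t;
    if (a > b) { t = a; a = b; b = t; }        /* smaller key first */
    pthread_mutex_lock(&page_mutex[page_of(a)]);
    if (page_of(b) != page_of(a))              /* same page? we already hold the lock */
        pthread_mutex_lock(&page_mutex[page_of(b)]);
}

void unlock_two_records(long a, long b)
{
    long t;
    if (a > b) { t = a; a = b; b = t; }
    if (page_of(b) != page_of(a))
        pthread_mutex_unlock(&page_mutex[page_of(b)]);
    pthread_mutex_unlock(&page_mutex[page_of(a)]);
}
```

Because the second lock is only ever taken for a different page, each mutex is locked at most once per thread, and a single unlock really does make it available again.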
> the lock explicitly exists as a trade-off between concurrency and resources like CPU, memory and IO (caching, for example).
I must be dense, but I don't get this; I don't even know what he means. Maybe it is our different backgrounds again, maybe I am just too stupid, but I cannot imagine how a lock, recursive or not, can be useful in that context. 🙁
To close this post, let me add a little background. What really convinced me at the time that recursive locks are a bad idea was the whole thread in which Butenhof's post originated. It takes some time to read (more than 100 posts), but once you have worked through them all, you will know more reasons against recursive locking than you probably ever wanted to 🙂 (which also saves me from summing them up here, as people far more knowledgeable about the topic have already done so there). The thread also contains some very entertaining posts and a very good reason to stay away from Siemens mobile phones (you will understand what I mean if you actually read to the end ;)).
If you have been reading this blog for a while, you know that I like quotes, and once again I want to close with one from the thread I just mentioned:
> Don’t use recursive mutexes. It’s akin to sex with used condoms. 😉
I have nothing to add to that 8O.