Update: A Smart Way to Send a std::vector with MPI – and why it Fails
As many of you know, I always like learning from mistakes, my own as well as other people’s. A couple of my students tried to send a C++ std::vector using MPI, and I was skeptical whether their way of doing things was correct. I am now convinced that, although it probably works with most implementations today, what they did is a mistake. It is an interesting one nevertheless, and therefore I am posting it here.
They tried sending a vector like this (details omitted for clarity):
- MPI::COMM_WORLD.Send (&vec[0], N, MPI::INT, 0, 0);
And receiving it like this:
- vec.resize (N);
- MPI::COMM_WORLD.Recv (&vec[0], N, MPI::INT, 1, 0);
See the problem? Sending the vector like this is probably OK, because as far as I remember (and as far as Björn, my ever-helpful colleague and our research group's ueber-authority 🙂 on C++ issues, is concerned) C++ guarantees that a vector stores its actual data in one big block that can be accessed just like an array. And we are doing nothing else here. (Beware, though, that &vec[0], the address of the first element, is not the same as &vec, the address of the vector object itself; this example correctly uses &vec[0].)
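To make the "big block" claim concrete, here is a minimal sketch. The helper names `is_contiguous`, `copy_out` and `object_and_data_differ` are made up for this illustration. It checks that element i sits at `&v[0] + i`, reads the elements through a raw pointer the way a C library would, and shows that `&v` and `&v[0]` are two different addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Returns true if every element of v sits at &v[0] + i, i.e. the
// elements form one contiguous block, just as in a plain array.
bool is_contiguous(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (&v[i] != &v[0] + i) return false;
    }
    return true;
}

// Copies the elements of v into a plain array the way a C library
// would read them: through a raw pointer and a byte count.
void copy_out(const std::vector<int>& v, int* dst) {
    if (!v.empty()) {
        std::memcpy(dst, &v[0], v.size() * sizeof(int));
    }
}

// &v is the address of the vector object itself; &v[0] is the address
// of the first element, which lives in separately allocated storage.
bool object_and_data_differ(const std::vector<int>& v) {
    assert(!v.empty());  // &v[0] is only meaningful for a non-empty vector
    return static_cast<const void*>(&v) != static_cast<const void*>(&v[0]);
}
```

Passing `&v` instead of `&v[0]` to a send call would hand over the bytes of the vector object (its bookkeeping), not the elements.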
Receiving the data is an entirely different matter. Apparently, the students knew that the size of a vector is commonly stored inside the object somewhere and that they therefore had to make sure it was the right size before filling it. And they did that correctly by calling resize(). But the problem is that I am fairly sure that there may be other bits of information stored inside a vector-object.
I came up with an example to make this clear: Let’s say my software needs to work in a hostile environment, e.g. space. From time to time, an x-ray would run through my transistors and flip a bit or two (not that I know much about x-rays or how they can affect transistors, but that’s not the point). Therefore, the vector-object I am using has the same interface from the outside and conforms to the C++-standard, but has some sophisticated error-detection and recovery mechanism hidden inside. Maybe it is maintaining a hash-value over all elements. Or something like that. When the recv()-operation is called as shown above, the vector elements are not accessed through the correct interface, but rather through whichever mechanism MPI sees fit to use to fill an array. Maybe memcpy(). And therefore my error-correction mechanism will be all messed up after the operation, leaving the vector-object in an inconsistent state. And that’s why this way of doing things is wrong.
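The thought experiment can be sketched in a few lines. Everything here is hypothetical: `CheckedVector` is not std::vector and not any real library class, its "checksum" is just a running sum standing in for the error-detection data, and `fill_raw` stands in for whatever a receive call does to the buffer it is handed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container, NOT std::vector: it keeps a running sum of
// its elements as a stand-in for hidden error-detection data.
class CheckedVector {
public:
    explicit CheckedVector(std::size_t n) : data_(n, 0), sum_(0) {}

    // The proper interface: every write also updates the checksum.
    void set(std::size_t i, int value) {
        assert(i < data_.size());
        sum_ += static_cast<long long>(value) - data_[i];
        data_[i] = value;
    }

    int get(std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // Raw access to the storage, comparable to &vec[0].
    int* raw() { return data_.empty() ? nullptr : &data_[0]; }

    // True if the stored checksum still matches the elements.
    bool consistent() const {
        long long actual = 0;
        for (std::size_t i = 0; i < data_.size(); ++i) actual += data_[i];
        return actual == sum_;
    }

private:
    std::vector<int> data_;
    long long sum_;
};

// Stand-in for a receive call: it fills a raw buffer and knows
// nothing about the object that owns the memory.
void fill_raw(int* buf, std::size_t n, int value) {
    for (std::size_t i = 0; i < n; ++i) buf[i] = value;
}
```

Writing through `set()` keeps `consistent()` true. Filling the storage through `raw()` changes the elements but not the checksum, so `consistent()` turns false, which is exactly the inconsistent state described above.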
The point I am trying to make here is: A vector is an object. Like any object, it may have hidden implementation details inside and therefore you must use its interface to change it, like any other object. Counting on object-internals like the fact that it probably has a big block of memory inside to store the actual data will not do, although it may work with your present implementation!
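One way to stay strictly within the interface is to receive into a plain array first and then hand the elements to the vector. The following is only a sketch under that assumption: `fake_recv` is a made-up stand-in for the real receive call (it just writes a recognizable pattern), and `receive_via_interface` is a hypothetical helper name. The price is one extra copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a receive call: fills a raw buffer with
// the pattern 0, 10, 20, ... instead of data from another process.
void fake_recv(int* buf, std::size_t n) {
    assert(buf != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<int>(i) * 10;
}

// Receives n ints into a plain array, then passes them to the vector
// through its public interface (assign) rather than writing into the
// vector's storage behind its back.
std::vector<int> receive_via_interface(std::size_t n) {
    std::unique_ptr<int[]> staging(new int[n]);
    fake_recv(staging.get(), n);

    std::vector<int> result;
    result.assign(staging.get(), staging.get() + n);
    return result;
}
```

Here the raw buffer is a real array, so filling it with something like memcpy() is unproblematic, and the vector only ever sees calls to its own member functions.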
Wow, I really got into preaching mode here :o, sorry about that. And now I should probably really go and find me a copy of the C++-standard to validate all my theories against – and hope that I am able to understand the language they use in there… 😯
[Update]: Thanks a lot for all the comments. I have bought the C++-Standard because I wanted to make sure, and I would like to add a few things:
- The version of the 2003 standard I have here says: “The elements of a vector are stored contiguously”, so that settles that question. This makes my last statement somewhat wrong: the contiguous block of memory is not an implementation detail but a guarantee, exactly as I guessed at the beginning of my article.
- Nowhere in the standard (or at least nowhere that I could find; the dang thing is 786 pages long after all) does it say anything about other internal data stored in a vector object. And this makes the main point in the article still valid: if there is a vector implementation that stores more than the size of the vector internally (like the error correction code I mention in my article), it is not safe to access the contents of the internal array in the way described. You have to go through the interface of the vector (which, incidentally, will be no problem most of the time, because operator[] is part of the interface). But things like memcpy() probably won’t do, although they will for an array. If anyone has anything substantial to say to counter this point, I would be really interested in your comments!
- Thanks for the helpful resources; this one from Herb Sutter was especially enlightening.
- And last but not least: resize() is required, reserve() won’t do (because the size that is stored internally will be wrong if you use reserve()).
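The difference between resize() and reserve() in the last point is easy to check. In this small sketch (the two function names are made up for illustration), reserve() only sets memory aside while the vector still reports zero elements, whereas resize() actually creates the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// After reserve(n), memory for n elements is set aside, but the
// vector still contains no elements: size() stays 0.
std::size_t size_after_reserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    assert(v.capacity() >= n);
    return v.size();
}

// After resize(n), the vector really contains n (value-initialized)
// elements, so size() is n and v[0] .. v[n-1] are valid.
std::size_t size_after_resize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);
    return v.size();
}
```

With only reserve(), the received values would land in memory the vector owns but does not count as elements, so size(), iteration and any later push_back() would ignore or overwrite them.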
Thanks for all your comments, this was very enlightening!