Ten Questions with Sanjiv Shah about Parallel Programming and OpenMP
This is the third post in my Interviewing the Parallel Programming Idols series. My interview partner today is Sanjiv Shah, whom I was lucky enough to meet at various OpenMP workshops. I have come to know him as the most knowledgeable person about OpenMP and parallel programming ever :P. Let me add a little bit about his background. Sanjiv Shah is a Senior Principal Engineer in the Software and Solutions Group specializing in multi-threaded computing and the Director of the Performance, Analysis and Threading Lab at Intel. During his career, Sanjiv has worked on and managed many aspects of performance and correctness analysis tools, compilers and runtimes for parallel applications and systems. He has been extensively involved in the creation of the OpenMP specifications and of the industry standards organization known as the OpenMP Architecture Review Board. He is a former CEO of the OpenMP ARB and continues to serve on its Board of Directors. What a long list of titles :D. Besides that, he is also a really nice guy and a joy to talk to.
I think I have praised him enough for now, so let's start with the interview. The first five questions are about parallel programming in general:
Michael: As we are entering the many-core era, do you think parallel computing is finally going to be embraced by the mainstream? Or is this just another phase, and soon the only people interested in parallel programming will be the members of the high performance community (once again)?
Michael: From time to time, a heated discussion flares up on the net about whether shared-memory programming or message passing is the superior way to do parallel programming. What is your opinion on this?
Which one is appropriate depends very highly on the application: shared memory makes some things very easy, whereas distributed memory makes other things easy. SETI@home is becoming the classic distributed example: very little shared state, and millions of computers worldwide can run independently. Many such applications map naturally to distributed memory. On the other hand, when there is a lot of continuously changing state to be shared, shared-memory programming makes a lot of sense. Many of the applications we use every day, like the editor/email system I am typing this in, word processors, web browsers, spreadsheet programs, video and image processing, computer games and so on, depend heavily on shared state.
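To make the shared-state case concrete, here is a minimal sketch of my own (not from Sanjiv): several OpenMP threads update one shared histogram, which maps naturally onto shared memory but would require explicit communication in a message-passing model.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1000000;
    std::vector<int> hist(256, 0);    // shared state, visible to all threads

    // Each thread works on part of the input; updates to the shared
    // histogram are protected with an atomic update.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        int bucket = (i * 31) % 256;  // stand-in for real per-item work
        #pragma omp atomic
        ++hist[bucket];
    }

    std::printf("bucket 0 holds %d items\n", hist[0]);
    return 0;
}
```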
Michael: From your point of view, what are the most exciting developments/innovations regarding parallel programming going on at present or during the last few years?
OpenMP is a very nice standardization effort that is now widely available to programmers, with implementations available in EVERY major compiler. There is some good work going on with regard to tasking in OpenMP, which will make OpenMP much more accessible to C++ programmers.
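A minimal sketch of the kind of recursive, irregular parallelism that OpenMP tasks are meant to express and that classic loop worksharing cannot (illustrative only, assuming a compiler with task support):

```cpp
#include <cstdio>

// Naive recursive Fibonacci, parallelized with OpenMP tasks.
// Purely illustrative; the cutoff keeps task-creation overhead in check.
long fib(int n) {
    if (n < 25)                    // below the cutoff, run sequentially
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    long x, y;
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait           // wait for both child tasks to finish
    return x + y;
}

int main() {
    long result = 0;
    #pragma omp parallel
    #pragma omp single             // one thread spawns the root of the task tree
    result = fib(35);
    std::printf("fib(35) = %ld\n", result);
    return 0;
}
```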
Threading Building Blocks is another nice way to represent parallelism. It is a parallel “language” embedded in a C++ template library for control and data parallelism and provides concurrent versions of some of the more commonly used data structures. It is a nice capture of the state of the art in an easily usable form.
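For comparison, here is a small Threading Building Blocks sketch of my own (illustrative, using modern C++ syntax rather than the functor style of early TBB): a parallel_for over a blocked range, collecting results in one of the library's concurrent containers.

```cpp
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <tbb/concurrent_vector.h>
#include <cstdio>

int main() {
    const int n = 100000;
    tbb::concurrent_vector<int> evens;    // concurrent container from TBB

    // Data-parallel loop: TBB splits the range across its worker threads.
    tbb::parallel_for(tbb::blocked_range<int>(0, n),
        [&](const tbb::blocked_range<int>& r) {
            for (int i = r.begin(); i != r.end(); ++i)
                if (i % 2 == 0)
                    evens.push_back(i);   // thread-safe push_back
        });

    std::printf("collected %zu even numbers\n", evens.size());
    return 0;
}
```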
Michael: Where do you see the future of parallel programming? Are there any “silver bullets” on the horizon?
Look at it this way: with 2 cores, a sequential application is not taking advantage of 50% of the available computing power. With 4 cores, 75%. With 8 cores, 87.5%. With 16, 93.75%. Around 4-8 cores, sequential applications will be ignoring too much of the available power to continue to compete and survive.
And ultimately, that is the “silver bullet”. Need. Humans adapt infinitely in order to survive. Programmers will adapt to be very adept at parallel programming.
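The arithmetic behind those percentages, written out once in general form:

```latex
% Fraction of the machine an unparallelized application leaves idle on N cores
\[
  \text{unused}(N) = 1 - \frac{1}{N}
  \quad\Rightarrow\quad
  \text{unused}(2) = 50\%,\;
  \text{unused}(4) = 75\%,\;
  \text{unused}(8) = 87.5\%,\;
  \text{unused}(16) = 93.75\%
\]
```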
Michael: One of the most pressing issues presently appears to be that parallel programming is still harder and less productive than its sequential counterpart. Is there any way to change this in your opinion?
4- and 8-core processors bring the need, university curricula are starting to change, parallel languages are becoming widely available, and tools are starting to appear for some of the more common languages.
However, I’d like to point out that sequential programming will likely always be easier than parallel programming because the environment is more constrained.
So much for the first part of this interview. Without further ado, here are the questions about OpenMP:
Michael: What are the specific strengths and weaknesses of OpenMP as compared to other parallel programming systems? Where would you like it to improve?
A weakness of OpenMP is that it is trying to serve too broad a market. On one hand, you have HPC experts trying to squeeze out every FLOP on large systems because of system cost, and on the other, millions of programmers happy with relatively small gains on modest-sized systems that are virtually free. In catering to both, we may end up catering to neither. The expert wants total control of where threads are running and what they are doing. The ordinary user is blissfully ignorant. Thread IDs are another specific example of the dichotomy; I wish we could do away with them.
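A small sketch of my own to illustrate that dichotomy: the first version divides the work by hand using thread IDs, the way an expert who wants total control might; the second never mentions an ID and leaves the division to the runtime.

```cpp
#include <omp.h>

void scale_by_hand(double* a, int n, double factor) {
    // Expert style: explicit control through thread IDs.
    #pragma omp parallel
    {
        int tid   = omp_get_thread_num();
        int nth   = omp_get_num_threads();
        int chunk = (n + nth - 1) / nth;
        int lo    = tid * chunk;
        int hi    = lo + chunk < n ? lo + chunk : n;
        for (int i = lo; i < hi; ++i)
            a[i] *= factor;
    }
}

void scale_simple(double* a, int n, double factor) {
    // "Blissfully ignorant" style: no IDs, the runtime divides the work.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        a[i] *= factor;
}
```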
The language needs to improve in its expressive power, in its error handling, in coexisting with other threading models, and in its C++ support. The current OpenMP library has only the very basics necessary; programmers need to build upon these basics to get anything done. The library should be usable out of the box without having to build upon it.
Michael: If you could start again from scratch in designing OpenMP, what would you do differently?
Michael: Are there any specific tools you would like to recommend to people who want to program in OpenMP? IDEs? Editors? Debuggers? Profilers? Correctness Tools? Any others?
The multiple-run comparison feature of Thread Profiler for OpenMP (GuideView) is also very powerful for performance tuning. It lets you drill down to individual parallel regions and the sequential regions between them and understand the scaling and non-scaling behavior at this level. There are other performance profilers out there from Bernd Mohr, Al Malony and others that have also become quite good at OpenMP.
Michael: Please share a little advice on how to get started for programmers new to OpenMP! How would you start? Any books? Tutorials? Resources on the internet? And where is the best place to ask questions about it?
Look at how Thread Checker can be used to add parallelism very quickly. I follow a very simple recipe:
- Get sequential program correct.
- Identify loops I want to parallelize via profiling.
- Eyeball the loop to identify obvious private objects and use as much language-specific scoping as possible (or the private clause for Fortran).
- Use parallel for and Thread Checker to get a worklist of remaining work!
It's that simple. People, including myself, have parallelized million-line apps using this simple recipe.
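A minimal sketch of what the end of that recipe can look like on a hypothetical hot loop (the loop and names are mine, not Sanjiv's): the loop index is private by default, the locally declared temporary is private through scoping, and a reduction clause takes care of the one shared update a tool like Thread Checker would put on the worklist.

```cpp
#include <cmath>

// Hypothetical hot loop identified by profiling (step 2 of the recipe).
double sum_of_distances(const double* x, const double* y, int n) {
    double total = 0.0;
    // Step 3: 'd' is declared inside the loop, so it is private automatically.
    // Step 4: the update of 'total' is the remaining shared access a
    // correctness tool would report; a reduction clause resolves it.
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        double d = std::sqrt(x[i] * x[i] + y[i] * y[i]);
        total += d;
    }
    return total;
}
```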
Michael: What is the worst mistake you have ever encountered in an OpenMP program?
I would be remiss if I didn't point out that sometimes it is very important to parallelize such loops to get or preserve side effects. For example, on NUMA systems, people parallelize such loops for the memory allocation side effect (due to the first-touch allocation policy). And when data is already spread out among different threads, it pays to keep the data spread out by parallelizing fine-grained loops, even if it costs a little overhead.
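A sketch of the first-touch effect Sanjiv mentions (illustrative, assuming a NUMA system with a first-touch page placement policy): the array is allocated without being written, then initialized in a parallel loop so that each page ends up on the memory node of the thread that touches it first; a later loop with the same static schedule then works mostly on local data.

```cpp
#include <cstdio>

int main() {
    const int n = 1 << 24;
    // new double[n] does not initialize the elements, so the pages are not
    // written here; the first touch happens in the parallel loop below.
    double* a = new double[n];

    // Parallel initialization: under first-touch, each page is placed on the
    // NUMA node of the thread that writes it first.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        a[i] = 0.0;

    // Later compute loop with the same static schedule: each thread mostly
    // accesses data that is local to its NUMA node.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        a[i] = a[i] * 2.0 + 1.0;

    std::printf("a[0] = %f\n", a[0]);
    delete[] a;
    return 0;
}
```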
Michael: Thank you very much for the interview!