Parallel processing
A colleague of mine once described parallel processing as the "work of the devil". I don't know if I'd go quite that far — the remark was made in the early nineties, when technology was that little bit less advanced than it is today. It is undeniable, however, that dividing work efficiently, robustly and scalably across multiple processors has always been, and remains, a non-trivial task.
Parallel processing, in a nutshell, means performing program steps simultaneously instead of one after another (which is serial processing). Doing this, of course, requires a platform with multiple CPUs, for which we have two major options (a brief serial-versus-parallel sketch follows the list):
- Clustering — presenting one or more services from a collection of loosely coupled, independent systems. Sometimes referred to as a server farm or, on a larger scale, a grid. Most of us use large clusters every day, thanks to providers like Google. Think of this architecture as a supermarket chain with multiple outlets. You can process more customers by adding more outlets.
- SMP — or symmetric multi-processing — is the division of work across multiple CPUs/cores inside a single system. Think of this as a supermarket with multiple checkouts. You can process more customers by adding more checkouts in a single store.
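To make the serial/parallel distinction concrete, here is a minimal sketch in Python using the standard-library multiprocessing module. The work function, chunk sizes and process count are all invented for illustration; the point is simply that independent steps can run at the same time on separate cores instead of queueing behind one another.

```python
# Minimal illustration of serial vs parallel execution.
# The "work" (summing squares) is a stand-in for any CPU-bound
# step; the chunk sizes and process count are arbitrary.
import time
from multiprocessing import Pool

def work(n):
    """A deliberately CPU-bound task: sum of squares up to n."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    chunks = [2_000_000] * 8  # eight independent pieces of work

    # Serial: one step after another.
    start = time.perf_counter()
    serial_results = [work(n) for n in chunks]
    print(f"serial:   {time.perf_counter() - start:.2f}s")

    # Parallel: the same steps spread across four processes.
    start = time.perf_counter()
    with Pool(processes=4) as pool:
        parallel_results = pool.map(work, chunks)
    print(f"parallel: {time.perf_counter() - start:.2f}s")

    assert serial_results == parallel_results
```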
Clustering works well when the distributed work segments can be processed independently on a single node, minimising crosstalk between nodes. It offers more than potential performance increases: because the nodes of a clustered system are independent, you have opportunities to build in failover and fault tolerance (in fact, as you have more "moving parts", you really do need to think about this, and don't expect designing it to be straightforward). Increasing capacity in a clustered system seems a "simple" matter of adding more computers, but with non-optimal workloads the networking overheads can be punitive. To labour our supermarket metaphor: imagine your weekly shop involved visiting ten aisles in ten different outlets. The travel overheads would destroy any performance benefit.
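A rough back-of-envelope model shows how quickly those overheads bite. All the figures below (per-step compute time, per-step network cost, node count) are invented purely for illustration, and the model assumes the coordinator pays the network cost for each work item one after another; the shape of the result is what matters.

```python
# Toy model of cluster speedup: each of `steps` work items takes
# `compute` seconds, and shipping an item to a node and collecting
# the result costs `overhead` seconds of networking, paid serially
# by the coordinator. Illustrative numbers, not measurements.
def cluster_speedup(steps, compute, overhead, nodes):
    serial_time = steps * compute
    parallel_time = (steps / nodes) * compute + steps * overhead
    return serial_time / parallel_time

# Cheap steps relative to the network: ten nodes are *slower* than one.
print(cluster_speedup(steps=1000, compute=0.01, overhead=0.05, nodes=10))  # ~0.2
# Expensive, independent steps: the cluster pays off handsomely.
print(cluster_speedup(steps=1000, compute=10.0, overhead=0.05, nodes=10))  # ~9.5
```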
Symmetric multi-processing allows program steps to cooperate via common access to memory, so there are no networking overheads. Consequently a much wider variety of programming workloads can benefit from improved performance on SMP architectures. These systems don't scale indefinitely, as there is a limit to the number of processors that can be closely coupled in this way. Happily, though, single-processor performance nowadays means that even a modest number of CPUs brought to bear on a well-crafted problem can deliver impressive benefits.
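As an illustration of that "common access to memory", here is a minimal sketch using Python's shared-memory multiprocessing primitives. The worker function, array size and slice boundaries are invented for the example.

```python
# Minimal sketch of SMP-style cooperation through shared memory:
# several processes fill disjoint slices of one shared array, with
# no copying or networking between them. Sizes are arbitrary.
from multiprocessing import Process, Array

def fill(shared, start, stop):
    """Each worker writes its own slice of the common array."""
    for i in range(start, stop):
        shared[i] = i * i

if __name__ == "__main__":
    n = 1_000_000
    shared = Array('d', n, lock=False)  # one array, visible to all workers

    workers = []
    step = n // 4
    for w in range(4):
        p = Process(target=fill, args=(shared, w * step, (w + 1) * step))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()

    print(shared[10])  # 100.0 — written by the first worker
```

Because each worker writes to a disjoint slice, the array can safely be created without a lock; overlapping writes would need explicit synchronisation.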
These architectures are not mutually exclusive — it is possible to join SMP machines in a cluster, and NUMA (non-uniform memory access) offers a kind of clustering inside a single machine cabinet. We've also only been talking about general-purpose MIMD (Multiple Instruction / Multiple Data) machines, not specialist SIMD (Single Instruction / Multiple Data) machines. Neither have we considered whether our MIMD CPU cores are full-powered or share critical components such as FPUs (floating point units), which can create bottlenecks for calculation-heavy workloads. Come to think of it, maybe there is a whiff of sulphur about this topic after all...