Parallel processing with short jobs only increases the run time

Parallel processing has become much more important over the years as multi-core processors have become commonplace. From version 2.14 onwards, parallel processing has been part of the standard R installation in the form of the parallel package. This package makes running parallel jobs as easy as creating a function that runs a job, and calling parSapply on a list of inputs to this function.
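As a minimal sketch of that workflow: the job function `run_job`, its inputs, and the choice of 2 workers below are placeholders for illustration, not taken from the experiment in this post.

```r
library(parallel)

# Hypothetical job: square a number (stands in for any per-input function)
run_job <- function(x) x^2

inputs <- 1:10

# Start a small cluster of worker processes; 2 is an arbitrary choice here
cl <- makeCluster(2)

# Apply run_job to each input across the workers, simplifying to a vector
result <- parSapply(cl, inputs, run_job)

# Release the worker processes when done
stopCluster(cl)

print(result)
```

The only differences from a plain `sapply` call are the cluster object passed as the first argument and the explicit start-up and shutdown of the workers.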

Of course, parallelisation incurs some overhead: information needs to be distributed over the nodes, and the result from each node needs to be collected and aggregated into the resulting object. This overhead is one of the main reasons why in certain cases parallel processing takes longer than sequential processing; see, for example, this StackOverflow question.

In this post I explore the influence of the time a single job takes on the total performance of parallel processing compared to sequential processing. To simulate a job, I simply use the R function Sys.sleep. The problem that I solve is simply waiting for a second. By cutting this second up into increasingly small pieces, the size of each job becomes shorter and shorter. By comparing the run-time of calling Sys.sleep sequentially and in parallel, I can investigate the relation between the temporal size of a job and the performance of parallel processing.
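The comparison for a single job size might look roughly like the sketch below. The names `sleep_job`, `n_jobs` and `speedup` are illustrative, and the values (100 pieces, 2 workers) are assumptions chosen to keep the example quick, not the settings used for the figure.

```r
library(parallel)

total_time <- 1                    # total amount of waiting, in seconds
n_jobs     <- 100                  # number of pieces (assumed value)
job_time   <- total_time / n_jobs  # duration of one job, in seconds

# Each job simply sleeps for its share of the total time
sleep_job <- function(i, duration) {
  Sys.sleep(duration)
  NULL
}

cl <- makeCluster(2)  # assumed worker count for illustration

# Elapsed wall-clock time for the sequential and the parallel run
t_seq <- system.time(
  lapply(seq_len(n_jobs), sleep_job, duration = job_time)
)[["elapsed"]]
t_par <- system.time(
  parLapply(cl, seq_len(n_jobs), sleep_job, duration = job_time)
)[["elapsed"]]

stopCluster(cl)

# A value above 1 means the parallel run was faster
speedup <- t_seq / t_par
print(speedup)
```

Increasing `n_jobs` while keeping `total_time` fixed makes each job shorter, which is the knob the experiment turns.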

The following figure shows the results of my experiment (the R code is listed at the end of the blogpost):

The x-axis shows the run time of an individual job in msecs, the y-axis shows the speed-up factor of parallel relative to sequential processing (> 1 means parallel is faster), and the color shows the results for 4 and 8 cores. The dots are individual runs comparing parallel and sequential processing (20 repetitions); the lines show the median value of the 20 repetitions.

The most striking feature is that shorter jobs decrease the effectiveness of parallel processing: below around 0.011 msecs per job, parallel processing becomes slower than sequential processing. From that point on, the overhead of parallelisation is bigger than the gain. In addition, above 0.011 msecs, parallel processing might be faster, but it is a far cry from the 4- to 8-fold increase in performance one would naively expect. Finally, for the job sizes in the figure, increasing the number of cores only marginally improves performance.

In conclusion, when individual jobs are short, parallelisation is going to have a small impact on performance, or even decrease performance. Keep this in the back of your mind when trying to run your code in parallel.

Source code needed to perform the experiment:
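The original listing is not reproduced here. What follows is a minimal sketch of how such an experiment could be set up, not the author's code: the helper names `speedup_factor` and `run_experiment`, the job sizes, and the small demo settings are all assumptions. The post itself used 4 and 8 cores and 20 repetitions.

```r
library(parallel)

# A job that only waits for `duration` seconds
sleep_job <- function(i, duration) {
  Sys.sleep(duration)
  NULL
}

# Speed-up factor (sequential time / parallel time) for one job size
speedup_factor <- function(cl, job_time, total_time = 1) {
  n_jobs <- max(1, round(total_time / job_time))
  idx <- seq_len(n_jobs)
  t_seq <- system.time(
    sapply(idx, sleep_job, duration = job_time)
  )[["elapsed"]]
  t_par <- system.time(
    parSapply(cl, idx, sleep_job, duration = job_time)
  )[["elapsed"]]
  t_seq / t_par
}

# Repeat the comparison n_rep times for every job size on one cluster
run_experiment <- function(n_cores, job_times, n_rep = 20, total_time = 1) {
  cl <- makeCluster(n_cores)
  on.exit(stopCluster(cl))
  do.call(rbind, lapply(job_times, function(jt) {
    data.frame(
      cores    = n_cores,
      job_time = jt,
      rep      = seq_len(n_rep),
      speedup  = replicate(n_rep, speedup_factor(cl, jt, total_time))
    )
  }))
}

# Small demo with assumed settings so it finishes in a few seconds
res <- run_experiment(n_cores = 2, job_times = c(0.05, 0.01),
                      n_rep = 2, total_time = 0.2)

# Median speed-up per core count and job size (the lines in the figure)
medians <- aggregate(speedup ~ cores + job_time, data = res, FUN = median)
print(medians)
```

To approximate the setup described in the post, one would call `run_experiment` once with `n_cores = 4` and once with `n_cores = 8`, keep the defaults `n_rep = 20` and `total_time = 1`, and plot `speedup` against `job_time`.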

4 Comments » for Parallel processing with short jobs only increases the run time
  1. Vaidotas Zemlys says:

    Shouldn’t the factor (y axis value) rise to be 4 for 4 cores and 8 for 8 cores with increased time? Did you try increasing time to see whether this theoretical upper limit can be achieved?

  2. Paul Hiemstra says:

    The upper limit should indeed be around 4 and 8 respectively, but I’d expect a value of 3.8 and 7.8 to be more realistic, taking into account some overhead of the parallelisation. The point of the post was to explore the region around a factor of 1, as that is where the run time of parallel processing is close to that of sequential processing. Therefore, I did not look at very large run times (1/4 and 1/8 second respectively).

  3. Vaidotas Zemlys says:

    Ah, ok. I think it is interesting to know when the factor is close to its maximum value, since only then would I personally use parallelisation. For me parallelisation still requires mental adjustment (although it is relatively painless with the par[LS]apply functions), so I use it only when it is sorely needed.

    • Paul Hiemstra says:

      I have used R to run computer models in parallel (using system), these runs took a few hours each. Running 8 models in parallel took me very close to the speed increase factor of 8. So, practically, when your jobs are large enough and can easily be run separately (no interactions between jobs needed), parallel processing makes sense.
