Recently I wrote a blogpost showing the implementation of a simple bubble sort algorithm in pure R code. The downside of that implementation was that is was awfully slow. And by slow, I mean really slow, as in “a 100 element vector takes 7 seconds to sort”-slow. One of the major opportunities for a speed is to start using a compiled language. I chose to use C++ as this is really easy to integrate into R using the `Rcpp`

package. In addition to the `Rcpp`

package I use the `inline`

package which allows one to use C++ code and R code in a seamless fashion. The following code creates an R function `bubble_sort_cpp`

:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
require(inline) ## for cxxfunction() src = 'Rcpp::NumericVector vec = Rcpp::NumericVector(vec_in); double tmp = 0; int no_swaps; while(true) { no_swaps = 0; for (int i = 0; i < vec.size()-1; ++i) { if(vec[i] > vec[i+1]) { no_swaps++; tmp = vec[i]; vec[i] = vec[i+1]; vec[i+1] = tmp; }; }; if(no_swaps == 0) break; }; return(vec);' bubble_sort_cpp = cxxfunction(signature(vec_in = "numeric"), body=src, plugin="Rcpp") |

Quite amazing how easy it is to integrate R code and C++ code. `inline`

compiles and links the C++ code on-the-fly, creating an R function that delivers the functionality. Of course the most important question is now how fast this is. I use the `microbenchmark`

package to run the bubble sort I implemented in pure R (here), the bubble sort implemented in C++ (see above), and the standard R sorting algorithm:

1 2 3 4 5 6 7 8 9 10 11 12 13 |
library(microbenchmark) vector_size = 100 print(microbenchmark(bubble_sort(sample(1:vector_size)), bubble_sort_cpp(sample(1:vector_size)), sort(sample(1:vector_size)))) expr min lq median bubble_sort(sample(1:vector_size)) 67397.546 74358.9495 78143.0710 bubble_sort_cpp(sample(1:vector_size)) 44.895 55.9340 60.4930 sort(sample(1:vector_size)) 44.173 48.1315 62.3785 uq max neval 81285.0215 105626.483 100 63.7715 74.643 100 67.2375 138.069 100 |

These results speak for itself, the C++ version is more than 1300 times faster when looking at the median speed, even faster than the built-in `sort`

function. These differences will only get more pronounced when the size of the vector grows.

I recently got a tip form Hadley Wickham through e-mail that pre-computing

`vec_in.size()`

once, and not every pass, makes a big difference in performance (x2). This is the new code:`base:::sort`

by default uses`shell sort`

which is`O(n^(4/3))`

. So, the difference may not be apparent on a vector of size 100 when comparing`Rcpp`

‘s`bubble sort`

to`base`

‘s. Since your vector is all positive and the range is < 1e5, you can also try out bases’s`counting sort`

(improperly named as`radix sort`

) as follows:`system.time(x[sort.list(x, method="radix")])`

(not to edit this post, but for trying it out yourself). Also, base’s`sort`

methods internally account for`NA`

s which hits performance.