You're in all Blogs Section

Upgrade OS X Leopard to Mountain Lion using a clean install

My wife’s laptop (2009 Alum. MacBook) was still running OS X Leopard, and the lack of performance and support was getting pretty annoying. Finally, I took the effort to install the latest Apple OS on her laptop. When running Leopard, there is on complication: Mountain Lion can only be bought in the App Store, and Leopard has no App Store.

The official Apple way of solving this is to first upgrade to Snow Leopard, and then use the App Store to buy and install Mountain Lion (ML). I did not want to do this for two reasons: I wanted a clean install, and I did not want to first upgrade to Snow Leopard. The following steps describe what I did to do a single step, clean install of Mountain Lion. Do note that this howto requires another Apple computer with Lion/Mountain Lion already installed, I will refer to this as the other machine.

  1. Backup you data before following these steps as you will wipe the Leopard device’s hard drive. I would recommend not using a full backup such as Time Machine, but copy your homedrive to an external drive. A full backup will also backup and restore all kinds of old Leopard stuff. With a backup of your homedrive, you can then selectively copy back data such as photo’s.
  2. Acquire Mountain Lion. If you already have ML, and the device you want to install it on is going to use your Apple ID, you do not need to rebuy ML. If, as in my case, the other computer is going to run under another Apple ID, you need to take the following steps:
    1. Open the App store on the other machine,
    2. log in with the Apple ID for which you want to buy ML
    3. buy ML. Note that ML will start downloading immediately, you can cancel this right now.
  3. Create a recovery USB drive using the Recovery Disk Assistant that Apply provides. This needs at least 1GB of space.
  4. Insert the USB drive into your Leopard device, and restart. During the boot sequence, press the Option key (alt) to gain access to the boot menu. Select the USB drive as boot device.
  5. Once you’ve booted into the Recovery Assistant, go to the Disk Utility and wipe the boot partition (not the hard drive itself) on your hard drive.
  6. Connect to your WiFi network or insert your network cable.
  7. Go back to the main menu, and select the option to install ML. You will be asked to enter an Apple ID, enter the one relevant for this old Leopard device. This was my wife’s ID in my case. The Recovery Assistant will now download ML (can take some time, as it is 4.4 GB), and run through the installation procedure. This took around 3 hours for me. The machine will automatically reboot, and you will enter Mountain Lion heaven…

Notes:

– Do note that iLife with not be available after installing ML from scratch. You can, however, install iLife from the original installation DVD’s you got with your Apple device. You will get the old iLife version, the new, updated ones need to be purchased in the App Store if you follow this guide.

Tagged with: , , ,
Posted in MacBook

Much more efficient bubble sort in R using the Rcpp and inline packages

Recently I wrote a blogpost showing the implementation of a simple bubble sort algorithm in pure R code. The downside of that implementation was that is was awfully slow. And by slow, I mean really slow, as in “a 100 element vector takes 7 seconds to sort”-slow. One of the major opportunities for a speed is to start using a compiled language. I chose to use C++ as this is really easy to integrate into R using the Rcpp package. In addition to the Rcpp package I use the inline package which allows one to use C++ code and R code in a seamless fashion. The following code creates an R function bubble_sort_cpp:

Quite amazing how easy it is to integrate R code and C++ code. inline compiles and links the C++ code on-the-fly, creating an R function that delivers the functionality. Of course the most important question is now how fast this is. I use the microbenchmark package to run the bubble sort I implemented in pure R (here), the bubble sort implemented in C++ (see above), and the standard R sorting algorithm:

These results speak for itself, the C++ version is more than 1300 times faster when looking at the median speed, even faster than the built-in sort function. These differences will only get more pronounced when the size of the vector grows.

Tagged with: , , ,
Posted in R stuff

Bubble sort implemented in pure R

Please note that this is programming I purely did for the learning experience. The pure R bubble sort implemented in this post is veeeeery slow for two reasons:

  1. Interpreted code with lots of iteration is very slow.
  2. Bubble sort is one of the slowest sorting algorithms (O(N^2))

The bubble sort sorting algorithm works by iterating over the unsorted vector and comparing pairs of numbers. Let’s say the first point pair is c(61, 3), here the numbers need to be swapped as the 3 should be earlier in the sorted vector. The following function returns TRUE if the numbers should be swapped, and returns FALSE otherwise:

This function is used by the following function:

which returns the swapped version of the pair if appropriate, or the original pair if the order is ok. For each point pair (element1-element2, element2-element3, etc) swap_if_larger is called:

One pass of this function performs a comparison on all pairs, swapping if necessary. To fully sort the vector, we need to perform multiple passes until no swaps are needed anymore. I chose to implement this using recursion:

The function starts by perform a swapping pass over the vector. If the new vector is equal to the old vector, no swaps where needed, i.e. the vector is already sorted. The function than returns the vector. Alternatively, if the vectors are different, the vector is not yet fully sorted, and we need to perform more passes. This is accomplished by recursively calling bubble_sort again on the vector. An example of the function in action:

The full sorting process is nicely illustrated by the following animated gif (linked from wikipedia):

enter image description here

This implementation is horribly slow:

Probably implementing the relatively slow bubble sort in a compiled language pose a dramatic increase in speed. Maybe a nice first testcase for Rcpp

Tagged with: , , ,
Posted in R stuff

Ctags support for IDL: regular expression definitons

One of the reasons I switched to using Vim as a text editor is the excellent supports for Ctags. In a nutshell, ctags allows you to put your cursor on a function name, press C-p, and jump to the file where that function is defined. Ctags supports a great number of programming languages, but unfortunately, IDL is not one of them. Luckily it is straightforward to add support for new languages. Simply add this:

to your ~/.ctags file to be able to use Ctags with IDL.

Tagged with: ,
Posted in IDL

Indexing IDL matrices with vectors: some unexpected behavior for out of range values

IDL is not the most used language around, many of you might not even have heard of it. Mind you that by IDL I mean the Interactive Data Language, and not the Interface Description Language, which many more people know. IDL is stil used a lot in some scientific applications, for example in astronomy and remote sensing. IDL has some quirks which might catch you off guard. I just encountered one of those, which I would like to share. Note that this is valid for version 7 of IDL, which is not the latest one.

Assume we have the following matrix:

We can index this matrix in the following manner, observe that IDL indexing starts at zero:

Providing an invalid index nicely leads to an exception:

IDL is also vectorized, so we can pass vectors of indices to extract multiple values in one go:

But now comes the problem. Let’s pass some out of range indices as vectors:

In stead of throwing an exception, IDL happily returns the last valid value in that dimension, or in this case a pair of dimensions, i.e. 23. So, beware, when passing vectors of indices to an IDL array there is no out of range checking: you are on your own :).

Tagged with: , ,
Posted in IDL

Parsing complex text files using regular expressions and vectorization

When text data is in a nice CSV format, read.csv is enough to parse it into a useable format. But if this is not the case, getting the data into a useable format is not so straightforward. In this post I particularly illustrate the use of regular expressions for complex and flexible text processing, and the power of vectorization in R. Vectorization means that we operate on vectors as a whole, not operate on individual elements of a vector.

Take for example a snippet of this data which I downloaded from StackOverflow:

The entire data file can be downloaded here.

In this post I’ll be stepping through the R code needed to get this text data into a useable format. First, we want to read the data into a character vector:

where each element of the vector is a line in the the text file. Already we see that the first line is some header information which we want to skip:

note the use of negative indexing to remove an element. Next we want to find all the elements in the vector that relate to the date for which the data is representative, we do that by using a regular expression which looks for lines that start with -:

and find the amount of actions, upvotes or downvotes etc, that have taken place on each day, i.e. the index of a certain day minus the index of the day before that:

note that we add rep_date_entries[1] because diff cuts off the first element. Now that we know which elements relate to the date, we can read all other lines into a nice data.frame:

The reputation column has a somewhat strange format ((10)), we need to get rid of the brackets. A nice way of doing that is using a regular expression, and the str_extract function from the stringr package:

The regular expression [0-9]+ matches one or more numbers, and str_extract gets the number out of the string. Now we have the data, we need to add a column which says for each row to which date it belongs. We know which lines in the data belong to a date (rep_date_entries) and we know how much data entries there are per day (actions_per_day). We can now simply repeat each element in rep_date_entries as many times as there are actions:

You can see that the date is not yet in a nice format, we need to get rid of all the text, except the date itself. Again, we can use a regular expression, combined with str_extract for this:

The regular expression "[0-9]{4}-[0-9]{2}-[0-9]{2}" matches any occurence of 4 numbers-2 numbers-2 numbers. Finally, we transform the date from a string to a real date object using strptime:

The end result is the following data.frame:

All this code together leads to the following function:

I think this nicely illustrates the power of both vectorization, very short and to-the-point for-loop-less syntax, and regular expressions in editing strings.

Tagged with: , ,
Posted in R stuff

Slate: an XMonad like windowmanager for Mac OS

For work I recently switched to using Mac OS (MacBook Pro 15” retina), until then I had been using Linux. The switch was rather painless, as a lot of the unix goodness (terminal!) is also present on the Mac. One thing that I was missing from my Linux days is XMonad. XMonad is a tiled window manager, which is very lightweight and heavily depends on shortcuts. But then a colleague suggested I take a look at Slate. Slate has a lot of the functionality and configurability of XMonad, and is a very nice addition to my already shortcut centered way of working on my MacBook.

Slate allows you to:

  • Attach keystrokes to how you want to manage your windows: e.g. resizing, moving around the screen, shifting focus between apps.
  • Create app layouts, e.g. Google Chrome full screen, Mail Client on my 2nd screen maximized to the left half of the screen, Terminal maximized to the right half of the screen, and being able to quickly switch between them. It also allows you to create layouts for 1 and 2 monitor setups, and switches between them automatically.
  • Windows Hints, press a button and all the apps are marked by a letter. Pushing that letter shifts focus to that app.
  • An alternative app switcher.

Configuring Slate is done using a .slate file in your homedirectory, and can be a bit daunting. But if you are used to working with .bash_profile‘s and such, you’ll feel right at home. I’ve only just starting to work with Slate, but you can have a look at my config file at the bottom of this post. A nice introductory blog post for Slate has been written by Tristan Hume.

One issue for me right now is that Slate does not work well with Mission Control (multiple workspaces). The features I use in the config file below work fine, but for example the layout’s aren’t able to use multiple workspaces. This is a known issue, and progress in this direction is hampered by the lack of API support from apple.

Tagged with: ,
Posted in MacBook

Automatic spatial interpolation with R: the automap package

In case of continuously collected data, e.g. observations from a monitoring network, spatial interpolation of this data cannot be done manually. Instead, the interpolation should be done automatically. To achieve this goal, I developed the automap package. automap builds on top of the excellent gstat package, and provides automatic spatial interpolation, more specifically, automatic kriging. Kriging in its more simple form (Ordinary Kriging, Universal Kriging, aka Kriging with External Drift) is actually nothing more than linear regression with spatially correlated residuals.

automap provides the following set of functions (for details I refer to the online manual):

  • autofitVariogram, automatically fits the variogram model to the data.
  • autoKrige, automatically fits the variogram model using autofitVariogram, and creates an interpolated map.
  • autoKrige.cv, automatically fits the variogram model using autofitVariogram, and performs cross-validation. Uses krige.cv under the hood.
  • compare.cv, allows comparison of the output of autoKrige.cv and krige.cv. This can be used to evaluate the performance of different interpolation algorithms. compare.cv allows comparison using both summary statistics and spatial plots.

In general, the interface of automap mimics that of gstat. The following code snippets show some examples of creating interpolated maps using automap:

You can get automap from either CRAN:

or my bitbucket account.

PS: automap was the first package I wrote, at the beginning of my PhD, so it is not the most beautiful code I ever wrote ;).

Tagged with: ,
Posted in R stuff

Implementing a lookup table in Fortran 90: Part 2

In my last post I proposed a simple implementation for a lookup table in Fortran 90 using a module. I extended the lookup table to make it more usable in a realistic setting. Do note that in some aspects, this implementation of a lookup table is not very efficient. This is mainly in the ability to quickly add and retrieve elements form the hash table. My implementation uses a linear search to find the key, this could be done much more efficiently using e.g. a binary search. I made the following changes:
* There is now just one subroutine to put stuff into the hash table: hash_set. When hash_set is called, the hash table tries to find the key, and if it is not found, the key-value pair is pushed onto the hash table. hash_push is no longer publicly available.
* Standard behavior now is that the hash table has a size of 50 items, and the table no longer has to be initialized at a certain size. In addition, when the 50 numbers are full, hash_reallocate is called to extend the hash table by another 50 items. This makes the hash table much more flexible.
* hash_print now only prints the key-value pairs that are actually used. The following program shows the new hash table module in action:

A nice addition would be add an index to the hash table, allowing one to have several hash tables inside the one hash table module. Any call to a subroutine would then also require specifying which hash table one needs to change. The source code for the module is given here:

Tagged with: ,
Posted in Fortran

Implementing a simple lookup table in Fortran 90

Implementing a lookup table, showing behavior similar to a dictionary as it is called in Python, in Fortran 90 turned out to be non-trivial. For starters, no standard data type was available (afaik) in Fortran 90. I decided to implement one myself, custom for my situation. What I needed was a mapping from a model name to a REAL value, storing key-value pairs. The lookup table would enable me to take the model name, and retrieve the associated REAL value. The following module implements the lookup table. It has a number of subroutines: * hash_init(no_items), initializes the hash table with the correct number of key-value pairs. * hash_push(key, value), pushes the key-value pair to the next available place in the hash table, i.e. first index 1, next index 2, etc. * hash_get(key, value), retrieves the value for a given key. Errors are raised when the hash table is not yet fully filled, or when the key has not been found. * hash_set(key, value), allows the user to change the value of a given key-value pair. Errors are raised when the lookup table is not yet fully filled, or when the key has not been found. * hash_print(), prints the current contents of the hash table to the screen. The following program illustrates the use of the hash module (tested using ifort and gfortran). Note that the method names contain a reference to a hash, but this code represents a lookuptable without any of the speed of a hash map.

The code of the module

hash is given here:

Tagged with: , ,
Posted in Fortran