<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>NumberTheory</title>
	<atom:link href="http://www.numbertheory.nl/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.numbertheory.nl</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Sun, 16 Jun 2013 13:17:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Upgrade OS X Leopard to Mountain Lion using a clean install</title>
		<link>http://www.numbertheory.nl/2013/06/16/upgrade-os-x-leopard-to-mountain-lion-using-a-clean-install/</link>
		<comments>http://www.numbertheory.nl/2013/06/16/upgrade-os-x-leopard-to-mountain-lion-using-a-clean-install/#comments</comments>
		<pubDate>Sun, 16 Jun 2013 10:08:40 +0000</pubDate>
		<dc:creator>Paul Hiemstra</dc:creator>
				<category><![CDATA[MacBook]]></category>
		<category><![CDATA[leopard]]></category>
		<category><![CDATA[mac-os-x]]></category>
		<category><![CDATA[mountain lion]]></category>
		<category><![CDATA[upgrade]]></category>

		<guid isPermaLink="false">http://www.numbertheory.nl/?p=578</guid>
		<description><![CDATA[My wife&#8217;s laptop (2009 Alum. MacBook) was still running OS X Leopard, and the lack of performance and support was getting pretty annoying. Finally, I took the effort to install the latest Apple OS on her laptop. When running Leopard,<span class="ellipsis">&#8230;</span> <a href="http://www.numbertheory.nl/2013/06/16/upgrade-os-x-leopard-to-mountain-lion-using-a-clean-install/"><div class="see-more">See more &#8250;</div><!-- end of .see-more --></a>]]></description>
				<content:encoded><![CDATA[<p>My wife&#8217;s laptop (2009 Alum. MacBook) was still running OS X Leopard, and the lack of performance and support was getting pretty annoying. Finally, I took the effort to install the latest Apple OS on her laptop. When running Leopard, there is one complication: Mountain Lion can only be bought in the App Store, and Leopard has no App Store.</p>
<p>The official Apple way of solving this is to first upgrade to Snow Leopard, and then use the App Store to buy and install Mountain Lion (ML). I did not want to do this for two reasons: I wanted a clean install, and I did not want to upgrade to Snow Leopard first. The following steps describe how I did a single-step, clean install of Mountain Lion. Note that this howto requires another Apple computer with Lion or Mountain Lion already installed; I will refer to this as <em>the other machine</em>.</p>
<ol>
<li>Back up your data before following these steps, as you will wipe the Leopard device&#8217;s hard drive. I would recommend not using a full backup such as Time Machine, but instead copying your home directory to an external drive. A full backup will also back up and restore all kinds of old Leopard stuff. With a backup of your home directory, you can then selectively copy back data such as photos.</li>
<li>Acquire Mountain Lion. If you already have ML, and the device you want to install it on is going to use your Apple ID, you do not need to buy ML again. If, as in my case, the Leopard device is going to run under another Apple ID, you need to take the following steps:
<ol>
<li>Open the App Store on the other machine.</li>
<li>Log in with the Apple ID for which you want to buy ML.</li>
<li>Buy ML. Note that ML will start downloading immediately; you can cancel the download right away.</li>
</ol>
</li>
<li>Create a recovery USB drive using the <a href="http://support.apple.com/kb/DL1433">Recovery Disk Assistant</a> that Apple provides. This needs at least 1GB of space.</li>
<li>Insert the USB drive into your Leopard device, and restart. During the boot sequence, press the Option key (alt) to gain access to the boot menu. Select the USB drive as boot device.</li>
<li>Once you&#8217;ve booted into the Recovery Assistant, go to the Disk Utility and wipe the boot partition (not the hard drive itself) on your hard drive.</li>
<li>Connect to your WiFi network or insert your network cable.</li>
<li>Go back to the main menu, and select the option to install ML. You will be asked to enter an Apple ID; enter the one relevant for this old Leopard device. This was my wife&#8217;s ID in my case. The Recovery Assistant will now download ML (this can take some time, as it is 4.4 GB), and run through the installation procedure. This took around 3 hours for me. The machine will automatically reboot, and you will enter Mountain Lion heaven&#8230;</li>
</ol>
<p>Notes:</p>
<p>- Note that iLife will not be available after installing ML from scratch. You can, however, install iLife from the original installation DVDs you got with your Apple device. This gives you the old iLife version; the new, updated versions need to be purchased in the App Store if you follow this guide.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.numbertheory.nl/2013/06/16/upgrade-os-x-leopard-to-mountain-lion-using-a-clean-install/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Much more efficient bubble sort in R using the Rcpp and inline packages</title>
		<link>http://www.numbertheory.nl/2013/05/14/much-more-efficient-bubble-sort-in-r-using-the-rcpp-and-inline-packages/</link>
		<comments>http://www.numbertheory.nl/2013/05/14/much-more-efficient-bubble-sort-in-r-using-the-rcpp-and-inline-packages/#comments</comments>
		<pubDate>Tue, 14 May 2013 19:02:02 +0000</pubDate>
		<dc:creator>Paul Hiemstra</dc:creator>
				<category><![CDATA[R stuff]]></category>
		<category><![CDATA[bubble-sort]]></category>
		<category><![CDATA[inline]]></category>
		<category><![CDATA[R programming]]></category>
		<category><![CDATA[rcpp]]></category>

		<guid isPermaLink="false">http://www.numbertheory.nl/?p=567</guid>
		<description><![CDATA[Recently I wrote a blogpost showing the implementation of a simple bubble sort algorithm in pure R code. The downside of that implementation was that it was awfully slow. And by slow, I mean really slow, as in &#8220;a 100<span class="ellipsis">&#8230;</span> <a href="http://www.numbertheory.nl/2013/05/14/much-more-efficient-bubble-sort-in-r-using-the-rcpp-and-inline-packages/"><div class="see-more">See more &#8250;</div><!-- end of .see-more --></a>]]></description>
				<content:encoded><![CDATA[<p>Recently I wrote <a href="http://www.numbertheory.nl/2013/05/10/bubble-sort-implemented-in-pure-r/">a blogpost</a> showing the implementation of a simple bubble sort algorithm in pure R code. The downside of that implementation was that it was awfully slow. And by slow, I mean really slow, as in &#8220;a 100 element vector takes almost 0.08 seconds to sort&#8221;-slow, where the built-in <code>sort</code> needs about 0.001 seconds. One of the major opportunities for a speedup is to start using a compiled language. I chose to use C++ as this is really easy to integrate into R using the <code>Rcpp</code> package. In addition to the <code>Rcpp</code> package I use the <code>inline</code> package, which allows one to use C++ code and R code in a seamless fashion. The following code creates an R function <code>bubble_sort_cpp</code>:</p>
<p></p><pre class="crayon-plain-tag">require(inline)  ## for cxxfunction()                                                       
                                                                                            
src = 'Rcpp::NumericVector vec = Rcpp::NumericVector(vec_in);                               
       double tmp = 0;                                                                      
       int no_swaps;                                                                        
       while(true) {                                                                        
           no_swaps = 0;                                                                    
           for (int i = 0; i &lt; vec.size()-1; ++i) {                                         
               if(vec[i] &gt; vec[i+1]) {                                                      
                   no_swaps++;                                                              
                   tmp = vec[i];                                                            
                   vec[i] = vec[i+1];                                                       
                   vec[i+1] = tmp;                                                          
               };                                                                           
           };                                                                               
           if(no_swaps == 0) break;                                                         
       };                                                                                   
       return(vec);'                                                                        
bubble_sort_cpp = cxxfunction(signature(vec_in = &quot;numeric&quot;), body=src, plugin=&quot;Rcpp&quot;)</pre><p></p>
<p>Quite amazing how easy it is to integrate R code and C++ code. <code>inline</code> compiles and links the C++ code on-the-fly, creating an R function that delivers the functionality. Of course the most important question is now how fast this is. I use the <code>microbenchmark</code> package to run the bubble sort I implemented in pure R (<a href="http://www.numbertheory.nl/2013/05/10/bubble-sort-implemented-in-pure-r/">here</a>), the bubble sort implemented in C++ (see above), and the standard R sorting algorithm:</p>
<p></p><pre class="crayon-plain-tag">library(microbenchmark) 
vector_size = 100                                                                           
print(microbenchmark(bubble_sort(sample(1:vector_size)),                                    
               bubble_sort_cpp(sample(1:vector_size)),                                      
               sort(sample(1:vector_size))))  
                                   expr       min         lq     median
     bubble_sort(sample(1:vector_size)) 67397.546 74358.9495 78143.0710
 bubble_sort_cpp(sample(1:vector_size))    44.895    55.9340    60.4930
            sort(sample(1:vector_size))    44.173    48.1315    62.3785
         uq        max neval
 81285.0215 105626.483   100 
    63.7715     74.643   100 
    67.2375    138.069   100</pre><p></p>
<p>These results speak for themselves: the C++ version is roughly 1300 times faster than the pure R version when looking at the median run time, and at this vector size it is even marginally faster than the built-in <code>sort</code> function. These differences will only get more pronounced when the size of the vector grows.</p>
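<p>As a quick check of that ratio, the medians from the table above can be divided directly. This is only a sketch: the three numbers are copied by hand from the <code>microbenchmark</code> output, and the variable names are made up for this illustration.</p>
<p></p><pre class="crayon-plain-tag">## Median timings copied from the benchmark output above
median_r    = 78143.0710   ## bubble_sort (pure R)
median_cpp  = 60.4930      ## bubble_sort_cpp
median_sort = 62.3785      ## built-in sort

## How many times faster is the C++ version than the pure R version?
speedup_vs_r = median_r / median_cpp
print(round(speedup_vs_r))

## Ratio of the built-in sort to the C++ version (close to 1 means comparable)
ratio_sort_vs_cpp = median_sort / median_cpp
print(round(ratio_sort_vs_cpp, 2))</pre><p></p>
<p>The first ratio comes out at about 1292 and the second at about 1.03, so for 100 elements the C++ bubble sort and the built-in <code>sort</code> are practically on par.</p>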
]]></content:encoded>
			<wfw:commentRss>http://www.numbertheory.nl/2013/05/14/much-more-efficient-bubble-sort-in-r-using-the-rcpp-and-inline-packages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bubble sort implemented in pure R</title>
		<link>http://www.numbertheory.nl/2013/05/10/bubble-sort-implemented-in-pure-r/</link>
		<comments>http://www.numbertheory.nl/2013/05/10/bubble-sort-implemented-in-pure-r/#comments</comments>
		<pubDate>Fri, 10 May 2013 13:40:29 +0000</pubDate>
		<dc:creator>Paul Hiemstra</dc:creator>
				<category><![CDATA[R stuff]]></category>
		<category><![CDATA[bubble-sort]]></category>
		<category><![CDATA[R programming]]></category>
		<category><![CDATA[recursion]]></category>
		<category><![CDATA[sorting]]></category>

		<guid isPermaLink="false">http://www.numbertheory.nl/?p=560</guid>
		<description><![CDATA[Please note that this is programming I purely did for the learning experience. The pure R bubble sort implemented in this post is veeeeery slow for two reasons: Interpreted code with lots of iteration is very slow. Bubble sort is<span class="ellipsis">&#8230;</span> <a href="http://www.numbertheory.nl/2013/05/10/bubble-sort-implemented-in-pure-r/"><div class="see-more">See more &#8250;</div><!-- end of .see-more --></a>]]></description>
				<content:encoded><![CDATA[<p>Please note that this is programming I purely did for the learning experience. The pure R <a href="http://en.wikipedia.org/wiki/Bubble_sort">bubble sort</a> implemented in this post is veeeeery slow for two reasons:</p>
<ol>
<li>Interpreted code with lots of iteration is very slow.</li>
<li>Bubble sort is one of the slowest sorting algorithms (<code>O(N^2)</code>)</li>
</ol>
<p>The bubble sort algorithm works by iterating over the unsorted vector and comparing pairs of neighbouring numbers. Let&#8217;s say the first pair is <code>c(61, 3)</code>; here the numbers need to be swapped, as the 3 should be earlier in the sorted vector. The following function returns <code>TRUE</code> if the numbers should be swapped, and returns <code>FALSE</code> otherwise:</p>
<p></p><pre class="crayon-plain-tag">larger = function(pair) {
   if(pair[1] &gt; pair[2]) return(TRUE) else return(FALSE)
}</pre><p></p>
<p>This function is used by the following function:</p>
<p></p><pre class="crayon-plain-tag">swap_if_larger = function(pair) {
    if(larger(pair)) {
        return(rev(pair)) 
    } else {
        return(pair)
    }
}</pre><p></p>
<p>which returns the swapped version of the pair if appropriate, or the original pair if the order is ok. For each pair of neighbouring elements (element1-element2, element2-element3, etc) <code>swap_if_larger</code> is called:</p>
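<p></p><pre class="crayon-plain-tag">swap_pass = function(vec) { 
    # seq_len() gives an empty sequence for vectors with fewer than 2 elements,
    # whereas seq(1, 0) would count down and index past the end of the vector
    for(i in seq_len(max(length(vec) - 1, 0))) {
        vec[i:(i+1)] = swap_if_larger(vec[i:(i+1)])
    }
    return(vec)
}</pre><p></p>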
<p>One pass of this function performs a comparison on all pairs, swapping if necessary. To fully sort the vector, we need to perform multiple passes until no swaps are needed anymore. I chose to implement this using recursion:</p>
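<p></p><pre class="crayon-plain-tag">bubble_sort = function(vec) {
    new_vec = swap_pass(vec)
    # identical() compares exactly; all.equal() tolerates tiny numeric
    # differences and could stop too early on nearly equal values
    if(identical(vec, new_vec)) { 
        return(new_vec) 
    } else {
        return(bubble_sort(new_vec))
    }
}</pre><p></p>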
<p>The function starts by performing a swapping pass over the vector. If the new vector is equal to the old vector, no swaps were needed, i.e. the vector is already sorted. The function then returns the vector. Alternatively, if the vectors are different, the vector is not yet fully sorted, and we need to perform more passes. This is accomplished by recursively calling <code>bubble_sort</code> again on the vector. An example of the function in action:</p>
<p></p><pre class="crayon-plain-tag">&gt; test_vec = round(runif(100, 0, 100))
&gt; bubble_sort(test_vec)
  [1]  1  1  6  6  9 10 10 10 13 14 14 15 19 19 20 21 23 24 24 24 26 26 26 26 27
 [26] 28 28 30 31 32 34 35 35 36 36 37 39 39 40 40 40 41 41 41 41 43 43 43 45 46
 [51] 47 51 56 56 57 57 57 58 58 59 61 61 62 63 64 65 68 68 69 70 71 71 72 73 74
 [76] 75 75 75 78 79 82 82 84 85 88 88 89 90 91 91 91 92 92 92 92 93 93 96 96 99
&gt;</pre><p></p>
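<p>The recursion is not essential to the algorithm. As a sketch of an alternative (not part of the implementation above; <code>bubble_sort_iter</code> is a name made up for this illustration), the same &#8220;repeat until a pass makes no swaps&#8221; logic can be written with a loop, which avoids deep recursion for long vectors:</p>
<p></p><pre class="crayon-plain-tag">bubble_sort_iter = function(vec) {
    n = length(vec)
    # Vectors with fewer than two elements are already sorted
    if (n &lt; 2) return(vec)
    repeat {
        swapped = FALSE
        for (i in seq_len(n - 1)) {
            if (vec[i] &gt; vec[i + 1]) {
                # Swap the two neighbours in place
                vec[c(i, i + 1)] = vec[c(i + 1, i)]
                swapped = TRUE
            }
        }
        # A full pass without swaps means the vector is sorted
        if (!swapped) break
    }
    return(vec)
}

bubble_sort_iter(c(61, 3, 27, 3))</pre><p></p>
<p>For the small input in the last line this returns the values in the order 3, 3, 27, 61.</p>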
<p>The full sorting process is nicely illustrated by the following animated gif (linked from Wikipedia):</p>
<p><img src="http://upload.wikimedia.org/wikipedia/commons/3/37/Bubble_sort_animation.gif" alt="Bubble sort animation" /></p>
<p>This implementation is horribly slow:</p>
<p></p><pre class="crayon-plain-tag">&gt; system.time(bubble_sort(test_vec))
   user  system elapsed
  0.076   0.000   0.077
&gt; system.time(sort(test_vec))
   user  system elapsed
  0.001   0.000   0.001</pre><p></p>
<p>Implementing the relatively slow bubble sort in a compiled language would probably yield a dramatic increase in speed. Maybe a nice first testcase for <code>Rcpp</code>&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.numbertheory.nl/2013/05/10/bubble-sort-implemented-in-pure-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ctags support for IDL: regular expression definitions</title>
		<link>http://www.numbertheory.nl/2013/04/24/ctags-support-for-idl-regular-expression-definitons/</link>
		<comments>http://www.numbertheory.nl/2013/04/24/ctags-support-for-idl-regular-expression-definitons/#comments</comments>
		<pubDate>Wed, 24 Apr 2013 09:24:48 +0000</pubDate>
		<dc:creator>Paul Hiemstra</dc:creator>
				<category><![CDATA[IDL]]></category>
		<category><![CDATA[ctags]]></category>
		<category><![CDATA[IDL programming]]></category>

		<guid isPermaLink="false">http://www.numbertheory.nl/?p=554</guid>
		<description><![CDATA[One of the reasons I switched to using Vim as a text editor is the excellent support for Ctags. In a nutshell, ctags allows you to put your cursor on a function name, press C-p, and jump to the file<span class="ellipsis">&#8230;</span> <a href="http://www.numbertheory.nl/2013/04/24/ctags-support-for-idl-regular-expression-definitons/"><div class="see-more">See more &#8250;</div><!-- end of .see-more --></a>]]></description>
				<content:encoded><![CDATA[<p>One of the reasons I switched to using Vim as a text editor is the excellent support for Ctags. In a nutshell, ctags allows you to put your cursor on a function name, press C-p, and jump to the file where that function is defined. Ctags supports a great number of programming languages, but unfortunately, IDL is not one of them. Luckily it is straightforward to add support for new languages. Simply add this:</p>
<p></p><pre class="crayon-plain-tag">--langdef=IDL
--langmap=IDL:.pro
--regex-IDL=/^pro[ \t]+([a-zA-Z0-9_:]+)/\1/p,procedure/i
--regex-IDL=/^function[ \t]+([a-zA-Z0-9_:]+)/\1/f,function/i</pre><p></p>
<p>to your <code>~/.ctags</code> file to be able to use Ctags with IDL.</p>
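<p>To get a feel for what these two patterns capture, they can be tried on a few sample lines. The sketch below does this in R rather than in ctags itself, so it only approximates how ctags applies the patterns; the sample IDL lines and the helper <code>tag_name</code> are made up for this illustration.</p>
<p></p><pre class="crayon-plain-tag">## Sample IDL source lines (made up for this illustration)
idl_lines = c(&quot;PRO my_class::init, x&quot;,
              &quot;function add_one, x&quot;,
              &quot;  print, 'not a definition'&quot;)

## Hypothetical helper: return the name captured by a pattern, or NA if the
## line does not match. ignore.case mimics the trailing /i flag.
tag_name = function(pattern, lines) {
    hit = grepl(pattern, lines, ignore.case = TRUE)
    ifelse(hit, sub(paste0(pattern, &quot;.*&quot;), &quot;\\1&quot;, lines, ignore.case = TRUE), NA)
}

procedures = tag_name(&quot;^pro[ \t]+([a-zA-Z0-9_:]+)&quot;, idl_lines)
functions  = tag_name(&quot;^function[ \t]+([a-zA-Z0-9_:]+)&quot;, idl_lines)
print(procedures)
print(functions)</pre><p></p>
<p>Only the first line yields a procedure name (<code>my_class::init</code>) and only the second a function name (<code>add_one</code>). Because both patterns are anchored with <code>^</code>, indented lines are not picked up.</p>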
]]></content:encoded>
			<wfw:commentRss>http://www.numbertheory.nl/2013/04/24/ctags-support-for-idl-regular-expression-definitons/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Indexing IDL matrices with vectors: some unexpected behavior for out of range values</title>
		<link>http://www.numbertheory.nl/2013/04/17/indexing-idl-matrices-with-vectors-some-unexpected-behavior-for-out-of-range-values/</link>
		<comments>http://www.numbertheory.nl/2013/04/17/indexing-idl-matrices-with-vectors-some-unexpected-behavior-for-out-of-range-values/#comments</comments>
		<pubDate>Wed, 17 Apr 2013 15:17:17 +0000</pubDate>
		<dc:creator>Paul Hiemstra</dc:creator>
				<category><![CDATA[IDL]]></category>
		<category><![CDATA[IDL programming]]></category>
		<category><![CDATA[idl-quirk]]></category>
		<category><![CDATA[IDL7.0]]></category>

		<guid isPermaLink="false">http://www.numbertheory.nl/?p=550</guid>
		<description><![CDATA[IDL is not the most used language around, many of you might not even have heard of it. Mind you that by IDL I mean the Interactive Data Language, and not the Interface Description Language, which many more people know.<span class="ellipsis">&#8230;</span> <a href="http://www.numbertheory.nl/2013/04/17/indexing-idl-matrices-with-vectors-some-unexpected-behavior-for-out-of-range-values/"><div class="see-more">See more &#8250;</div><!-- end of .see-more --></a>]]></description>
				<content:encoded><![CDATA[<p>IDL is not the most used language around; many of you might not even have heard of it. Mind you that by IDL I mean the <a href="http://en.wikipedia.org/wiki/IDL_%28programming_language%29">Interactive Data Language</a>, and not the Interface Description Language, which many more people know. IDL is still used a lot in some scientific applications, for example in astronomy and remote sensing. IDL has some quirks which might catch you off guard. I just encountered one of those, which I would like to share. Note that this is valid for version 7 of IDL, which is not the latest one.</p>
<p>Assume we have the following matrix:</p>
<p></p><pre class="crayon-plain-tag">IDL&gt; spam = DINDGEN(4,6)
IDL&gt; print, spam
       0.0000000       1.0000000       2.0000000       3.0000000
       4.0000000       5.0000000       6.0000000       7.0000000
       8.0000000       9.0000000       10.000000       11.000000
       12.000000       13.000000       14.000000       15.000000
       16.000000       17.000000       18.000000       19.000000
       20.000000       21.000000       22.000000       23.000000</pre><p></p>
<p>We can index this matrix in the following manner; note that IDL indexing starts at zero:</p>
<p></p><pre class="crayon-plain-tag">IDL&gt; print, spam[1,1]
       5.0000000</pre><p></p>
<p>Providing an invalid index nicely leads to an exception:</p>
<p></p><pre class="crayon-plain-tag">IDL&gt; print, spam[1000,1000]
% Attempt to subscript SPAM with &lt;INT      (    1000)&gt; is out of range.
% Execution halted at: $MAIN$</pre><p></p>
<p>IDL is also vectorized, so we can pass vectors of indices to extract multiple values in one go:</p>
<p></p><pre class="crayon-plain-tag">IDL&gt; print, spam[[1,1,3,2],[1,1,4,1]]
       5.0000000       5.0000000       19.000000       6.0000000</pre><p></p>
<p>But now comes the problem. Let&#8217;s pass some out of range indices as vectors:</p>
<p></p><pre class="crayon-plain-tag">IDL&gt; print, spam[[1000,5000,7000], [1000,2000,4000]]
       23.000000       23.000000       23.000000</pre><p></p>
<p>Instead of throwing an exception, IDL happily returns the last valid value in that dimension, or in this case a pair of dimensions, i.e. 23. So beware: when passing vectors of indices to an IDL array there is no out of range checking, and you are on your own <img src='http://www.numbertheory.nl/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> .</p>
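<p>One way to picture this behavior is as clamping of the indices. The sketch below is written in R rather than IDL, and it assumes that each out-of-range index is clamped to the valid range from 0 to size - 1; that assumption is inferred from the output above, and the variable names are made up for this illustration.</p>
<p></p><pre class="crayon-plain-tag">## Dimensions of spam = DINDGEN(4,6): 4 columns, 6 rows
n_col = 4
n_row = 6

## The out-of-range index vectors used above
cols = c(1000, 5000, 7000)
rows = c(1000, 2000, 4000)

## Assumed behavior: clamp every index to the valid range [0, size - 1]
clamped_cols = pmin(pmax(cols, 0), n_col - 1)
clamped_rows = pmin(pmax(rows, 0), n_row - 1)

## DINDGEN fills the array with 0, 1, 2, ... with the column index running
## fastest, so the value at [col, row] is row * n_col + col
values = clamped_rows * n_col + clamped_cols
print(values)</pre><p></p>
<p>Under that assumption every index pair ends up at column 3, row 5, which holds the value 23, matching what IDL printed.</p>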
]]></content:encoded>
			<wfw:commentRss>http://www.numbertheory.nl/2013/04/17/indexing-idl-matrices-with-vectors-some-unexpected-behavior-for-out-of-range-values/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parsing complex text files using regular expressions and vectorization</title>
		<link>http://www.numbertheory.nl/2013/03/24/parsing-complex-text-files-using-regular-expressions-and-vectorization/</link>
		<comments>http://www.numbertheory.nl/2013/03/24/parsing-complex-text-files-using-regular-expressions-and-vectorization/#comments</comments>
		<pubDate>Sun, 24 Mar 2013 08:27:18 +0000</pubDate>
		<dc:creator>Paul Hiemstra</dc:creator>
				<category><![CDATA[R stuff]]></category>
		<category><![CDATA[R programming]]></category>
		<category><![CDATA[regular-expression]]></category>
		<category><![CDATA[text-processing]]></category>

		<guid isPermaLink="false">http://www.numbertheory.nl/?p=529</guid>
		<description><![CDATA[When text data is in a nice CSV format, read.csv is enough to parse it into a useable format. But if this is not the case, getting the data into a useable format is not so straightforward. In this post<span class="ellipsis">&#8230;</span> <a href="http://www.numbertheory.nl/2013/03/24/parsing-complex-text-files-using-regular-expressions-and-vectorization/"><div class="see-more">See more &#8250;</div><!-- end of .see-more --></a>]]></description>
				<content:encoded><![CDATA[<p>When text data is in a nice CSV format, <code>read.csv</code> is enough to parse it into a useable format. But if this is not the case, getting the data into a useable format is not so straightforward. In this post I illustrate the use of regular expressions for complex and flexible text processing, and the power of vectorization in R. Vectorization means that we operate on vectors as a whole, rather than on the individual elements of a vector.</p>
<p>Take for example a snippet of this data which I downloaded <a href="http://stackoverflow.com/reputation">from StackOverflow</a>:</p>
<p></p><pre class="crayon-plain-tag">2  15466134 (10)
 2  15466134 (10)
 1  15462529 (15)
 2  13265177 (10)
 2  15475139 (10)
 2  15486973 (10)
-- 2013-03-18 rep +65   = 15552     
 2  14376993 (10)
 2  14376993 (10)
 2  14376993 (10)
 1  15493353 (15)
 2  12598625 (10)
 2  14376993 (10)
-- 2013-03-19 rep +65   = 15617     
 2  15520314 (10)
 2  15520314 (10)
-- 2013-03-20 rep +20   = 15637     
 1  15541210 (15)
 2  15541210 (10)
 2  15541210 (10)
 2  15541210 (10)</pre><p></p>
<p>The entire data file can be downloaded <a href="http://intamap.geo.uu.nl/~paul/transport/rep.dat">here</a>.</p>
<p>In this post I&#8217;ll be stepping through the R code needed to get this text data into a useable format. First, we want to read the data into a character vector:</p>
<p></p><pre class="crayon-plain-tag">all_data = readLines(&quot;rep.dat&quot;)
head(all_data)
[1] &quot;total votes: 2325&quot; &quot; 2   8150378 (10)&quot; &quot; 2   8167111 (10)&quot;
[4] &quot; 2   8167111 (10)&quot; &quot; 2   8167111 (10)&quot; &quot; 2   8167461 (10)&quot;</pre><p></p>
<p>where each element of the vector is a line in the text file. Already we see that the first line is some header information which we want to skip:</p>
<p></p><pre class="crayon-plain-tag">all_data = all_data[-1]</pre><p></p>
<p>Note the use of negative indexing to remove an element. Next we want to find all the elements in the vector that relate to the date for which the data is representative. We do that by using a regular expression which looks for lines that start with <code>-</code>:</p>
<p></p><pre class="crayon-plain-tag">rep_date_entries = grep(&quot;^-&quot;, all_data)</pre><p></p>
<p>and find the number of actions (upvotes, downvotes, etc.) that have taken place on each day, i.e. the index of a certain day minus the index of the day before that, minus one for the date line itself:</p>
<p></p><pre class="crayon-plain-tag">actions_per_day = c(rep_date_entries[1], diff(rep_date_entries)) - 1</pre><p></p>
<p>note that we add <code>rep_date_entries[1]</code> because <code>diff</code> cuts off the first element. Now that we know which elements relate to the date, we can read all other lines into a nice <code>data.frame</code>:</p>
<p></p><pre class="crayon-plain-tag">dat = read.table(text = all_data[-rep_date_entries])  
names(dat) = c(&quot;action_id&quot;, &quot;question_id&quot;, &quot;rep_change&quot;)</pre><p></p>
<p>The reputation column has a somewhat strange format (<code>(10)</code>), so we need to get rid of the brackets. A nice way of doing that is using a regular expression, and the <code>str_extract</code> function from the <code>stringr</code> package:</p>
<p></p><pre class="crayon-plain-tag">require(stringr)
dat$rep_change = with(dat, as.numeric(str_extract(rep_change, &quot;-*[0-9]+&quot;)))</pre><p></p>
<p>The regular expression <code>-*[0-9]+</code> matches any leading minus signs followed by one or more digits, and <code>str_extract</code> gets the number out of the string. Now that we have the data, we need to add a column which says for each row to which date it belongs. We know which lines in the data belong to a date (<code>rep_date_entries</code>) and we know how many data entries there are per day (<code>actions_per_day</code>). We can now simply repeat each date line (<code>all_data[rep_date_entries]</code>) as many times as there are actions on that day:</p>
<p></p><pre class="crayon-plain-tag">dat$rep_date = rep(all_data[rep_date_entries], times = actions_per_day)
head(dat)
  action_id question_id rep_change                             rep_date
1         2     8150378         10 -- 2011-11-17 rep +95   = 96
2         2     8167111         10 -- 2011-11-17 rep +95   = 96
3         2     8167111         10 -- 2011-11-17 rep +95   = 96
4         2     8167111         10 -- 2011-11-17 rep +95   = 96</pre><p></p>
<p>You can see that the date is not yet in a nice format: we need to get rid of all the text, except the date itself. Again, we can use a regular expression, combined with <code>str_extract</code> for this:</p>
<p></p><pre class="crayon-plain-tag">dat$rep_date = str_extract(dat$rep_date, &quot;[0-9]{4}-[0-9]{2}-[0-9]{2}&quot;)</pre><p></p>
<p>The regular expression <code>"[0-9]{4}-[0-9]{2}-[0-9]{2}"</code> matches any occurrence of 4 digits, a dash, 2 digits, a dash, and 2 digits. Finally, we transform the date from a string to a real date object using <code>strptime</code>:</p>
<p></p><pre class="crayon-plain-tag">dat$rep_date = strptime(dat$rep_date, &quot;%Y-%m-%d&quot;)</pre><p></p>
<p>The end result is the following <code>data.frame</code>:</p>
<p></p><pre class="crayon-plain-tag">head(dat)
  action_id question_id rep_change   rep_date
1         2     8150378         10 2011-11-17
2         2     8167111         10 2011-11-17
3         2     8167111         10 2011-11-17
4         2     8167111         10 2011-11-17
5         2     8167461         10 2011-11-17
6         1     8167461         15 2011-11-17
summary(dat)
   action_id       question_id         rep_change
 Min.   : 1.000   Min.   :  489821   Min.   :  0.000
 1st Qu.: 2.000   1st Qu.: 9521651   1st Qu.: 10.000
 Median : 2.000   Median :11738823   Median : 10.000
 Mean   : 2.336   Mean   :11475310   Mean   :  9.873
 3rd Qu.: 2.000   3rd Qu.:13175326   3rd Qu.: 10.000
 Max.   :16.000   Max.   :15541210   Max.   :100.000
    rep_date
 Min.   :2011-11-17 00:00:00
 1st Qu.:2012-04-08 00:00:00
 Median :2012-08-30 00:00:00
 Mean   :2012-08-02 19:55:59
 3rd Qu.:2012-11-16 00:00:00
 Max.   :2013-03-23 00:00:00</pre><p></p>
<p>All this code together leads to the following function:</p>
<p></p><pre class="crayon-plain-tag">parse_so_rep_page = function(rep_file) {
   require(stringr)
   all_data = readLines(rep_file)
   all_data = all_data[-1]
   
   rep_date_entries = grep(&quot;^-&quot;, all_data)
   actions_per_day = c(rep_date_entries[1], diff(rep_date_entries)) - 1 
   
   dat = read.table(text = all_data[-rep_date_entries])
   names(dat) = c(&quot;action_id&quot;, &quot;question_id&quot;, &quot;rep_change&quot;)
   dat$rep_change = with(dat, as.numeric(str_extract(rep_change, &quot;-*[0-9]+&quot;)))
   
   dat$rep_date = rep(all_data[rep_date_entries], times = actions_per_day)
   dat$rep_date = str_extract(dat$rep_date, &quot;[0-9]{4}-[0-9]{2}-[0-9]{2}&quot;)
   dat$rep_date = strptime(dat$rep_date, &quot;%Y-%m-%d&quot;)
   return(dat)
}
res = parse_so_rep_page(&quot;data/rep.dat&quot;)</pre><p></p>
<p>I think this nicely illustrates the power of both vectorization, with its short, to-the-point syntax free of for-loops, and regular expressions for editing strings.</p>
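<p>The core trick, counting the lines between date markers with <code>diff</code> and then repeating each date that many times with <code>rep</code>, is easiest to follow on a tiny input. The sketch below uses made-up lines in the same layout as <code>rep.dat</code> (header already removed) and base R&#8217;s <code>regmatches</code> instead of <code>stringr</code>, so it runs without extra packages; the variable names are made up for this illustration.</p>
<p></p><pre class="crayon-plain-tag">## Toy input: two actions on the first day, one on the second
lines = c(&quot; 2  111 (10)&quot;,
          &quot; 1  222 (15)&quot;,
          &quot;-- 2013-03-18 rep +25   = 25&quot;,
          &quot; 2  333 (10)&quot;,
          &quot;-- 2013-03-19 rep +10   = 35&quot;)

## Positions of the date lines: 3 and 5
date_idx = grep(&quot;^-&quot;, lines)

## Number of action lines preceding each date line: 2 and 1
n_actions = c(date_idx[1], diff(date_idx)) - 1

## Pull the date out of each date line
date_lines = lines[date_idx]
dates = regmatches(date_lines, regexpr(&quot;[0-9]{4}-[0-9]{2}-[0-9]{2}&quot;, date_lines))

## One date per action line, in the order of the action lines
day_per_action = rep(dates, times = n_actions)
print(day_per_action)</pre><p></p>
<p>The result has one entry per action line: twice 2013-03-18, followed by once 2013-03-19.</p>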
]]></content:encoded>
			<wfw:commentRss>http://www.numbertheory.nl/2013/03/24/parsing-complex-text-files-using-regular-expressions-and-vectorization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slate: an XMonad like windowmanager for Mac OS</title>
		<link>http://www.numbertheory.nl/2013/03/16/slate-an-xmonad-like-windowmanager-for-mac-os/</link>
		<comments>http://www.numbertheory.nl/2013/03/16/slate-an-xmonad-like-windowmanager-for-mac-os/#comments</comments>
		<pubDate>Sat, 16 Mar 2013 09:00:34 +0000</pubDate>
		<dc:creator>Paul Hiemstra</dc:creator>
				<category><![CDATA[MacBook]]></category>
		<category><![CDATA[mac-os-x]]></category>
		<category><![CDATA[slate]]></category>

		<guid isPermaLink="false">http://www.numbertheory.nl/?p=519</guid>
		<description><![CDATA[For work I recently switched to using Mac OS (MacBook Pro 15&#8221; retina), until then I had been using Linux. The switch was rather painless, as a lot of the unix goodness (terminal!) is also present on the Mac. One<span class="ellipsis">&#8230;</span> <a href="http://www.numbertheory.nl/2013/03/16/slate-an-xmonad-like-windowmanager-for-mac-os/"><div class="see-more">See more &#8250;</div><!-- end of .see-more --></a>]]></description>
				<content:encoded><![CDATA[<p>For work I recently switched to using Mac OS (MacBook Pro 15&#8221; retina); until then I had been using Linux. The switch was rather painless, as a lot of the unix goodness (terminal!) is also present on the Mac. One thing that I was missing from my Linux days is XMonad. XMonad is a tiling window manager, which is very lightweight and relies heavily on keyboard shortcuts. But then a colleague suggested I take a look at <a href="https://github.com/jigish/slate">Slate</a>. Slate has a lot of the functionality and configurability of XMonad, and is a very nice addition to my already shortcut-centered way of working on my MacBook.</p>
<p>Slate allows you to:</p>
<ul>
<li>Attach keystrokes to how you want to manage your windows: e.g. resizing, moving around the screen, shifting focus between apps.</li>
<li>Create app layouts, e.g. Google Chrome full screen, Mail Client on my 2nd screen maximized to the left half of the screen, Terminal maximized to the right half of the screen, and being able to quickly switch between them. It also allows you to create layouts for 1 and 2 monitor setups, and switches between them automatically.</li>
<li>Window hints: press a shortcut and every app is marked with a letter. Pressing that letter shifts focus to that app.</li>
<li>An alternative app switcher.</li>
</ul>
<p>Slate is configured through a <code>.slate</code> file in your home directory, which can be a bit daunting. But if you are used to working with files such as <code>.bash_profile</code>, you&#8217;ll feel right at home. I&#8217;ve only just started working with Slate, but you can have a look at my config file at the bottom of this post. Tristan Hume <a href="http://thume.ca/howto/2012/11/19/using-slate/">has written</a> a nice introductory blog post on Slate.</p>
<p>One issue for me right now is that Slate does not work well with Mission Control (multiple workspaces). The features I use in the config file below work fine, but layouts, for example, cannot use multiple workspaces. This is a known issue, and progress in this direction is hampered by the lack of API support from Apple.</p>
<p></p><pre class="crayon-plain-tag"># Some config options
# Options relevant to Window hints
config windowHintsShowIcons true
config windowHintsIgnoreHiddenWindows false
config windowHintsSpread true

# Abstract positions
alias full move screenOriginX;screenOriginY screenSizeX;screenSizeY
alias lefthalf move screenOriginX;screenOriginY screenSizeX/2;screenSizeY
alias righthalf move screenOriginX+screenSizeX/2;screenOriginY screenSizeX/2;screenSizeY
alias topleft corner top-left resize:screenSizeX/2;screenSizeY/2
alias topright corner top-right resize:screenSizeX/2;screenSizeY/2
alias bottomleft corner bottom-left resize:screenSizeX/2;screenSizeY/2
alias bottomright corner bottom-right resize:screenSizeX/2;screenSizeY/2

# Bind window hinting to cmd+e, using the given letters
bind e:cmd hint ASDFGHJKLQWERTYUIOPCVBN # use whatever keys you want

# Press cmd+g to get a grid, drag on that 
# grid to determine the size of an app
bind g:cmd grid padding:5 0:6,2 1:8,2

# Use the Slate task switcher (beta)
bind tab:cmd switch

# Send a program to a particular screen
bind 1:alt,ctrl throw 0 resize
bind 2:alt,ctrl throw 1 resize

# Use the keys below to put the selected window
bind right:ctrl;alt  ${righthalf}   #...at the right half of the screen
bind left:ctrl;alt   ${lefthalf}    #...on the left half of the screen
bind up:ctrl;alt     ${full}        #...fullscreen

# Focus Bindings
# Shift focus to the app in the given direction
bind right:cmd    focus right
bind left:cmd     focus left
bind up:cmd       focus up
bind down:cmd     focus down
bind up:cmd;alt   focus behind
bind down:cmd;alt focus behind</pre><p></p>
]]></content:encoded>
			<wfw:commentRss>http://www.numbertheory.nl/2013/03/16/slate-an-xmonad-like-windowmanager-for-mac-os/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automatic spatial interpolation with R: the automap package</title>
		<link>http://www.numbertheory.nl/2013/02/17/automatic-spatial-interpolation-with-r-the-automap-package/</link>
		<comments>http://www.numbertheory.nl/2013/02/17/automatic-spatial-interpolation-with-r-the-automap-package/#comments</comments>
		<pubDate>Sun, 17 Feb 2013 11:26:12 +0000</pubDate>
		<dc:creator>Paul Hiemstra</dc:creator>
				<category><![CDATA[R stuff]]></category>
		<category><![CDATA[R programming]]></category>
		<category><![CDATA[spatial-interpolation]]></category>

		<guid isPermaLink="false">http://www.numbertheory.nl/?p=491</guid>
		<description><![CDATA[In case of continuously collected data, e.g. observations from a monitoring network, spatial interpolation of this data cannot be done manually. Instead, the interpolation should be done automatically. To achieve this goal, I developed the automap package. automap builds on<span class="ellipsis">&#8230;</span> <a href="http://www.numbertheory.nl/2013/02/17/automatic-spatial-interpolation-with-r-the-automap-package/"><div class="see-more">See more &#8250;</div><!-- end of .see-more --></a>]]></description>
				<content:encoded><![CDATA[<p>When data is collected continuously, e.g. observations from a monitoring network, spatial interpolation of this data cannot be done manually. Instead, the interpolation should be done automatically. To achieve this goal, I developed the <a href="http://cran.r-project.org/web/packages/automap/index.html"><code>automap</code></a> package. <code>automap</code> builds on top of the excellent <code>gstat</code> package, and provides automatic spatial interpolation, more specifically automatic <a href="http://en.wikipedia.org/wiki/Kriging">kriging</a>. Kriging in its simpler forms (Ordinary Kriging and Universal Kriging, the latter also known as Kriging with External Drift) is actually nothing more than linear regression with spatially correlated residuals.</p>
<p><code>automap</code> provides the following set of functions (for details I refer to the <a href="http://cran.r-project.org/web/packages/automap/automap.pdf">online manual</a>):</p>
<ul>
<li><code>autofitVariogram</code>, automatically fits the variogram model to the data.</li>
<li><code>autoKrige</code>, automatically fits the variogram model using <code>autofitVariogram</code>, and creates an interpolated map.</li>
<li><code>autoKrige.cv</code>, automatically fits the variogram model using <code>autofitVariogram</code>, and performs cross-validation. Uses <code>krige.cv</code> under the hood.</li>
<li><code>compare.cv</code>, allows comparison of the output of <code>autoKrige.cv</code> and <code>krige.cv</code>. This can be used to evaluate the performance of different interpolation algorithms. <code>compare.cv</code> allows comparison using both summary statistics and spatial plots.</li>
</ul>
<p>In general, the interface of <code>automap</code> mimics that of <code>gstat</code>. The following code snippets show some examples of creating interpolated maps using <code>automap</code>:</p>
<p></p><pre class="crayon-plain-tag">library(automap)
loadMeuse()
# Ordinary kriging
kriging_result = autoKrige(zinc~1, meuse, meuse.grid)
plot(kriging_result)
# Universal kriging
kriging_result = autoKrige(zinc~soil+ffreq+dist, meuse, meuse.grid)
plot(kriging_result)</pre><p></p>
<p>You can get <code>automap</code> from either CRAN:</p>
<p></p><pre class="crayon-plain-tag">install.packages(&quot;automap&quot;)</pre><p></p>
<p>or <a href="https://bitbucket.org/paulhiemstra/automap">my bitbucket account</a>.</p>
<p></p><pre class="crayon-plain-tag">hg clone https://bitbucket.org/paulhiemstra/automap</pre><p></p>
<p>PS: <code>automap</code> was the first package I wrote, at the beginning of my PhD, so it is not the most beautiful code I ever wrote <img src='http://www.numbertheory.nl/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> .</p>
]]></content:encoded>
			<wfw:commentRss>http://www.numbertheory.nl/2013/02/17/automatic-spatial-interpolation-with-r-the-automap-package/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Implementing a hash table in Fortran 90: Part 2</title>
		<link>http://www.numbertheory.nl/2013/01/24/implementing-a-hash-table-in-fortran-90-part-2/</link>
		<comments>http://www.numbertheory.nl/2013/01/24/implementing-a-hash-table-in-fortran-90-part-2/#comments</comments>
		<pubDate>Thu, 24 Jan 2013 21:51:21 +0000</pubDate>
		<dc:creator>Paul Hiemstra</dc:creator>
				<category><![CDATA[Fortran]]></category>
		<category><![CDATA[fortran]]></category>
		<category><![CDATA[hashtable]]></category>

		<guid isPermaLink="false">http://www.numbertheory.nl/?p=482</guid>
		<description><![CDATA[In my last post I proposed a simple implementation for a hash table in Fortran 90 using a module. I extended the hashtable to make it more usable in a realistic setting. Do note that in some aspects, this implementation<span class="ellipsis">&#8230;</span> <a href="http://www.numbertheory.nl/2013/01/24/implementing-a-hash-table-in-fortran-90-part-2/"><div class="see-more">See more &#8250;</div><!-- end of .see-more --></a>]]></description>
				<content:encoded><![CDATA[<p>In <a href="http://www.numbertheory.nl/2013/01/23/implementing-a-simple-hash-table-in-fortran-90/">my last post</a> I proposed a simple implementation of a hash table in Fortran 90 using a module. I have since extended the hash table to make it more usable in a realistic setting. Do note that in some respects this implementation is not very efficient, mainly in how quickly elements can be added to and retrieved from the hash table. My implementation uses a linear search to find the key; this could be done much more efficiently using e.g. a binary search.</p>
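<p>As a rough sketch of that last remark, the function below performs a binary search on an array of keys. It is not part of the module in this post: the names <code>sorted_search</code> and <code>find_sorted</code> are made up for illustration, and it assumes the keys are stored in ascending order, which the module below does not guarantee.</p>
<p></p><pre class="crayon-plain-tag">MODULE sorted_search
  IMPLICIT NONE
  PRIVATE

  PUBLIC :: find_sorted

CONTAINS

  ! Return the position of key in sorted_keys, or 0 when it is absent.
  ! Assumption: sorted_keys is in ascending lexical (ASCII) order.
  FUNCTION find_sorted(sorted_keys, key) RESULT(pos)
    CHARACTER(*), INTENT(IN)     :: sorted_keys(:)
    CHARACTER(*), INTENT(IN)     :: key
    INTEGER                      :: pos
    INTEGER                      :: lo, hi, mid
    pos = 0
    lo = 1
    hi = Size(sorted_keys, 1)
    DO WHILE (lo &lt;= hi)
      mid = lo + (hi - lo) / 2
      ! LLT and LGT pad the shorter string with blanks before comparing
      IF(LLT(sorted_keys(mid), key)) THEN
        lo = mid + 1    ! key sorts after the middle element
      ELSE IF(LGT(sorted_keys(mid), key)) THEN
        hi = mid - 1    ! key sorts before the middle element
      ELSE
        pos = mid       ! neither smaller nor larger: found
        EXIT
      ENDIF
    ENDDO
  END FUNCTION find_sorted
END MODULE sorted_search</pre><p></p>
<p>A call such as <code>find_sorted(keys, &quot;two&quot;)</code> returns the position of the key, or 0 when the key is absent. Keeping the array sorted makes adding a key more expensive, so this mainly pays off when lookups greatly outnumber additions.</p>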
<p>I made the following changes:</p>
<ul>
<li>There is now just one subroutine to put stuff into the hash table: <code>hash_set</code>. When <code>hash_set</code> is called, the hash table tries to find the key, and if it is not found, the key-value pair is pushed onto the hash table. <code>hash_push</code> is no longer publicly available.</li>
<li>By default the hash table now has room for 50 items, and it no longer has to be initialized at a certain size. In addition, when all 50 slots are in use, <code>hash_reallocate</code> is called to extend the hash table by another 50 items. This makes the hash table much more flexible.</li>
<li><code>hash_print</code> now only prints the key-value pairs that are actually used.</li>
</ul>
<p>The following program shows the new hash table module in action:</p>
<p></p><pre class="crayon-plain-tag">PROGRAM hash_test
  use hash
  REAL :: res = 0.0

  CALL hash_init

  CALL hash_set(&quot;one&quot;, 1.0)
  CALL hash_set(&quot;two&quot;, 2.0)
  CALL hash_set(&quot;three&quot;, 3.0)

  CALL hash_print
  CALL hash_get(&quot;one&quot;, res)
  PRINT*, res
  CALL hash_get(&quot;two&quot;, res)
  PRINT*, res

  CALL hash_set(&quot;one&quot;, 60.0)

  CALL hash_print
  CALL hash_get(&quot;one&quot;, res)
  PRINT*, res
END PROGRAM hash_test</pre><p></p>
<p>A nice addition would be to add an index to the hash table, allowing one to have several hash tables inside the one hash table module. Any call to a subroutine would then also require specifying which hash table one needs to change.</p>
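<p>A minimal sketch of that idea could add a second array dimension that selects the table. The sketch below assumes a fixed capacity per table and uses hypothetical names (<code>multi_hash</code>, <code>mh_init</code>, <code>mh_set</code>, <code>mh_get</code>); it is an illustration, not the module from this post.</p>
<p></p><pre class="crayon-plain-tag">MODULE multi_hash
  IMPLICIT NONE
  PRIVATE

  PUBLIC :: mh_init
  PUBLIC :: mh_set
  PUBLIC :: mh_get

  INTEGER,               PARAMETER   :: CharLength = 128
  INTEGER,               PARAMETER   :: table_size = 50  ! fixed capacity (assumption)
  CHARACTER(CharLength), ALLOCATABLE :: keys(:,:)        ! (slot, table)
  REAL,                  ALLOCATABLE :: values(:,:)      ! (slot, table)
  INTEGER,               ALLOCATABLE :: n_used(:)        ! slots in use per table

CONTAINS

  SUBROUTINE mh_init(no_tables)
    INTEGER, INTENT(IN)  :: no_tables
    ALLOCATE(keys(table_size, no_tables))
    ALLOCATE(values(table_size, no_tables))
    ALLOCATE(n_used(no_tables))
    keys(:,:) = ' '
    values(:,:) = 0.0
    n_used(:) = 0
  END SUBROUTINE mh_init

  ! Update the value when the key exists in the given table, else append it
  SUBROUTINE mh_set(table, key, value)
    INTEGER     , INTENT(IN)     :: table
    CHARACTER(*), INTENT(IN)     :: key
    REAL        , INTENT(IN)     :: value
    INTEGER                      :: local_index
    DO local_index = 1,n_used(table)
      IF(TRIM(keys(local_index, table)) == TRIM(key)) THEN
        values(local_index, table) = value
        RETURN
      ENDIF
    ENDDO
    IF(n_used(table) == table_size) STOP 1  ! this table is full
    n_used(table) = n_used(table) + 1
    keys(n_used(table), table) = key
    values(n_used(table), table) = value
  END SUBROUTINE mh_set

  ! Look up key in the given table; found reports whether it exists
  SUBROUTINE mh_get(table, key, value, found)
    INTEGER     , INTENT(IN)     :: table
    CHARACTER(*), INTENT(IN)     :: key
    REAL        , INTENT(OUT)    :: value
    LOGICAL     , INTENT(OUT)    :: found
    INTEGER                      :: local_index
    value = 0.0
    found = .FALSE.
    DO local_index = 1,n_used(table)
      IF(TRIM(keys(local_index, table)) == TRIM(key)) THEN
        value = values(local_index, table)
        found = .TRUE.
        RETURN
      ENDIF
    ENDDO
  END SUBROUTINE mh_get
END MODULE multi_hash</pre><p></p>
<p>After <code>CALL mh_init(2)</code>, the same key can be stored independently in table 1 and table 2, e.g. <code>CALL mh_set(1, &quot;one&quot;, 1.0)</code> and <code>CALL mh_set(2, &quot;one&quot;, 10.0)</code>. Returning a <code>found</code> flag instead of stopping the program is a design choice of this sketch.</p>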
<p>The source code for the module is given here:</p>
<p></p><pre class="crayon-plain-tag">MODULE hash
  IMPLICIT NONE
  PRIVATE

  PUBLIC :: hash_init
  PUBLIC :: hash_get
  PUBLIC :: hash_set
  PUBLIC :: hash_print

  INTEGER,               PARAMETER   :: CharLength = 128
  INTEGER,               PARAMETER   :: start_hash_size = 50 
  INTEGER                            :: current_size, new_size
  CHARACTER(CharLength), ALLOCATABLE :: keys(:)
  REAL,                  ALLOCATABLE :: values(:)
  LOGICAL,               ALLOCATABLE :: used(:)
  INTEGER                            :: hash_index

CONTAINS

  SUBROUTINE hash_init
    INTEGER              :: status
    ALLOCATE(keys(start_hash_size), stat=status)
    ALLOCATE(values(start_hash_size), stat=status)
    ALLOCATE(used(start_hash_size), stat=status)
    hash_index = 0 
    keys(:) = &quot;&quot;
    values = 0.0
    used(:) = .FALSE.
    current_size = start_hash_size
  END SUBROUTINE hash_init

  SUBROUTINE hash_push(key, value)
    CHARACTER(*), INTENT(IN)     :: key
    REAL        , INTENT(IN)     :: value
    hash_index = hash_index + 1
    IF(hash_index &gt; Size(keys, 1)) CALL hash_reallocate
    keys(hash_index) = key
    values(hash_index) = value
    used(hash_index) = .TRUE.
  END SUBROUTINE hash_push

  SUBROUTINE hash_set(key, value)
    CHARACTER(*), INTENT(IN)     :: key
    REAL        , INTENT(IN)     :: value
    INTEGER                      :: local_index
    LOGICAL                      :: found
    found = .FALSE. 
    DO local_index = 1,Size(keys,1)
      IF(TRIM(keys(local_index)) == TRIM(key)) THEN 
        values(local_index) = value
        found = .TRUE.
      ENDIF
    ENDDO 
    IF(.NOT.found) THEN
      CALL hash_push(key, value)
    ENDIF 
  END SUBROUTINE hash_set

  SUBROUTINE hash_get(key, value)
    CHARACTER(*), INTENT(IN)     :: key
    REAL        , INTENT(OUT)    :: value
    INTEGER                      :: local_index
    LOGICAL                      :: found
    found = .FALSE. 
    DO local_index = 1,Size(keys,1)
      IF(TRIM(keys(local_index)) == TRIM(key)) THEN 
        value = values(local_index)
        found = .TRUE.
      ENDIF
    ENDDO 
    IF(.NOT.found) CALL print_error(&quot;Unknown key&quot;)
  END SUBROUTINE hash_get

  SUBROUTINE hash_print
    INTEGER  :: local_index 
    PRINT*, &quot;Contents of the hashtable:&quot;
    DO local_index = 1,Size(keys,1)
      IF(used(local_index)) PRINT*, TRIM(keys(local_index)), &quot; = &quot;, values(local_index)
    ENDDO
  END SUBROUTINE hash_print

  SUBROUTINE hash_reallocate
    CHARACTER(CharLength), ALLOCATABLE :: temp_keys(:)
    REAL                 , ALLOCATABLE :: temp_values(:)   
    LOGICAL              , ALLOCATABLE :: temp_used(:)
    INTEGER                            :: status
    new_size = current_size + start_hash_size
    ALLOCATE(temp_keys(current_size))
    ALLOCATE(temp_values(current_size))
    ALLOCATE(temp_used(current_size))
    temp_keys(:) = keys
    temp_values(:) = values(:)
    temp_used(:) = used(:)
    DEALLOCATE(keys)
    DEALLOCATE(values)
    DEALLOCATE(used)
    ALLOCATE(keys(new_size))
    keys(:) = &quot;&quot;
    ALLOCATE(values(new_size))
    values(:) = 0.0
    ALLOCATE(used(new_size))
    used(:) = .FALSE.
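    keys(1:current_size) = temp_keys(:)
    values(1:current_size) = temp_values(:)
    used(1:current_size) = temp_used(:)
    DEALLOCATE(temp_keys, temp_values, temp_used)
    ! Record the new size, otherwise a second reallocation would not grow the table
    current_size = new_size
  END SUBROUTINE hash_reallocate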

  SUBROUTINE print_error(text)
    CHARACTER(*) :: text
    PRINT*, text
    STOP
  END SUBROUTINE print_error
END MODULE hash</pre><p></p>
]]></content:encoded>
			<wfw:commentRss>http://www.numbertheory.nl/2013/01/24/implementing-a-hash-table-in-fortran-90-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Implementing a simple hash table in Fortran 90</title>
		<link>http://www.numbertheory.nl/2013/01/23/implementing-a-simple-hash-table-in-fortran-90/</link>
		<comments>http://www.numbertheory.nl/2013/01/23/implementing-a-simple-hash-table-in-fortran-90/#comments</comments>
		<pubDate>Wed, 23 Jan 2013 14:59:54 +0000</pubDate>
		<dc:creator>Paul Hiemstra</dc:creator>
				<category><![CDATA[Fortran]]></category>
		<category><![CDATA[fortran]]></category>
		<category><![CDATA[hashtable]]></category>
		<category><![CDATA[module]]></category>

		<guid isPermaLink="false">http://www.numbertheory.nl/?p=473</guid>
		<description><![CDATA[Implementing a hash table, or dictionary as it is called in Python, in Fortran 90 turned out to be non-trivial. For starters, no standard data type was available (afaik) in Fortran 90. I decided to implement one myself, custom for<span class="ellipsis">&#8230;</span> <a href="http://www.numbertheory.nl/2013/01/23/implementing-a-simple-hash-table-in-fortran-90/"><div class="see-more">See more &#8250;</div><!-- end of .see-more --></a>]]></description>
				<content:encoded><![CDATA[<p>Implementing a hash table, or dictionary as it is called in Python, in Fortran 90 turned out to be non-trivial. For starters, no standard data type was available (<a href="http://en.wiktionary.org/wiki/AFAIK">afaik</a>) in Fortran 90. I decided to implement one myself, tailored to my situation. What I needed was a mapping from a model name to a <code>REAL</code> value, stored as key-value pairs. The hash table would enable me to take the model name and retrieve the associated <code>REAL</code> value. The following module implements the hash table. It has a number of subroutines:</p>
<ul>
<li><code>hash_init(no_items)</code>, initializes the hash table with the correct number of key-value pairs.</li>
<li><code>hash_push(key, value)</code>, pushes the key-value pair to the next available place in the hash table, i.e. first index 1, next index 2, etc.</li>
<li><code>hash_get(key, value)</code>, retrieves the value for a given key. Errors are raised when the hash table is not yet fully filled, or when the key has not been found.</li>
<li><code>hash_set(key, value)</code>, allows the user to change the value of a given key-value pair. Errors are raised when the hash table is not yet fully filled, or when the key has not been found.</li>
<li><code>hash_print()</code>, prints the current contents of the hash table to the screen.</li>
</ul>
<p>The following program illustrates the use of the hash module (tested using <code>ifort</code> and <code>gfortran</code>):</p>
<p></p><pre class="crayon-plain-tag">PROGRAM hash_test
  use hash
  REAL :: res = 0.0

  CALL hash_init(3)

  CALL hash_push(&quot;one&quot;, 1.0)
  CALL hash_push(&quot;two&quot;, 2.0)
  CALL hash_push(&quot;three&quot;, 3.0)

  CALL hash_print
  CALL hash_get(&quot;one&quot;, res)
  PRINT*, res
  CALL hash_get(&quot;two&quot;, res)
  PRINT*, res

  CALL hash_set(&quot;one&quot;, 60.0)

  CALL hash_print
  CALL hash_get(&quot;one&quot;, res)
  PRINT*, res
END PROGRAM hash_test</pre><p></p>
<p>The code of the module <code>hash</code> is given here:</p>
<p></p><pre class="crayon-plain-tag">MODULE hash
  IMPLICIT NONE
  PRIVATE

  PUBLIC :: hash_init
  PUBLIC :: hash_push
  PUBLIC :: hash_get
  PUBLIC :: hash_set
  PUBLIC :: hash_print

  INTEGER,               PARAMETER   :: CharLength = 128
  CHARACTER(CharLength), ALLOCATABLE :: keys(:)
  REAL,                  ALLOCATABLE :: values(:)
  INTEGER                            :: hash_index

CONTAINS

  SUBROUTINE hash_init(no_items)
    INTEGER, INTENT(in)  :: no_items
    INTEGER              :: status
    ALLOCATE(keys(no_items), stat=status)
    ALLOCATE(values(no_items), stat=status)
    hash_index = 0 
  END SUBROUTINE hash_init

  SUBROUTINE hash_push(key, value)
    CHARACTER(*), INTENT(IN)     :: key
    REAL        , INTENT(IN)     :: value
    hash_index = hash_index + 1
    IF(hash_index &gt; Size(keys, 1)) CALL print_error(&quot;Error: Hash table is already full&quot;)
    keys(hash_index) = key
    values(hash_index) = value
  END SUBROUTINE hash_push

  SUBROUTINE hash_set(key, value)
    CHARACTER(*), INTENT(IN)     :: key
    REAL        , INTENT(IN)     :: value
    INTEGER                      :: local_index
    LOGICAL                      :: found
    found = .FALSE. 
    IF(hash_index &lt; Size(keys, 1)) CALL print_error(&quot;Error: the hashtable is not yet full&quot;) 
    DO local_index = 1,Size(keys,1)
      IF(TRIM(keys(local_index)) == TRIM(key)) THEN 
        values(local_index) = value
        found = .TRUE.
      ENDIF
    ENDDO 
    IF(.NOT.found) CALL print_error(&quot;Unknown key&quot;)
  END SUBROUTINE hash_set

  SUBROUTINE hash_get(key, value)
    CHARACTER(*), INTENT(IN)     :: key
    REAL        , INTENT(OUT)    :: value
    INTEGER                      :: local_index
    LOGICAL                      :: found
    found = .FALSE. 
    IF(hash_index &lt; Size(keys, 1)) CALL print_error(&quot;Error: the hashtable is not yet full&quot;) 
    DO local_index = 1,Size(keys,1)
      IF(TRIM(keys(local_index)) == TRIM(key)) THEN 
        value = values(local_index)
        found = .TRUE.
      ENDIF
    ENDDO 
    IF(.NOT.found) CALL print_error(&quot;Unknown key&quot;)
  END SUBROUTINE hash_get

  SUBROUTINE hash_print
    INTEGER  :: local_index 
    PRINT*, &quot;Contents of the hashtable:&quot;
    DO local_index = 1,Size(keys,1)
      PRINT*, TRIM(keys(local_index)), &quot; = &quot;, values(local_index)
    ENDDO
  END SUBROUTINE hash_print

  SUBROUTINE print_error(text)
    CHARACTER(*) :: text
    PRINT*, text
    STOP
  END SUBROUTINE print_error
END MODULE hash</pre><p></p>
]]></content:encoded>
			<wfw:commentRss>http://www.numbertheory.nl/2013/01/23/implementing-a-simple-hash-table-in-fortran-90/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
