Cleaning sentences by recursively merging words using R

A question on StackOverflow caught my attention. The aim was to clean up a dataset in which words had been broken apart by misplaced spaces. For example:

My approach was to create what I call a wordpair object. The wordpair object for the example sentence looks like:
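As a rough sketch of the idea (not necessarily the exact structure used in this post), a wordpair object can be thought of as a table of adjacent words. The helper name `make_wordpairs`, the column names `word1` and `word2`, and the sample sentence are all assumptions made for illustration:

```r
# Hypothetical helper: build a table of adjacent word pairs from a sentence.
make_wordpairs <- function(sentence) {
  # Split on runs of whitespace after trimming leading/trailing spaces
  words <- strsplit(trimws(sentence), "\\s+")[[1]]
  n <- length(words)
  if (n < 2) {
    # Fewer than two words: there are no pairs to form
    return(data.frame(word1 = character(0), word2 = character(0),
                      stringsAsFactors = FALSE))
  }
  # Row i pairs word i with word i + 1
  data.frame(word1 = words[-n], word2 = words[-1], stringsAsFactors = FALSE)
}

# Made-up sample sentence with one wrongly split word
pairs <- make_wordpairs("the wea ther is nice")
print(pairs)
```

Each row holds one candidate merge, so a sentence of n words yields n - 1 pairs.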

We then iterate over the word pairs, check whether merging a pair yields a correctly spelled word using the aspell function in R, and recursively keep merging words until no new correct words can be found. The code that creates the wordpair object, transforms a wordpair back into a list of words, and provides some additional helper functions can be found at the end of this post.
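The merging loop can be sketched as follows. To keep the sketch self-contained, a small hard-coded word list stands in for aspell, and it works on a plain character vector rather than a wordpair object. The names `dictionary`, `is_word`, `merge_words`, and `clean_sentence`, as well as the sample sentence, are hypothetical:

```r
# Stand-in for a real spell checker such as aspell: a tiny made-up word list.
dictionary <- c("the", "weather", "is", "nice", "today")
is_word <- function(w) tolower(w) %in% dictionary

# Merge the first adjacent pair that forms a known word, then recurse
# on the shortened vector until no pair can be merged any more.
merge_words <- function(words) {
  if (length(words) < 2) return(words)
  for (i in seq_len(length(words) - 1)) {
    candidate <- paste0(words[i], words[i + 1])
    if (is_word(candidate)) {
      # Drop the two fragments and put the merged word in their place
      merged <- append(words[-c(i, i + 1)], candidate, after = i - 1)
      return(merge_words(merged))
    }
  }
  words  # no pair merges into a known word: done
}

clean_sentence <- function(sentence) {
  words <- strsplit(trimws(sentence), "\\s+")[[1]]
  paste(merge_words(words), collapse = " ")
}

result <- clean_sentence("the wea ther is ni ce to day")
print(result)
```

Note that this greedy version merges any pair that happens to form a known word, so two valid words such as "to" and "day" are joined into "today" even when they were meant to stay separate. A fuller implementation might only merge when at least one of the fragments is itself misspelled.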

Applied to the example dataset, this results in:
