I’ve worked with various alternate file handlers in Python before and wanted to explore the options in R. I was pleasantly surprised to find prebuilt handlers for tasks like compressing data. In addition, a pipe function lets you run less common commands on your file, like gpg for encryption.
I put together a quick video demo of how to use these functions, and it’s available on YouTube:
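As a rough illustration of the idea, here is a minimal sketch using base R's `gzfile()` and `pipe()` connections. The file contents and the `echo` command are stand-ins chosen for the example, and the gpg line is left as a comment because it assumes gpg is installed and that a hypothetical encrypted file exists.

```r
# Write a few lines through a gzip-compressing connection.
path <- tempfile(fileext = ".gz")
con <- gzfile(path, open = "wt")
writeLines(c("alpha", "beta", "gamma"), con)
close(con)

# Read them back; the connection decompresses transparently.
con <- gzfile(path, open = "rt")
lines_back <- readLines(con)
close(con)
print(lines_back)

# pipe() runs a shell command and exposes its output as a connection.
# "echo" is only a placeholder command for this sketch.
con <- pipe("echo hello from a pipe", open = "r")
piped <- readLines(con)
close(con)
print(piped)

# Assumption: with gpg installed and a hypothetical file secret.txt.gpg,
# the same pattern would look like:
# con <- pipe("gpg --decrypt secret.txt.gpg", open = "r")
```

The same pattern should apply to `bzfile()` and `xzfile()`, which differ only in the compression format.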
Note that I’m using the tm package, which is the traditional way to work with a document collection in R. Newer approaches like tidytext are gaining popularity. I may do a follow-up talk on that.
The Monty Hall Problem is famous in the world of statistics and probability. For those struggling with the intuition, simulating the problem is a great way to get at the answer. Randomly choose a door for the prize, randomly choose a door for the user to pick first, play out Monty’s role as host, and then show the results of both strategies.
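The steps above can be sketched in R roughly as follows. This is a minimal illustration rather than the original code from this post: the number of games, the seed, and the names `play_game`, `stay`, and `switched` are all choices made for the example.

```r
set.seed(42)          # arbitrary seed so reruns give the same numbers
n_games <- 10000      # assumed number of simulated games
doors <- 1:3

play_game <- function() {
  prize <- sample(doors, 1)   # door hiding the prize
  pick <- sample(doors, 1)    # contestant's first choice

  # Monty opens a door that is neither the contestant's pick nor the prize.
  # Indexing with sample.int avoids sample()'s surprise on length-1 vectors.
  can_open <- setdiff(doors, c(pick, prize))
  opened <- can_open[sample.int(length(can_open), 1)]

  # Switching means taking the one door that is neither picked nor opened.
  switched_pick <- setdiff(doors, c(pick, opened))

  c(stay = pick == prize, switched = switched_pick == prize)
}

# Each column is one game; each row is one strategy.
results <- replicate(n_games, play_game())
win_rates <- rowMeans(results)
print(win_rates)
```

Exactly one of the two strategies wins every game, so the two win rates always sum to 1.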
The numeric output will vary, but will look something like:
Clustering is a useful technique for exploring your data. It groups records into clusters based on similar features. It’s also a key technique of unsupervised learning. The following is a simple example in R where I plotted the clusters and centroids.
The example uses the mtcars dataset built into R, which contains data on 32 automobiles (1973-74 models) extracted from the 1974 Motor Trend US magazine.
Clustering is done with the kmeans() function. Note that the graph is 2-dimensional, and I cluster by 2 features, but you could cluster by more features and project down to a 2-dimensional plane.
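A minimal sketch of this kind of plot is shown below. The choice of `mpg` and `hp` as the two features, the three clusters, and the scaling step are assumptions made for illustration, not necessarily what the original example used.

```r
set.seed(1)  # kmeans starts from random centers; fix the seed for repeatability

# Assumed features: miles per gallon and horsepower.
features <- mtcars[, c("mpg", "hp")]

# Standardize so that hp's larger numeric range does not dominate the distance.
scaled <- scale(features)

# Three clusters is an arbitrary choice; nstart tries several random starts.
fit <- kmeans(scaled, centers = 3, nstart = 20)

# Color each car by its assigned cluster.
plot(scaled, col = fit$cluster, pch = 19,
     xlab = "mpg (scaled)", ylab = "hp (scaled)",
     main = "k-means clusters on mtcars")

# Overlay the centroids as larger star symbols in matching colors.
points(fit$centers, col = 1:3, pch = 8, cex = 2, lwd = 2)
```

To cluster on more than two features, one option is to run `kmeans()` on the wider scaled matrix and plot the first two principal components from `prcomp()` instead of the raw columns.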
I’ve been working through the following book on Bayesian methods with an emphasis on the pymc library:
However, pymc installation on OS X can be a bit of a pain. The issue comes down to Fortran… I know. The version of gfortran in newer gcc implementations doesn’t work well with the pymc build; you need gfortran 4.2, as originally provided by Apple. Homebrew has a package for this.
I dealt with this before, but had problems again after upgrading to Sierra. So this time, I thought I’d document the steps so I don’t have this problem again. Let me know if there are any steps that you feel need to be added as you try this.