Tim’s Blog

  • Text Processing in R Talk With the TM Package

    I gave a talk at my local Cleveland R User Group on text processing and document vectorization. You can view the talk here: Note that I’m using the tm package, which is the traditional way to work with a document collection in R. There are newer approaches, such as tidytext, that are gaining popularity. I may […]

  • Simulating the Monty Hall Problem in R

    The Monty Hall Problem is famous in the world of statistics and probability. For those struggling with the intuition, simulating the problem is a great way to get at the answer. Randomly choose a door for the prize, randomly choose a door for the user to pick first, play out Monty’s role as host, and […]

  • Clustering in R

    Clustering is a useful technique for exploring your data. It groups records into clusters based on similar features. It’s also a key technique in unsupervised learning. The following is a simple example in R where I plotted the clusters and centroids. The example uses the mtcars dataset built into R, which contains auto data extracted […]

  • Interview and Upcoming Projects

    Here is a recent interview I did for CLK Tech. CLK Tech is a newsletter based out of Northeast Ohio, run by a couple of tech recruiters in the area. Topics range from general career questions to data science in particular. In addition, I’m busy with a project that I look forward to announcing soon. It’s […]

  • Installing pymc on OS X using Homebrew

    I’ve been working through the following book on Bayesian methods with an emphasis on the pymc library: However, pymc installation on OS X can be a bit of a pain. The issue comes down to Fortran… I know. The version of gfortran in newer gcc releases doesn’t work well with the pymc build; you need […]
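The Monty Hall excerpt above outlines a simulation: randomly place the prize, randomly choose the contestant's first door, let Monty open a losing door, then have the contestant either stay or switch. The original post works in R; what follows is only a Python sketch of that recipe, and the function names `play_monty_hall` and `win_rate` are hypothetical, not taken from the post.

```python
import random


def play_monty_hall(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)       # randomly place the prize
    first_pick = rng.choice(doors)  # contestant's initial choice
    # Monty opens a door that is neither the contestant's pick nor the prize.
    # If the first pick is the prize, he has two such doors and picks one at random.
    openable = [d for d in doors if d != first_pick and d != prize]
    opened = rng.choice(openable)
    if switch:
        # Move to the single door that is neither the first pick nor the opened one.
        final_pick = next(d for d in doors if d != first_pick and d != opened)
    else:
        final_pick = first_pick
    return final_pick == prize


def win_rate(switch: bool, trials: int = 100_000, seed: int = 42) -> float:
    """Fraction of simulated rounds won with the given strategy (hypothetical helper)."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print(f"stay:   {win_rate(switch=False):.3f}")  # close to 1/3
    print(f"switch: {win_rate(switch=True):.3f}")   # close to 2/3
```

Because every round consumes the random stream identically for both strategies, the same seed replays the same games, so switching wins exactly the rounds that staying loses.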
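The clustering excerpt doesn't name its algorithm, but plotting clusters together with their centroids suggests k-means, which is assumed here. In R, the built-in `stats::kmeans()` provides this directly; whether the post uses it isn't visible in the excerpt. As a sketch of how the method works, here is a small Python version of Lloyd's k-means. The `kmeans` function and the toy points are hypothetical illustrations, not the post's code or the mtcars data.

```python
import math
import random

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means on 2-D points; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen records
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mx = sum(x for x, _ in members) / len(members)
                my = sum(y for _, y in members) / len(members)
                new_centroids.append((mx, my))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: assignments will no longer change
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of points.
    pts = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]
    centroids, labels = kmeans(pts, k=2)
    print("centroids:", centroids)
    print("labels:", labels)
```

k-means depends on its starting centroids, so a fixed seed keeps runs reproducible; in practice it is common to try several starts and keep the tightest clustering.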

Got any book recommendations?