Tim’s Blog
-
Text Processing in R Talk With the tm Package
I gave a talk about text processing and document vectorization at my local Cleveland R User Group. You can view the talk here:
Note that I’m using the tm package, which is the traditional way to work with a document collection in R. Newer approaches, such as tidytext, are gaining popularity. I may […]
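The excerpt doesn't include the talk's code. As a rough illustration of what document vectorization produces, here is a minimal Python sketch (not R, and not the tm package) that turns a few made-up documents into a document-term matrix of raw term counts. This is similar in spirit to what tm's `DocumentTermMatrix()` builds. The stopword list, the cleaning rules, and the sample documents are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; tm ships a much larger English list.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def tokenize(text):
    """Lowercase, replace anything that isn't a letter with a space, split, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]

def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))  # union of every document's terms
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

# Hypothetical example documents.
docs = [
    "R is a language for statistics.",
    "The tm package handles text in R.",
    "Text mining turns text into vectors.",
]
vocab, dtm = document_term_matrix(docs)
print(vocab)
for row in dtm:
    print(row)
```

Each row is one document and each column is one term. This is the vector representation that downstream steps such as clustering or classification work with.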
-
Simulating the Monty Hall Problem in R
The Monty Hall Problem is famous in the world of statistics and probability. For those struggling with the intuition, simulating the problem is a great way to get at the answer. Randomly choose a door for the prize, randomly choose a door for the user to pick first, play out Monty’s role as host, and […]
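The steps listed above translate directly into a simulation. The post itself uses R. The sketch below is a Python version that assumes the truncated remainder compares a player who stays with one who switches. The function names `play` and `win_rate`, the trial count, and the seed are illustrative choices.

```python
import random

def play(switch, rng):
    """Play one round; return True if the player ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)       # randomly choose a door for the prize
    first_pick = rng.choice(doors)  # randomly choose the player's first pick
    # Monty opens a door that is neither the player's pick nor the prize.
    opened = rng.choice([d for d in doors if d != first_pick and d != prize])
    if switch:
        # Switch to the single door that is still closed.
        final_pick = next(d for d in doors if d != first_pick and d != opened)
    else:
        final_pick = first_pick
    return final_pick == prize

def win_rate(switch, trials=100_000, seed=42):
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

print(f"stay:   {win_rate(switch=False):.3f}")  # should land near 1/3
print(f"switch: {win_rate(switch=True):.3f}")   # should land near 2/3
```

The simulation makes the intuition concrete. Staying wins only when the first pick was already the prize, which happens a third of the time. Switching wins in every other case.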
-
Clustering in R
Clustering is a useful technique for exploring your data. It groups records into clusters based on similar features. It’s also a key technique in unsupervised learning. The following is a simple example in R where I plotted the clusters and centroids. The example uses the mtcars dataset built into R, which contains auto data extracted […]
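The excerpt doesn't name the clustering algorithm, so k-means is assumed here because it produces the clusters and centroids mentioned above. In R, the built-in `stats::kmeans()` does this directly. The following minimal Python sketch of Lloyd's k-means algorithm shows what happens under the hood. The mtcars values are not reproduced, and the two-dimensional points below are hypothetical.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Hypothetical points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Plotting `points` colored by `labels`, with `centroids` overlaid, gives the same kind of picture the post describes. Because k-means is sensitive to its random starting centroids, real use typically runs it several times or uses a smarter initialization.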
-
Interview and Upcoming Projects
Here is a recent interview I did for CLK Tech, a newsletter based in Northeast Ohio and run by a couple of tech recruiters in the area. Topics range from general career questions to data science in particular. In addition, I’m busy with a project that I look forward to announcing soon. It’s […]
-
Installing pymc on OS X Using Homebrew
I’ve been working through the following book on Bayesian methods with an emphasis on the pymc library:
However, installing pymc on OS X can be a bit of a pain. The issue comes down to Fortran… I know. The version of gfortran in newer gcc releases doesn’t work well with the pymc build; you need […]