I’ve been creating video series on machine learning in R. Two series are up, and a third is on the way.
The first video series is a Getting Started series that covers predicting continuous values, classification, and other first steps into modeling. I start by using the algorithms directly and finish with the caret package. It’s available here.
The second video series moves on to more advanced algorithms and techniques, such as random forests, support vector machines, clustering, and text processing. I tried to focus on substantial data sets that resemble the kind you would work with in the real world. The second video series is available here.
Note that I’m using the tm package, which is the traditional way to work with a document collection in R. Newer approaches such as tidytext are gaining popularity, and I may do a follow-up talk on them.
The Monty Hall Problem is famous in the world of statistics and probability. For those struggling with the intuition, simulating the problem is a great way to get at the answer. Randomly choose a door for the prize, randomly choose a door for the user to pick first, play out Monty’s role as host, and then show the results of both strategies.
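A minimal sketch of the simulation just described, written in base R. It is not the original code from this post; the function name `monty_hall`, the default of 10,000 games, and the seed are illustrative choices. Each game follows the steps above: hide the prize, make a first pick, have Monty open a losing door the player did not pick, and then score both the stay and the switch strategies.

```r
# Simulate the Monty Hall game n times and compare "stay" vs. "switch".
# monty_hall() is a hypothetical helper written for illustration.
set.seed(42)  # arbitrary seed so repeated runs give the same numbers

monty_hall <- function(n = 10000) {
  doors <- 1:3
  stay_wins <- logical(n)
  switch_wins <- logical(n)
  for (i in seq_len(n)) {
    prize <- sample(doors, 1)  # door hiding the prize
    pick  <- sample(doors, 1)  # contestant's first choice
    # Monty opens a door that is neither the contestant's pick nor the prize.
    # Index into the vector so sample() behaves when only one door qualifies.
    openable <- setdiff(doors, c(pick, prize))
    opened <- openable[sample(length(openable), 1)]
    # Switching means taking the one door that is neither picked nor opened
    switched <- setdiff(doors, c(pick, opened))
    stay_wins[i]   <- pick == prize
    switch_wins[i] <- switched == prize
  }
  c(stay = mean(stay_wins), switch = mean(switch_wins))
}

results <- monty_hall()
print(results)
```

Exactly one of the two strategies wins every game, so the two proportions always sum to 1. The loop version is slower than a vectorized one, but it mirrors the step-by-step description of the game.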
The numeric output will vary from run to run, but staying should win about one third of the time and switching about two thirds.
Clustering is a useful technique for exploring your data. It groups records into clusters based on similar features. It’s also a key technique of unsupervised learning. The following is a simple example in R where I plotted the clusters and centroids.
The example uses the mtcars dataset built into R, which contains data on 32 automobiles (1973–74 models) extracted from the 1974 Motor Trend US magazine.
Clustering is done with the kmeans() function. Note that the graph is 2-dimensional, and I cluster by 2 features, but you could cluster by more features and project down to a 2-dimensional plane.
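A minimal sketch of this approach in base R. It is not the original code from this post: the two features (`wt` and `mpg`), the choice of 3 clusters, `nstart = 25`, and the seed are all assumptions made for illustration. The first part clusters on two features and draws the centroids on the same plot. The second part clusters on all (scaled) features and then uses principal components, via `prcomp()`, to project the points and centroids onto a 2-dimensional plane.

```r
# Part 1: cluster mtcars on two features and plot clusters plus centroids.
data(mtcars)
set.seed(1)  # kmeans() picks random starting centers

features <- mtcars[, c("wt", "mpg")]  # assumed feature choice
fit <- kmeans(features, centers = 3, nstart = 25)  # assumed k = 3

plot(features, col = fit$cluster, pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "k-means clusters of mtcars (wt vs. mpg)")
# Row i of fit$centers is the centroid of cluster i, so color it i as well
points(fit$centers, col = 1:3, pch = 8, cex = 2, lwd = 2)

# Part 2: cluster on all 11 features (scaled so no single unit dominates),
# then project points and centroids onto the first two principal components.
scaled <- scale(mtcars)
fit_all <- kmeans(scaled, centers = 3, nstart = 25)
pca <- prcomp(scaled)
pcs <- pca$x[, 1:2]
centers_2d <- predict(pca, newdata = fit_all$centers)[, 1:2]

plot(pcs, col = fit_all$cluster, pch = 19,
     main = "k-means on all features, projected onto PC1/PC2")
points(centers_2d, col = 1:3, pch = 8, cex = 2, lwd = 2)
```

Part 1 uses unscaled data, so distances are dominated by `mpg`, which has the larger spread. Calling `scale()` first, as in Part 2, is a common alternative when features are measured in different units.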
Here is a recent interview I did for CLK Tech, a newsletter based in Northeast Ohio and run by a couple of tech recruiters in the area. Topics range from general career questions to data science in particular.
In addition, I’m busy with a project that I look forward to announcing soon. It’s shaping up to be a busy year…