Files and Pipes in R Video Demo

I’ve worked with various alternative file handlers in Python before and wanted to explore the options in R. I was pleasantly surprised to find prebuilt handlers for tasks like compressing data. In addition, a pipe function lets you run less common commands on your file, such as gpg for encryption.
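As a minimal sketch of the idea (not the code from the video), the example below writes a data frame through base R's gzfile() connection, reads it back, and then reads the same file through pipe(). It assumes a Unix-like system with gzip on the PATH; the temporary file and the use of mtcars are only for illustration.

```r
# Write a small data frame through a gzip-compressing connection.
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, open = "wt")
write.csv(head(mtcars), con)
close(con)

# Read it back through the same kind of connection.
con <- gzfile(path, open = "rt")
df <- read.csv(con, row.names = 1)
close(con)

# pipe() runs a shell command and exposes its output as a connection.
# Assumption: gzip is installed and on the PATH.
con <- pipe(paste("gzip -dc", shQuote(path)), open = "r")
raw_lines <- readLines(con)
close(con)

print(dim(df))
```

The same pipe() pattern should work for other command-line tools such as gpg, with the command string adjusted to match.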

I put together a quick video demo of how to use these functions, and it’s available on YouTube:

If you are having a hard time reading the text, click here to view the video directly on YouTube.

Comment here or on the video with any feedback or questions.

Markov Chain Simulation

I’ve been reading up on Markov chains and related concepts. On the Wikipedia page there is an example of a two-state Markov process. I decided to simulate it in R and plot the mean of the means.

Quick code example here:

The mean of means (of state E) is close to 0.36. If you take 0.3 * 0.36 + 0.4 * (1 - 0.36), you get 0.364, so this seems to make sense. Note that I’m weighting each probability of ending up in E by the share of time spent in the starting state.
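A minimal sketch of such a simulation follows; it is not necessarily the original listing. It assumes the transition probabilities implied by the arithmetic above (0.3 to stay in E, 0.4 to move from A to E). The function name simulate_chain, the run counts, and the seed are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Probabilities of ending up in state E, taken from the arithmetic above.
p_stay_e <- 0.3  # P(E -> E)
p_a_to_e <- 0.4  # P(A -> E)

# Simulate one chain and return the fraction of steps spent in E.
simulate_chain <- function(n_steps, start_in_e = TRUE) {
  in_e <- logical(n_steps)
  in_e[1] <- start_in_e
  for (i in 2:n_steps) {
    p <- if (in_e[i - 1]) p_stay_e else p_a_to_e
    in_e[i] <- runif(1) < p
  }
  mean(in_e)
}

# Repeat the simulation and average the per-run means.
run_means <- replicate(200, simulate_chain(1000))
mean_of_means <- mean(run_means)

# Closed-form check: solve pi = 0.3 * pi + 0.4 * (1 - pi) for pi.
stationary_e <- p_a_to_e / (1 - p_stay_e + p_a_to_e)

print(c(simulated = mean_of_means, stationary = stationary_e))
```

Plotting cumsum(run_means) / seq_along(run_means) as a line would show the running mean of means settling toward the stationary value.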

R Machine Learning Tutorial Videos Published

I’ve been creating a video series on machine learning in R. Two series are up and there is a third on the way.

Getting Started

The first video series is a Getting Started series that looks at predicting continuous values, classification, and other first steps into modeling. I start by using the algorithms directly and finish with the caret package. It’s available here.

A screen capture of the course

Advanced Algorithms

The second video series picks up on more advanced algorithms and techniques, such as random forests, support vector machines, clustering, and text processing. I tried to focus on fairly serious data sets that resemble the type you would use in the real world. The second video series is available here.

The third video series is still in production and will be available here when it launches.

Happy learning!

Text Processing in R Talk With the TM Package

I gave a talk at my local Cleveland R User Group about text processing and document vectorization. You can view the talk here:

Note that I’m using the tm package, which is the traditional way to work with a document collection in R. Newer approaches like tidytext are gaining popularity. I may do a follow-up talk on that.

Feedback, and More Videos

Enjoy, and feedback is welcome! And if you are interested in more video content on machine learning in R, check out this post.

Simulating the Monty Hall Problem in R

The Monty Hall Problem is famous in the world of statistics and probability. For those struggling with the intuition, simulating the problem is a great way to get at the answer. Randomly choose a door for the prize, randomly choose a door for the user to pick first, play out Monty’s role as host, and then show the results of both strategies.

Simulating Monty Hall in R
Simulating the strategies of Monty Hall

The numeric output will vary, but it will look something like:

> print(summary(games$strategy) / nrow(games))
stay switch
0.342 0.658

The following code does this in a rather short R example:
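One way such a simulation could be written is sketched below; it is a reconstruction under assumptions rather than the original listing. The games data frame and its strategy column are taken from the output above, while play_game, the number of games, and the seed are hypothetical choices.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n_games <- 10000
doors <- 1:3

# Play one game and return the name of the strategy that wins it.
play_game <- function() {
  prize <- sample(doors, 1)  # door hiding the prize
  pick <- sample(doors, 1)   # contestant's first pick
  # Monty opens a door that is neither the pick nor the prize.
  # Indexing with sample.int() avoids sample()'s special handling
  # of a length-one numeric vector.
  openable <- setdiff(doors, c(pick, prize))
  opened <- openable[sample.int(length(openable), 1)]
  # Switching means taking the one remaining closed door.
  switched <- setdiff(doors, c(pick, opened))
  if (switched == prize) "switch" else "stay"
}

games <- data.frame(
  strategy = factor(replicate(n_games, play_game()),
                    levels = c("stay", "switch"))
)

# Share of games won by each strategy.
print(summary(games$strategy) / nrow(games))
```

Switching should win about two thirds of the time, since it only loses when the first pick was already the prize door.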