I’ve worked with various alternate file handlers in python before and wanted to explore the options in R. I was pleasantly surprised to find handlers prebuilt for tasks like compressing data. In addition, a pipe function is available to allow you to use less common commands on your file, like gpg for encryption.
I put together a quick video demo of how to use these functions, and it’s available on youtube:
The mean of means (of state e) is close to .36. If you take .3 * .36 + .4 * (1-.36), you get .364, so this seems to make sense. Note that I’m weighting the switching to e percentage based on the percentage of being in that state in the first place.
Note that I’m using the tm package, which is the traditional way to work with a document collection in R. There are new ways like tidytext that are gaining popularity. I may do a follow up talk on that.
The Monty Hall Problem is famous in the world of statistics and probability. For those struggling with the intuition, simulating the problem is a great way to get at the answer. Randomly choose a door for the prize, randomly choose a door for the user to pick first, play out Monty’s role as host, and then show the results of both strategies.
The numeric output will vary, but look something like:
Clustering is a useful technique for exploring your data. It groups records into clusters based on similar features. It’s also a key technique of unsupervised learning. The following is a simple example in R where I plotted the clusters and centroids.
The example uses the mtcars dataset built into R, which contains auto data extracted from Motor Trend Magazine in 1973-1974.
Clustering is done with the kmeans() function. Note that the graph is 2-dimensional, and I cluster by 2 features, but you could cluster by more features and project down to a 2-dimensional plane.