Category: Data Science

Hadoop: Installing on macOS

Hadoop is traditionally run on a Linux-based system. For learning and development purposes, you may want to install Hadoop on macOS. This is the first in a series of posts that will walk through working with Hadoop and cloud-based storage. First, you’ll want to use Homebrew to install Hadoop and any related tools you would like.…

2017.12.23
Files and Pipes in R Video Demo

I’ve worked with various alternate file handlers in Python before and wanted to explore the options in R. I was pleasantly surprised to find handlers prebuilt for tasks like compressing data. In addition, a pipe function is available to allow you to use less common commands on your file, like gpg for encryption. I put…

2017.07.06
Simulating the Monty Hall Problem in R

The Monty Hall Problem is famous in the world of statistics and probability. For those struggling with the intuition, simulating the problem is a great way to get at the answer. Randomly choose a door for the prize, randomly choose a door for the user to pick first, play out Monty’s role as host, and…

2017.03.22
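
The simulation steps described in the Monty Hall teaser above can be sketched roughly as follows. This is a minimal illustration in Python rather than the post's R code, and it is not the author's implementation: the function names `play_round` and `win_rate`, the trial count, and the seed are all assumptions made for the example. The teaser is cut off after Monty's step, so the final stay-or-switch comparison is also an assumption about how such a simulation is usually finished.

```python
import random


def play_round(switch, rng):
    """Play one game; return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)        # randomly place the prize
    first_pick = rng.choice(doors)   # contestant's random first pick
    # Monty opens a door that is neither the first pick nor the prize.
    monty_opens = rng.choice(
        [d for d in doors if d != first_pick and d != prize]
    )
    if switch:
        # Assumed final step: move to the one remaining closed door.
        final_pick = next(
            d for d in doors if d != first_pick and d != monty_opens
        )
    else:
        final_pick = first_pick
    return final_pick == prize


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability for a fixed stay/switch strategy."""
    rng = random.Random(seed)  # seeded so repeated runs are comparable
    wins = sum(play_round(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print(f"stay:   {win_rate(switch=False):.3f}")
    print(f"switch: {win_rate(switch=True):.3f}")
```

With enough trials, the estimate for staying should settle near 1/3 and the estimate for switching near 2/3, which is the result the simulation is meant to make intuitive.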
Clustering in R

Clustering is a useful technique for exploring your data. It groups records into clusters based on similar features. It’s also a key technique of unsupervised learning. The following is a simple example in R where I plotted the clusters and centroids. The example uses the mtcars dataset built into R, which contains auto data extracted…

2017.03.21
Interview and Upcoming Projects

Here is a recent interview I did for CLK Tech. CLK Tech is a newsletter based out of Northeast Ohio, run by a couple of tech recruiters in the area. Topics span general career questions and data science in particular. In addition, I’m busy with a project that I look forward to announcing soon. It’s…

2017.02.15