Category: Data Science
-
Hadoop: Installing on macOS
Hadoop is traditionally run on a linux-based system. For learning and development purposes, you may want to install hadoop on macOS. This is the first in a series of posts that will walkthrough working with Hadoop and cloud-based storage. First, you’ll want to use homebrew to install hadoop and any related tools you would like. […]
-
Files and Pipes in R Video Demo
I’ve worked with various alternate file handlers in python before and wanted to explore the options in R. I was pleasantly surprised to find handlers prebuilt for tasks like compressing data. In addition, a pipe function is available to allow you to use less common commands on your file, like gpg for encryption. I put […]
-
Simulating the Monty Hall Problem in R.
The Monty Hall Problem is famous in the world of statistics and probability. For those struggling with the intuition, simulating the problem is a great way to get at the answer. Randomly choose a door for the prize, randomly choose a door for the user to pick first, play out Monty’s role as host, and […]
-
Clustering in R
Clustering is a useful technique for exploring your data. It groups records into clusters based on similar features. It’s also a key technique of unsupervised learning. The following is a simple example in R where I plotted the clusters and centroids. The example uses the mtcars dataset built into R, which contains auto data extracted […]
-
Interview and Upcoming Projects
Here is a recent interview I did for CLK Tech. CLK Tech is a newsletter based out of Northeast Ohio, run by a couple of tech recruiters in the area. Topics span general career questions and data science in particular. In addition, I’m busy with a project that I look forward to announcing soon. It’s […]