Author: Tim

Hadoop: Accessing Google Cloud Storage

First, go here to choose the hadoop google cloud storage connector for your version of hadoop, likely hadoop 2. Copy that file to $HADOOP_HOME/share/hadoop/tools/lib/. If you followed the instruction in the prior post, that directory is already in your class path. If not, add the following to your hadoop-env.sh file (found in $HADOOP_CONF directory): #GS…

2017.12.23
Hadoop: Accessing S3

This post follows in a series of doing local hadoop setup on macOS for development / learning purposes. In the first post, we installed hadoop. If you get stuck or need more detail, feel free to check out the apache docs on S3 support. First, we have to add the directory with the necessary jar…

2017.12.23
Hadoop: Installing on macOS

Hadoop is traditionally run on a linux-based system. For learning and development purposes, you may want to install hadoop on macOS. This is the first in a series of posts that will walkthrough working with Hadoop and cloud-based storage. First, you’ll want to use homebrew to install hadoop and any related tools you would like.…

2017.12.23
Files and Pipes in R Video Demo

I’ve worked with various alternate file handlers in python before and wanted to explore the options in R. I was pleasantly surprised to find handlers prebuilt for tasks like compressing data. In addition, a pipe function is available to allow you to use less common commands on your file, like gpg for encryption. I put…

2017.07.06
Markov Chain Simulation

I’ve been reading up on Markov chains and related concepts. On the wikipedia page there is an example of a 2 state Markov process. I decided to simulate it in R and plot the mean of the means. Quick Code example here: The mean of means (of state e) is close to .36. If you…

2017.06.30