Hadoop: Installing on macOS

Hadoop is traditionally run on Linux-based systems. For learning and development, though, you may want to install Hadoop on macOS.

This is the first in a series of posts that will walk through working with Hadoop and cloud-based storage.

First, you’ll want to use Homebrew to install Hadoop and any related tools you would like:
brew install hadoop apache-spark pig hbase
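After the install finishes, a quick way to confirm the tools landed on your PATH is a small check like the one below. It is a minimal sketch: `check_tools` is a hypothetical helper, and it assumes the formulas install launchers named `hadoop`, `spark-shell`, `pig` and `hbase`.

```shell
# Sketch: report which of the given commands are on PATH.
# check_tools is a hypothetical helper; it returns the number of missing tools.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Running `check_tools hadoop spark-shell pig hbase` prints each tool's location, and a zero exit status means everything was found.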

Next, you’ll want to set up some environment variables. These can go in your shell rc file (.bashrc, .zshrc), or elsewhere if you use a shell configuration framework like oh-my-zsh.

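Make sure JAVA_HOME is also set; its value depends on which JDK you have installed, so it is not included here. The exports below assume Homebrew's default prefix on Intel Macs (/usr/local); on Apple Silicon Macs Homebrew uses /opt/homebrew, so set HADOOP_INSTALL to /opt/homebrew/opt instead.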
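On macOS, the `/usr/libexec/java_home` helper prints the home directory of an installed JDK, which makes it a convenient source for JAVA_HOME. The function below is a sketch (`set_java_home` is a hypothetical name) that keeps any JAVA_HOME you have already exported and otherwise asks `java_home`; any arguments, such as `-v 11`, are passed through to select a particular Java version.

```shell
# Sketch: set JAVA_HOME from macOS's java_home helper.
# set_java_home is a hypothetical helper; an existing JAVA_HOME is left untouched.
set_java_home() {
  if [ -n "${JAVA_HOME:-}" ]; then
    return 0
  fi
  if [ ! -x /usr/libexec/java_home ]; then
    echo "/usr/libexec/java_home not found; set JAVA_HOME manually" >&2
    return 1
  fi
  # Extra arguments are passed through, e.g. "-v 11" to request a version.
  jh=$(/usr/libexec/java_home "$@" 2>/dev/null) || {
    echo "no matching JDK found; install one first" >&2
    return 1
  }
  export JAVA_HOME="$jh"
}
```

You could call `set_java_home` (or `set_java_home -v 11`) from your rc file before the Hadoop exports; check which Java versions your Hadoop release supports before pinning one.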

export HADOOP_INSTALL=/usr/local/opt
export HADOOP_HOME=$HADOOP_INSTALL/hadoop/libexec
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH="$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH"
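If you are not sure which Homebrew prefix applies to your machine, you can ask `brew` rather than hard-coding /usr/local/opt. The sketch below assumes bash or zsh, falls back to the Intel default when `brew` is not on PATH, uses HADOOP_CONF_DIR (the variable Hadoop's own scripts read for the configuration directory), and adds a hypothetical `check_hadoop_env` helper that reports missing pieces.

```shell
# Sketch: derive the Homebrew prefix instead of hard-coding /usr/local/opt.
if command -v brew >/dev/null 2>&1; then
  HADOOP_INSTALL="$(brew --prefix)/opt"
else
  # Fallback (assumption): Homebrew's default prefix on Intel Macs.
  HADOOP_INSTALL=/usr/local/opt
fi
export HADOOP_INSTALL
export HADOOP_HOME="$HADOOP_INSTALL/hadoop/libexec"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"

# Hypothetical helper: check that the expected directories and the hadoop
# launcher exist; returns non-zero on the first problem found.
check_hadoop_env() {
  for d in "$HADOOP_HOME" "$HADOOP_CONF_DIR"; do
    if [ ! -d "$d" ]; then
      echo "missing directory: $d" >&2
      return 1
    fi
  done
  if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "missing launcher: $HADOOP_HOME/bin/hadoop" >&2
    return 1
  fi
  echo "Hadoop environment looks OK: $HADOOP_HOME"
}
```

Run `check_hadoop_env` in a new shell; a missing-directory message usually means the prefix does not match your machine.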

Then test your install with the following:

hdfs dfs -ls ~

You should see the contents of your home directory.
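Listing ~ shows your local home directory because `hdfs dfs` resolves paths against `fs.defaultFS`, which in Hadoop's default configuration is file:///. The sketch below prints the configured value with `hdfs getconf -confKey` and lists the home directory through an explicit file:// URI; `uri_scheme` is a hypothetical helper, not part of Hadoop.

```shell
# Sketch: check which filesystem hdfs dfs paths resolve to.
# uri_scheme is a hypothetical helper, not part of Hadoop.
uri_scheme() {
  # Everything before the first colon: file:/// -> file, hdfs://host:9000 -> hdfs
  printf '%s\n' "${1%%:*}"
}

if command -v hdfs >/dev/null 2>&1; then
  default_fs=$(hdfs getconf -confKey fs.defaultFS)
  echo "fs.defaultFS = $default_fs (scheme: $(uri_scheme "$default_fs"))"
  # An explicit file:// URI always targets the local filesystem,
  # whatever fs.defaultFS is set to.
  hdfs dfs -ls "file://$HOME"
fi
```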

You can also run one of the MapReduce examples bundled with Hadoop:
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100

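You should see a (poor) estimate of pi. The pi example uses a quasi-Monte Carlo method, and its two arguments are the number of map tasks and the number of samples per map, so larger values give a better estimate at the cost of a longer run.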
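To see how the estimate improves as the sample count grows, the sketch below reruns the example with 100, 1,000 and 10,000 samples per map and prints each result's distance from pi. `run_pi` and `pi_error` are hypothetical helpers written for bash/sh, and the parser assumes the job prints a line beginning "Estimated value of Pi is". Note also that if the JavaScript `yarn` package manager appears earlier on your PATH than Hadoop's bin directory, `yarn` will resolve to it instead.

```shell
# Sketch: rerun the pi example with more samples and measure each estimate's error.
# run_pi and pi_error are hypothetical helpers.
pi_error() {
  # Absolute difference between an estimate and pi, to six decimal places.
  awk -v e="$1" 'BEGIN { d = e - 3.141592653589793; if (d < 0) d = -d; printf "%.6f\n", d }'
}

run_pi() {
  # $1 = number of map tasks, $2 = samples per map (the two "pi" arguments).
  jar=
  for f in "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
    if [ -e "$f" ]; then jar=$f; break; fi
  done
  if [ -z "$jar" ]; then
    echo "examples jar not found under HADOOP_HOME" >&2
    return 1
  fi
  # Assumption: the result line starts with "Estimated value of Pi is".
  yarn jar "$jar" pi "$1" "$2" 2>&1 | awk '/Estimated value of Pi is/ { print $NF }'
}

# Only attempt real runs when HADOOP_HOME points at a directory.
if [ -d "${HADOOP_HOME:-}" ]; then
  for samples in 100 1000 10000; do
    est=$(run_pi 10 "$samples") && [ -n "$est" ] &&
      echo "10 maps x $samples samples: $est (error $(pi_error "$est"))"
  done
fi
```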

You should now be set to use Hadoop. In future posts we will look at using the S3 filesystem from AWS and Google Cloud Storage as well.