First, go here to choose the Hadoop Google Cloud Storage connector for your version of Hadoop, most likely Hadoop 2.
Copy that file to $HADOOP_HOME/share/hadoop/tools/lib/. If you followed the instructions in the prior post, that directory is already on your classpath. If not, add the following to your file (found in the $HADOOP_CONF directory):
#GS / AWS S3 Support
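As a minimal sketch of that addition: this assumes the file in question is hadoop-env.sh (my assumption, since the post does not name it) and that HADOOP_HOME is already set in your environment. It simply appends the tools/lib directory to whatever classpath is already there.

```shell
# Assumption: this goes in hadoop-env.sh and HADOOP_HOME is already set.
# Append tools/lib (where the connector jar was copied) to the Hadoop classpath.
# The trailing /* is quoted on purpose: the JVM expands it, not the shell.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}${HADOOP_HOME}/share/hadoop/tools/lib/*"
```

Running `hadoop classpath` afterwards should list the tools/lib entry if the change was picked up.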
Create a service account in Google Cloud that has the necessary Storage permissions. Download the credentials and save them somewhere. In my case, I renamed the file and saved it in .config/gcloud/hadoop.json.
Add the following properties in your core-site.xml:
Required. Google Cloud Project ID with access to configured GCS buckets.
Whether to use a service account for GCS authorization.
<description>The implementation class of the GS Filesystem</description>
<description>The implementation class of the GS AbstractFileSystem.</description>
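For reference, here is a sketch of how those properties are typically written for the Hadoop 2 era GCS connector. The property names are the ones that connector generation uses; newer connector releases renamed some of the authentication keys, so check the documentation for your version. The keyfile path is a hypothetical absolute path standing in for wherever you saved hadoop.json, and someproject-123 is a placeholder.

```xml
<!-- Sketch only: goes inside the <configuration> element of core-site.xml. -->
<property>
  <name>fs.gs.project.id</name>
  <value>someproject-123</value>
  <description>Required. Google Cloud Project ID with access to configured GCS buckets.</description>
</property>
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
  <description>Whether to use a service account for GCS authorization.</description>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <!-- Hypothetical path: point this at your own downloaded credentials file. -->
  <value>/home/youruser/.config/gcloud/hadoop.json</value>
</property>
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  <description>The implementation class of the GS Filesystem</description>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  <description>The implementation class of the GS AbstractFileSystem.</description>
</property>
```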
Remember to change someproject-123 to your actual project ID, which can be found in the Google Cloud dashboard.
Now test this setup with:
hdfs dfs -ls gs://somebucket
Of course, you’ll need to replace somebucket with an actual bucket/directory in your Google Cloud Storage account.
You should now be set up to use S3 and Google Cloud Storage with your local Hadoop installation.