Hadoop: Accessing Google Cloud Storage

First, go here to choose the hadoop google cloud storage connector for your version of hadoop, likely hadoop 2.

Copy that file to $HADOOP_HOME/share/hadoop/tools/lib/. If you followed the instruction in the prior post, that directory is already in your class path. If not, add the following to your hadoop-env.sh file (found in $HADOOP_CONF directory):

#GS / AWS S3 Support
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

Create a service account in google cloud that has the necessary Storage permissions. Download the credentials and save somewhere, in my case I renamed the file and saved it in .config/gcloud/hadoop.json.

Add the following properties in your core-site.xml:
<configuration>
<property>
<name>fs.gs.project.id</name>
<value>someproject-123</value>
<description>
Required. Google Cloud Project ID with access to configured GCS buckets.
</description>
</property>

<property>
<name>google.cloud.auth.service.account.enable</name>
<value>true</value>
<description>
Whether to use a service account for GCS authorizaiton.
</description>
</property>

<property>
<name>google.cloud.auth.service.account.json.keyfile</name>
<value>/Users/tim/.config/gcloud/hadoop.json</value>
</property>

<property>
<name>fs.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
<description>The implementation class of the GS Filesystem</description>
</property>

<property>
<name>fs.AbstractFileSystem.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
<description>The implementation class of the GS AbstractFileSystem.</description>
</property>

</configuration>

Note to change someproject-123 to your actual project-id, which can be found in the google cloud dashboard.

Now test this setup with:

hdfs dfs -ls gs://somebucket

Of course you’ll need to replace somebucket with an actual bucket/directory in your google storage account.

Now you should be setup to use S3 and Google storage with your local hadoop setup.

Hadoop: Accessing S3

This post follows in a series of doing local hadoop setup on macOS for development / learning purposes. In the first post, we installed hadoop.

If you get stuck or need more detail, feel free to check out the apache docs on S3 support.

First, we have to add the directory with the necessary jar files to the Hadoop classpath. In hadoop-env.sh (which is in the $HADOOP_CONF directory), add the following lines to the end of the file:

#AWS S3 Support
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

Make sure you have the following properties set in your core-site.xml file (which is in the $HADOOP_CONF directory).

<configuration>

<property>
<name>fs.s3a.access.key</name>
<value>KEY_HERE</value>
<description>AWS access key ID.
Omit for IAM role-based or provider-based authentication.</description>
</property>

<property>
<name>fs.s3a.secret.key</name>
<value>SECRET_KEY_HERE</value>
<description>AWS secret key.
Omit for IAM role-based or provider-based authentication.</description>
</property>

<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
<description>The implementation class of the S3A Filesystem</description>
</property>

<property>
<name>fs.AbstractFileSystem.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3A</value>
<description>The implementation class of the S3A AbstractFileSystem.</description>
</property>

</configuration>

Note that you will need to replace KEY_HERE and SECRET_KEY_HERE with your actual S3 access keys. You can also set the appropriate environment variables with your keys. I put them in this file because I use multiple AWS profile using the configuration files, which is not picked up on by hadoop.

You can test access by using a public data set. For example, I tested with:

hdfs dfs -ls s3a://nasanex/NEX-DCP30

You should see the contents of that bucket, which includes 5 files.

Note the use of s3a in the protocol, this is the preferred provider over s3n and the deprecated s3.

In the next post, we’ll look at setting up google cloud storage in a similar manner.

Installing pymc on OS X using homebrew

I’ve been working through the following book on Bayesian methods with an emphasis on the pymc library:

However, pymc installation on OS X can be a bit of a pain. The issues comes down to fortran… I know. The version of gfortran in newer gcc implementations doesn’t work well with the pymc build, you need gfortran 4.2, as provided orignally by apple. Homebrew has a package for this.

I dealt with this before, but had problems again after upgrading to Sierra. So this time, I thought I’d document the steps so I don’t have this problem again. Let me know if there are any steps that you feel need added as you try this.

VicinityBuzz: Writing A Mobile App

This post is intended to go through the process of writing and publishing an app. It will be a high-level introduction. The idea is that future posts that will dig into specific topics in depth. This post will be updated to link to those future posts as an index of mobile development topics. The app I’m describing is VicinityBuzz and it’s available here.

Product Vision

While I had been thinking about writing a mobile application for a while, I needed an idea to stick with to completion. The app stores are relatively young, but they are fairly saturated, and bringing something new to the table was a challenge. Recently I read Eric Ries’ The Lean Startup and I do like the idea of MVP(Minimum Viable Product), so I didn’t feel the need to have a telephone book of requirements. I was just looking to have an original idea for the app stores.

In the case of VicinityBuzz, I have always felt that there are not enough good ways to search social media, particularly by location. There are social media sites like FourSquare solely dedicated to location, but I’m referring to something else. Using networks where content is the reason (not location) but by location. For instance, at a conference, users often hashtag their tweets with the name of the conference. But not always. And the hashtag will draw attention from people not at the conference. That’s fine, and there can be useful conversation by just using the hashtag, but what if I want to see posts of all users that are actually at the location.

app screenshot

This is particularly true of news events. On the day of the shootings at Chardon High School, I used the application to see what students and families in the area were posting about. They were each telling their own version of the story. It was heartbreaking and much more informative and up to date than the news being released to the public. Hashtags were useless in this case, for two reasons. First, the students weren’t hashtagging their posts, they didn’t need to. They were posting to their real-life friends and thus already connected on twitter. Second, the tag was full of a lot of the reaction from the public around the globe. That was interesting too, but it wasn’t was I looking for. I wanted facts and eye-witness accounts. It was like CNN’s iReport, but on steroids.

Platform Selection

Native or not? As a web developer, I chose PhoneGap. It’s a mobile platform framework that uses html and JavaScript, which are skills I already have. And the application didn’t have very many device needs that would require native access. GPS is the only device call I make, and that is easily accessible from the PhoneGap JavaScript API. I knew it would be faster than for me to learn the ins and outs of the Cocoa framework and Objective-C, although I am in the process of doing that now for future apps. I have written some small applications in the native iOS sdk, and even paired with some developers to help bootstrap their iOS work, but I just didn’t feel comfortable writing an app consumable by the general public yet. Along with PhoneGap, I’m using jQuery mobile.

Essentially PhoneGap provides project templates for your platform (Android, iOS, WP7, and more) that create a shell of an application that hosts a browser control. All of your files go in a directory (like resources) and get loaded up by the browser control.

app screenshot featuring controls

Using PhoneGap gives me a version that is easily ported to Android and WinPhone 7. Although those markets lag Apple’s AppStore in terms of developer revenue, I want to cast the widest net possible with this application for obvious financial reasons, and also to learn. The more feedback I can get on the application, the better I can make it.

At some point I may go ahead and do a native rewrite. But for now, I’m very happy with what is possible in PhoneGap and jQuery mobile.

Prototyping

I started development with jQuery UI in a browser. Using the built in JavaScript debugger in Chrome was easy. Note you could just as easily use Firebug in Firefox, or the IE Developer tools. Once I was satisfied with functionality, I swapped jQuery mobile in place of jQuery UI and moved the app into the PhoneGap container. More on that in the next section.

One note: If I had to do this over again, I would have simply started with jQuery mobile. It’s not that hard to get started with as the docs are great, and it’s best to just start down that path.

App Store Submission

Thus far, I can only speak to the Apple Store submission process. After going through the steps provisioning portal, it was fairly easy to test the app on a device. In order to submit to Apple, you then create a profile for the application (screenshots, contracts, payment details, etc) and submit a signed binary to Apple for review. The process of signing a binary for release can be a little tricky, but the walk-through on Apple’s developer portal helps a lot.

One thing to note: Apple requires you have a support page. I stole an idea from the Trello folks, and used a public trello board for this purpose. It took less than 5 minutes to setup.

Marketing

I created a basic web page with a description of the application. Apple has a page for making the link to their store. If you are interested in opening the store automatically, either see the options in this stackoverflow post, or add the query paramter “ls=1” to the url you get from the link maker.

What’s Next?

Apple just announced “the new iPad” so I have to do some updating to support that. I haven’t read the new docs, but I assume this means a high-res iPad splash screen and icon. All of my other assets are web assets and scale accordingly.

My next steps are to add a simpler user workflow, a refresh button to update your current search, and add some social networks like Facebook and Instagram.

After that, I’m going to start porting to Android and WinPhone 7. Since the application is already written, it should be as simple as getting a PhoneGap project running for each environment. I have also considered the Mac App Store and the upcoming Win 8 Marketplace for desktop application versions.

Finally, I’ll blog more in depth about some of the topics covered, and provide links to those posts on this page. Mobile development is rewarding, but can be challenging for those coming from a web or desktop background. I’d like to pass on some of the things I learned so that others working in mobile can save some time.

Feel free to provide feedback and questions if you have something in particular you’d like me to cover. And checkout the application!

The Sun Also Rises (on the Technology Coast)

One generation passeth away, and another generation cometh: but the earth abideth for ever. The sun also ariseth, and the sun goeth down, and hasteth to his place where he arose. -Ecclesiastes 1:5-6

Apple released Mac OS X in 1999, using a BSD core and embracing many libraries and utilities of the open source world.  This excited open source advocates, and appeared a step in the right direction compared to Microsoft’s strict licensing and Sun’s Java licensing. 

Here we stand today, and the Apple iPhone is one of the most locked-down computers around.  Make no mistake, the iPhone is a computer.  Sun has open-sourced Solaris and Java.  Let’s not overstate the Microsoft moves, but they have made moves to interoperate with open source libraries, and are releasing source of many new products on codeplex.  Windows is still locked down, office docs in strict formats, and the open source stuff is in a whacky license.  But it’s a start.

What will the next 10 years bring?