R Machine Learning Tutorial Videos Published

I’ve been creating a video series on machine learning in R. Two videos are up and there is a third on the way.

Getting Started

The first video series is a Getting Started series that looks at predicting continuous values, classification, and other first steps into modeling. I start with using the algorithms directly, and finish with the caret package. It’s available here.

A screen capture of the course

Advanced Algorithms

The second video series picks up with more advanced algorithms and techniques, for example random forests, support vector machines, clustering and text processing. I tried to focus on fairly serious data sets that resemble the type you would use in the real world. The second video series is available here.

The third video series is still in production and will be available here when it launches.

Happy learning!

Thoughts On Leadership and Groups

Building software teams is hard. Fostering culture, improvement, learning and community in a group of individuals who have other options is a difficult thing to do. Fortunately, it’s not all that different from building many other kinds of groups. Yet too often, we fail to look around at other successful groups and learn from their example. There are two challenges in particular that hold back leaders in their quest to build a highly functioning group.


First, creating success is not about the composition of the group alone. Don’t get me wrong, you should look for A-Players, but in any environment where there are other groups, people will be more or less evenly distributed by the draw of money, opportunity, ego-stroking, etc. What will differentiate your group is what it can do with its B and C-Players. Your A-Players are already good, and mostly know how to handle themselves. Leadership might provide them some marginal returns. But compare that to the return of turning a C-Player into a B-Player with proper support. Yet how many of us see leaders trying to run off everyone but A-Players in the naive belief that they can build a team of only A-Players? Those teams don’t exist, and if they do, they don’t need leadership.

This is why some really successful leaders are viewed as simplistic, or unintelligent. Let me cite some examples. Jim Tressel had great regular season success with the OSU Football team, but struggled at times to win the big game. This was often chalked up to inferior coaching strategy. Critics said he was too conservative with play-calling, and that he harped on the basics instead of explosive plays. He was playing the numbers. He made some flawed teams better by reducing mistakes. There may be some truth to the idea that he could have opened up the playbook more and found some new strategies for the big games, but ultimately I think the key differences between conferences that have plagued most teams outside of the SEC showed up in those games. Urban Meyer never had that reputation at Florida, but is now turning out mostly the same results.

Another example is the financial advisor Dave Ramsey. He is trying to lead financial change across huge groups of readers and people attending classes in their community. His system is very simple and is critiqued for that lack of sophistication. But he has succeeded in helping millions of people (who span a wide range of intelligence and financial knowledge) out of debt. I challenge the best Wall Street financial consultant to do the same. He tuned the message to the B & C players. The folks who were the most capable were on their way to success when they picked up a book and started thinking about fixing their finances and doing some basic tracking and planning. That was the nudge they needed. And the folks who wouldn’t understand a financially complicated plan got something they could digest that was better than what they were doing before.


Second, creating success is not a set of linear steps that reaches a done phase. Leaders lay out short-term plans that are focused on fixing all the problems, and then thriving in some utopian state. They treat creating a group like a construction problem, when in fact it’s more like owning the property. You have to maintain the building. You have to weed the garden. You have to pay the bills…

Leadership is a repetitive job, where you will fix the same problem more than once, and you can’t get impatient about that. You will tread over the same ground, sometimes with the same people. If this doesn’t make sense, go ask a minister when his or her church will be “done”. Ask them when they will perform their last baptism, or when they will comfort a grieving family for the last time. Ask a coach when his or her team will be finished and perfect.

Who Should Lead

Now we come to one of the biggest sources of confusion: the mistaken belief that the best welder should lead the welders. The best burger flipper should manage the restaurant. The best programmer should lead the team. How many times have you seen that tried and failed?

Leadership is a service job. It is taking responsibility and solving problems. It is building community and motivating growth. Leading is about understanding people, and more importantly group interaction. Understanding the domain is usually the easy part.

Facebook Gender Analysis With R

I like working with social APIs, and have been working with R more lately. So I combined the two.

I saved the JSON results of asking Facebook for all of my friends using the Graph API.

Using the rjson package for R, I loaded the data into R and broke down the count of friends by gender. I then created a simple bar plot.
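Here is a rough sketch of what that looks like. I no longer have the original script, so treat the details as assumptions: the sample data is made up, and the shape of the response (a `data` array of friends, each with an optional `gender` field) is a guess at what the Graph API returned at the time.

```r
library(rjson)

# Hypothetical sample standing in for the saved Graph API response.
# The "data" array and "gender" field are assumptions about its shape.
sample_json <- '{"data": [
  {"name": "A", "gender": "female"},
  {"name": "B", "gender": "male"},
  {"name": "C", "gender": "female"},
  {"name": "D"}
]}'

# For a saved file, use something like fromJSON(file = "friends.json")
friends <- fromJSON(sample_json)

# Pull out each friend's gender, treating a missing field as "unknown"
genders <- sapply(friends$data, function(f) {
  if (is.null(f$gender)) "unknown" else f$gender
})

# Count friends per gender and draw a simple bar plot
gender_counts <- table(genders)
barplot(gender_counts, main = "Friends by gender", ylab = "Count")
```

The `table` call does the counting, and `barplot` accepts the resulting table directly, so there is very little code beyond the parsing step.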

It wasn’t rocket science, but it was a fun project to toy around with and a good way to get to know manipulating data in R.


ISV Pricing

The other day I tweeted about something that has bothered me for a while:

[tweet http://twitter.com/thoolihan/status/236099028545318912]

I feel like it’s worth explaining my problem.

Software companies, particularly those that sell large enterprise software, want to get customers into their CRM systems and into the sales pipeline quickly so that regional sales folks can follow up. Pricing and other aspects of licensing are often customized to the industry and other needs of the customer.

While those arguments seem valid, I think it’s foolish to put up barriers that keep customers away. In the course of evaluating software, you often just want a classification. What price range is the product in, and what is the licensing model (one time? yearly?), so that I can compare it with other software?

Let’s go through some of the common arguments:

  1. No one pays that price, so why lead with it?

    Who pays full price for a car? Yet I can look up MSRP on the manufacturer’s site, cars.com, or on any number of other web sites. It helps me determine that a Hyundai Tiburon is competing with a Mitsubishi Eclipse, not with a Porsche.

  2. Only enterprise customers are using these types of software, and those purchasing managers know how to get in touch with big ISVs.

    First, that’s just not true. Take Oracle for example. If only Enterprise customers matter, then why is it available on Amazon’s EC2 model? Oracle has migration tools from MySQL and various licensing models, all aimed at bringing smaller customers into using Oracle 11g. So why not let them see what the tiers are?

    Second, Enterprise customers are often collaborating with service and consulting firms (like mine). Why can’t we get a quick view of pricing in order to make recommendations? Sometimes we want to partner with an ISV, but often there is no need. So if I fill out your sales form, I end up getting called about purchasing your product when I’ll never purchase it.

    This is a really important point. You might expect the consultant to put the leg work in anyway. If this is a software selection process, then yes, we’re expected to deal with the sales team if necessary in order to get 100% accurate licensing cost. But on a project with a tight deadline where a consultant is recommending many different tools to put together and do the job, they will often not dig that far into the possibilities. So they are making recommendations to an Enterprise customer based on things like old pricing, forum posts, and reputation. All because the vendor won’t post a pricing sheet. I’d rather post a price matrix with higher prices and some large caveats like “negotiable” than have people running around deciding on my software based on guesses.

  3. There are drastically different programs available for startups and we don’t want to cause confusion.

    This one has some merit, but I still say post a price sheet. Put a big link at the top for other paths. For instance, new companies going with Microsoft technology have the BizSpark program available. Above a SQL Server price sheet, I would put a big bold link that says something like “If your company is new, click here to learn how you may be eligible for free Microsoft licensed products for several years”.

  4. Our licensing model is too complicated.

    This is the only reason I see as valid. Some of the models (take IBM’s PVU pricing) are probably too complicated to reduce to a simple matrix. That said, if you’ve reached that point, I would consider simplifying.

So what about my company? We don’t post any prices for our services. (This is my personal blog, I’m not speaking for the company, yada yada yada…). That said, selling custom professional services is much different from being an ISV. Price matters, but we are usually evaluated on our ability to do a job, and how well we will do it. Additionally, there are so many variables and different models for how we write contracts, and it’s usually per the client’s preferred model, that pricing sheets are rather useless.

But it’s a fair point that some ISVs may see themselves that way. I just think it’s a mistake. At the end of the day, developers and architects are looking to classify your product by price and features. Are you going to provide them that info? Or is a forum post / tweet going to provide them some info that may or may not be correct? You decide.

Fixing Canon MP Navigator with Windows 7 x64

I installed a scanner / printer directly via USB on a Windows machine at my new home. At my previous home, it was hooked to a Time Capsule and I would plug it into my Mac to scan. Canon has some software, “MP Navigator”, that does the scanning and handily puts the scans into documents. I was able to use the device to print from any program, and to scan via Windows Fax and Scan, but MP Navigator for Windows 7 wouldn’t work.

Why does it matter? Windows Fax and Scan is nice for sending faxes via a modem, but its scanning is very limited. MP Navigator, for instance, will allow you to scan multi-page documents and save them as a single PDF file.

After googling around a bit, it appears many other users were having this problem on Windows 7, and x64 in particular. I went to the good old Compatibility tab for the program’s icon, and voila, things work. This works 9 times out of 10 when a program doesn’t play well with Windows 7.

The settings are as follows: