Analyzing Spread Football Picks With R

I’ve been making an effort to learn R for about a year. I have experimented with it on and off over the years, but this is the first serious effort I’ve made.

Whenever I am learning something, rather than just focusing on book examples, I try to come up with an example that is relevant and interesting to me. Doing that helps keep me motivated, and drives me to pick out the things I want to know that are useful, rather than just the things that happen to be revealed through examples. I would liken this to an experiment I heard a Khan Academy engineer talking about, where students are exposed to various Logo drawings, some of which have the source code available and some of which don’t. The ones without source serve as motivation, and focus on principles that build on and challenge what the student already knows.

In my case this means the following: if I’m going to work with linear models in R, I’m not just going to work with an example whose data lends itself to that, but be challenged to evaluate which variables might make a valid model and then test that fit with a critical eye.

In my case, I decided to try to not be the worst in my NFL Pick’em league this year. I usually do ok in the league, but I’m having a particularly bad year. The premise of the league is as follows. This league is only about picking game results, not like fantasy football. You pick every game each week, and you pick against the spread. The most correct picks win.

For those who don’t know what a spread is: it’s a gambling mechanism to get people to bet on both sides of a game. Bets are like stock purchases: you may not think about it every time you make a transaction, but there needs to be someone taking a position on the other side. Many people assume the casino (or bookmaker) is taking the other position. In fact, they are trying not to take a position; they are really just a market-maker. The bookmaker attempts to make money by having a small profit margin (sometimes called overround) on the bets. In order to not take a position, they want as close to 50% of the betting population as possible on each side of the bet. That way, each winner is paid using the losses of a loser. In order to accomplish that, they use a spread or payout odds. In the case of a spread, they subtract a certain number of points from the favorite, to entice people to bet the underdog. If Denver plays Oakland, and the spread is Denver – 10, the bookmaker is saying that by subtracting 10 from Denver’s score, they think they will get an even market. If Denver wins by more than 10, the Denver bettors win. If Denver wins by less than 10 or loses the game outright, the Oakland bettors win. If Denver were to win by exactly 10, the bet is a push, and both sides get their original bet back.
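To make the cover / push rules concrete, here is a minimal sketch of that logic. It is written in Python rather than R, and the function name, argument names, and scores are all made up for illustration.

```python
def spread_result(favorite_score, underdog_score, spread):
    """Return which side of the bet wins: 'favorite', 'underdog', or 'push'.

    `spread` is the number of points subtracted from the favorite's score,
    e.g. 10 for "Denver - 10". All names here are hypothetical.
    """
    # Margin is negative when the favorite loses the game outright.
    margin = favorite_score - underdog_score
    if margin > spread:
        return "favorite"   # favorite won by more than the spread
    if margin < spread:
        return "underdog"   # favorite won by less, or lost outright
    return "push"           # favorite won by exactly the spread


# Denver - 10 against Oakland, with made-up final scores:
print(spread_result(31, 17, 10))  # Denver wins by 14
print(spread_result(24, 17, 10))  # Denver wins by 7
print(spread_result(27, 17, 10))  # Denver wins by exactly 10
```

The three calls correspond to the three cases in the paragraph above: Denver covers, Oakland covers, and a push.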

Our league is not a gambling league in the sense of betting per game. You just pick all the games, and there is a prize at the end of the year for the most correct picks. It is run by a friend, and I have been in the league for over 10 years now. So needless to say, I know the domain, which is a huge leg up when doing analysis: you can intuit pretty quickly if numbers look correct, or if a stat has meaning.

So the first step was to track results. I used Google Docs to keep track of my picks. It has an option to download spreadsheets as a CSV file, which is a very friendly format for R to work with. If you want to try this with your picks, you can make a copy from here.

Now comes the R work. All the code is up on my GitHub account. The first step was to get the data into a data frame, one of R’s most common structures. Picks.R does just that, adds some calculated columns, and calculates some general league trends. I wrote two functions, condition_frequency and condition_percentage, that can calculate almost all the required stats. They count the number of occurrences of some condition, or the percentage of picks that meet it. Both take a function for the condition, and can either look at all picks or be passed another function that determines a subset to analyze. For instance, you can calculate the percentage of home teams that cover the spread when they are favored by passing a subset function that selects results where the home team is favored, and a condition of the home team winning by more than the spread.
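The real functions are in R in the repository. As a rough illustration of the idea only, here is a Python sketch of what a condition / subset pair of functions could look like. The signatures, the field names (home_line, home_score, away_score), and the sample games are all assumptions made for this example, not the actual code.

```python
def condition_frequency(picks, condition, subset=None):
    """Count the picks that satisfy `condition`, optionally within a subset."""
    if subset is not None:
        picks = [p for p in picks if subset(p)]
    return sum(1 for p in picks if condition(p))


def condition_percentage(picks, condition, subset=None):
    """Percentage (0-100) of picks satisfying `condition`, optionally within a subset."""
    if subset is not None:
        picks = [p for p in picks if subset(p)]
    if not picks:
        return None  # nothing in the subset to measure
    return 100.0 * condition_frequency(picks, condition) / len(picks)


# Made-up games. home_line is the number of points the home team is
# favored by (negative when the home team is the underdog).
games = [
    {"home_line": 3, "home_score": 27, "away_score": 20},
    {"home_line": 7, "home_score": 24, "away_score": 20},
    {"home_line": -3, "home_score": 17, "away_score": 20},
    {"home_line": 10, "home_score": 35, "away_score": 14},
    {"home_line": -6, "home_score": 21, "away_score": 24},
]


def home_favored(game):
    return game["home_line"] > 0


def home_covers(game):
    return game["home_score"] - game["away_score"] > game["home_line"]


# How often does the home team cover, across all games and when favored?
print(condition_frequency(games, home_covers))
print(condition_percentage(games, home_covers, subset=home_favored))
```

Passing the condition and the subset as functions is what lets two small helpers cover most of the stats: each new question is just a new pair of predicates.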

Output

Describe.R writes a markdown file that can produce html to show league trends and personal trends. The result looks like:

Next I decided to plot my results by week. The results are:

My results by week

You can see I tried to fit a simple linear model to the results, using the number of weeks played to project how much better my picking would get. That’s a questionable model to try, but it at least shows the general trend.
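In R a fit like this is what lm is for. To show what a straight-line fit actually computes, here is a plain least-squares sketch in Python. The weekly totals are invented for the example and are not my real results.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical number of correct picks per week.
weeks = [1, 2, 3, 4, 5, 6]
correct = [6, 7, 5, 8, 9, 8]

slope, intercept = fit_line(weeks, correct)
# Project the trend forward to a week that has not been played yet.
projected_week_10 = slope * 10 + intercept
print(round(slope, 3), round(intercept, 3), round(projected_week_10, 1))
```

A positive slope only says the weekly totals drifted upward over the weeks so far. With this few points it is a trend line, not evidence that the picks will keep improving.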

In Teams.R there is a function unplayed_games that will give you relevant stats about each team in the games that don’t have scores yet.

So what did I learn?

I learned to use functions very effectively in R, and to try to take advantage of the way you can operate on entire vectors at the same time. (Data frame columns are vectors). I learned to work with Hadley Wickham’s dplyr and ggplot2 libraries, which are great for productivity once you understand the philosophy of how to work with those libraries.

A lot of the visual and transformation work was helped by a workshop the Cleveland R User Group held with Robert Kabacoff. He was a very good instructor and it really put a lot of pieces together for me about working with R.

What Next?

I’d like to get into clustering the data, and seeing how results vary by spread size. In addition, I’d like to try some machine learning. Train up models and see if the machine can predict better.

In particular, I’d like to bring team popularity into the model. Why? Remember the long-winded discussion of how and why bookmakers make spreads? Did you notice that the bookmaker isn’t trying to set the most accurate line, but to get 50% of the bettors on each side? That means that there are opportunities for exploitation. The common example is large-market (or popular) teams. Consider the Pittsburgh Steelers (which, as a Bengals fan, I of course loathe, but that is not the point…). The Steelers have backer groups across the country and a huge following. If they were to play a team like Jacksonville that struggles to sell out its tickets, it is likely that there is a certain base that is going to bet on the Steelers simply because they are fans. In order to achieve that 50% balance, bookmakers are likely to skew the spread to overly favor Jacksonville, making the less popular team a more attractive bet. Savvy data-driven pickers end up taking the mathematical advantage at the expense of bettors just playing favorites.

Also, I’d like to investigate ways to make the entire app more approachable. Could this be a Shiny app that takes a URL to a CSV and presents the user with results?

It’s been a fun project, and I’ve seen some improvement over the year. That said, I’ve had a rough picking year and certainly won’t finish in the money. But it’s kept my R learning journey moving along, and I’ve enjoyed it.

VicinityBuzz Update: Windows Phone 8 & More

VicinityBuzz on Win Phone 8

While attending Codemash a few weeks ago, I ended up in a Windows Phone development precompiler (Codemash’s name for a training session). It was my plan to hit mostly mobile and analytics sessions, but I was not originally planning on attending this session. With Windows Phone still struggling for market share, I wasn’t in a rush to work with it. However, other sessions were cancelled because weather had delayed some presenters, so I ended up in this session. Microsoft’s Jeff Blankenburg was teaching the session, and I have enjoyed some of his presentations and a Silverlight fire-starter event in the past. It’s one of my rules of conferences to attend sessions based more on good speakers, rather than based solely on topic.

With regard to market share, Jeff made the point during the session that with a less crowded app store, you do have a bit more discoverability. Even if that doesn’t hold up, the platform shares enough similarity with Windows 8 that a port to the Windows Store will be trivial. The Windows App Store isn’t exactly setting the world on fire either, but I’d like to see my app on all of these platforms, and as Windows 8 adoption rises with new machine sales, that marketplace should see constant upticks.

Having worked with Silverlight in the past, I found it pretty easy to get going on Win Phone 8 development. There was some definite rust on my XAML skills, but it came back to me fairly quickly. One thing to keep in mind is that you want to keep things relatively simple on a mobile platform. I have worked on some WPF projects in enterprise settings with MVVM frameworks, dependency injection frameworks, and more. While I followed an MVVM pattern, I just rolled my own with a simple base class.

My project was to do a version of an app I already have in the iOS App Store, VicinityBuzz. It does location-based searches of Twitter. You can search around you, or by entering an address. The radius is a configurable setting. I like using the app at conferences like Codemash to catch all the chatter that may not have a hashtag. One catch is that obviously only tweets that include a location will be found. If folks have that feature turned off in their Twitter app, their tweets won’t show up.
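To illustrate what a radius search means, here is a small Python sketch that keeps only the located items within a given distance of a point, using the haversine formula. This is not how the app is implemented (that isn’t covered here); the field names, coordinates, and sample tweets are all made up.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def within_radius(items, center_lat, center_lon, radius_km):
    """Keep items that carry a location and fall inside the radius."""
    return [
        item for item in items
        if item.get("geo") is not None
        and haversine_km(center_lat, center_lon, item["geo"][0], item["geo"][1]) <= radius_km
    ]


# Hypothetical tweets: one nearby, one far away, one with location turned off.
tweets = [
    {"text": "Great keynote", "geo": (41.48, -82.68)},
    {"text": "Stuck in traffic", "geo": (40.00, -83.00)},
    {"text": "No location on this one", "geo": None},
]

nearby = within_radius(tweets, 41.48, -82.68, radius_km=5)
print([t["text"] for t in nearby])
```

The item without a location is dropped before any distance is computed, which is the same catch described above: a tweet with no location attached can never match.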

Since I had written the app before (in PhoneGap for iOS), I knew the feature set and domain cold. The challenge was just getting up to speed with the latest APIs for search and geolocation, and then implementing within a new platform. One of the biggest benefits of this project was getting up to date with the latest Twitter API. I still need to update the version for iOS, as it’s currently non-functional because of API changes over the last several years. I plan on doing that very soon now that I know the latest version.

Anyway, I won’t go into the development details here too much, but I finished a version 1 of VicinityBuzz, and it is now in the Windows Phone Store here, and it’s free. So go check it out. If you like it, I’d love to have some more reviews.

Also, if you are inspired to do any Windows Phone development yourself, you may be interested in a device to do some real testing. I recently found there are some prepaid phones new on Amazon that are dirt cheap for that purpose. Check out the Nokia Lumia 520 and Nokia Lumia 521 on Amazon.

Watch this blog for upcoming posts about working with the Twitter API, and some of the things I learned working with Windows Phone 8. And more mobile in general. I have the bug again…

Ownership: Artist or Engineer?

There are varying ways that a client / service relationship can work, and how you view that relationship can be the cause of harmony or discord in projects. It’s something I’ve understood for a while, but had trouble explaining at times. What follows is an attempt to explain my view of something that knowledge workers need to understand to have a successful and fulfilling career.

The Artist

When an artist does work, we typically think of them as having complete control of the work. This is certainly true in the case where the eventual owner is not yet determined. Meaning an artist creates, and then sells directly or via a gallery to the public. Even commissioned artwork has only varying levels of control. A commissioned portrait certainly falls into the realm of work that would come with constraints. After everyone involved is dead and gone, people tend to remember the artist, not the owner.

The Engineer

When work is not sponsored, but contracted, it begins to fall into a completely different classification. Though you may refer to it as “artwork”, a visual painting done for advertising is really more “creative” work for hire. Certainly work that is more functional (like a bridge) comes with rigid requirements. This is the attitude of the engineer: the work is asked to meet certain goals, and is ultimately subject to the approval of the buyer. The engineer is an expert who expects to provide guidance, but does so at the behest of the client. In terms of credit, this world is muddled. Sometimes it is the architect, other times it is the owner or visionary that is remembered.

What does any of this have to do with consulting, software development, functional work, or any other type of service industry? Don’t get caught up in the work medium (paint, steel, code), or left brain / right brain aspect of the work, but just consider the metaphor in terms of relationship to the client and ownership of responsibility. In my world (consulting in the areas of software development, creative, marketing, etc) this distinction makes a big difference. All of these types of work have a right-brained aspect to them. People think of software development as a very regimented thing, but there is a lot of freedom to work in your own styles and patterns. Languages are fairly abstract these days, and they are high level enough to provide nearly infinite ways to solve complex problems. Certainly I don’t need to discuss how easy it is for a graphic designer to relate to an abstract artist.

Here is where the risk lies: believing you have the control an artist does, when you do not. The consultant works in a world where the client owns the final product. And the project would never even exist but for the idea and financial commitment of the client. Deadlines, requirements and other constraints are all part of the context. “Ownership” of the direction is a privilege that goes to the person that takes the risk. While an artist that is producing works for sale is bearing the risk that the piece will not sell, the bridge builder is not. It was sold before the project started. Sure, many contracts include shared risk clauses, but for all intents and purposes, the risk is on the client.

It is arrogant and unproductive for a paid consultant to believe they have the final say in work. They have the right to turn down work. And some clients may choose to give control over to the consultant, but that is their choice, not a right of the consultant.

When a consultant is doing work for hire and thinks they have the privileges that come with the artist mentality, it is easy to develop a disdain for the client. Whereas in the engineering mindset, it is understood that one can recommend to the client, but ultimately the idea must sell itself. There is no expectation that the consultant wins because they are the subject matter expert. They are expected to be able to convey their idea and its merits in plain English and at a level understood by someone outside the field. In short, they must sell the idea.

Am I saying that workers should not have principles? Not at all. Take the story of Howard Roark, the principled architect of Ayn Rand’s The Fountainhead. Roark suffered while turning down business, as he would only work on buildings where his style of architecture was called for and he had a free hand to build modern and innovative buildings. He turned down others with work that would not satisfy him, but he did so respectfully. In this fictional work Roark has many enemies, but not the clients he turns down. A reader sympathizing with Roark may disagree with those clients, but it’s hard to see them as malicious.

The problem is that so many give away the right to have those principles for the sake of expediency without realizing they gave up that control. So they still expect the control. As an example, coming to work for a consulting company, you give up that right. Why? I’ve already made the point that the client has control, so you can only control work by selecting projects that meet your criteria, or working with clients who are ceding control. But as an employee of the company, you are giving up the client selection privilege in order to minimize your risk and investment. The consulting company provides you with a salary, book of work, etc, in return for your work. You do not get to control what work the company takes or not. It is your choice to leave if their client base does not suit you.

Too easily do I see people in the business blame owners and clients for work they do not like. But owners and clients anted up. They paid for the right to call the shots. If you want them to call the shots differently, it is your obligation to sell them on those ideas. Not your right to complain if the idea is not sold.

So what should someone who believes in the artist mentality do? There are people that believe so strongly in the control of their output. Folks like this tend to idolize people they view as uncompromising, like a Steve Jobs. (It’s worth noting that there are a lot of signs that he compromised more than the public perceived).

There are some simple solutions. Being an independent consultant means that you can turn down work that does not fit your preferred principles. Owning / building your own consulting firm allows you the control of selecting your projects and clients. The third option is to return to the artist’s roots of non-commissioned work. In other words, start a product company. While your clients have a say in the form of sales, you are free to put whatever product you see fit on the market, held only to your own standards. For all the knowledge workers that idolize Steve Jobs and his uncompromising reputation, it is worth remembering that he started and returned to a product company.

Media is a great analogy. Being an independent musician, filmmaker, or game developer is hard. You have to market for yourself, sell each product, secure distribution, etc. But for that price, you have control of the product. Signing with a publisher makes those tasks easier, but you have to be aware of the conditions you agreed to with the publisher. Like any contract or relationship, you should consider what those terms will mean in good times and in bad.

This is not to say that the rest of us don’t have some control and determination of our own destiny. It’s just that you have to remember you only have one lever to pull: leaving the company. You can negotiate terms based on a company or client’s desire to have you be a part of the team. But to get emotional about it, or to think you have a right to control without taking the primary risk, is folly.

Find The Agile Signers On The Web

If you’re interested in seeing what the signers of the Agile Manifesto are up to these days, I’ve compiled a list with Twitter and blog links below. If you use Twitter lists, I’ve created one here.

If you have any corrections, please post in the comments below.

signer twitter blog github
Kent Beck @kentbeck blog kentbeck@github
Mike Beedle @mikebeedle blog  
Arie van Bennekum @arievanbennekum blog  
Alistair Cockburn @totheralistair blog  
Ward Cunningham @wardcunningham blog wardcunningham@github
Martin Fowler @martinfowler blog martinfowler@github
James Grenning @jwgrenning blog jamesgrenning@github
Jim Highsmith @jimhighsmith blog  
Andrew Hunt @pragmaticandy blog  
Ron Jeffries @ronjeffries blog  
Jon Kern @muddyallen blog jonkernpa@github
Brian Marick @marick blog marick@github
Robert C. Martin @unclebobmartin blog unclebob@github
Steve Mellor      
Ken Schwaber @kschwaber blog  
Jeff Sutherland @jeffsutherland blog drpentode@github
Dave Thomas @pragdave blog  

Working With Social Network APIs

Creating VicinityBuzz naturally involved working with the APIs of social networks. That information seemed worth sharing for those of you interested in writing any type of application that would integrate with a social network.

Developer Documentation

Any of the social networking sites you probably want to integrate with have developer APIs that are well documented. Here are the starting points for a variety of services:

Working With JSON

All of these APIs are best used with JSON. If you’re not familiar with it, you can read up at json.org. It’s a text notation for serializing data, based on JavaScript’s object literal syntax.
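JSON is language-neutral, so the same payload can be read outside the browser too. As a minimal sketch, here is a made-up response parsed with Python’s standard json module. The payload shape and field names are invented for this example and do not match any particular network’s API.

```python
import json

# A hypothetical API response body, as it would arrive over the wire (text).
payload = """
{
  "results": [
    {"user": "alice", "text": "Great session", "geo": {"lat": 41.48, "lon": -82.68}},
    {"user": "bob", "text": "Coffee line is long", "geo": null}
  ]
}
"""

# json.loads turns the text into ordinary dicts and lists; JSON null becomes None.
data = json.loads(payload)

# Keep only the entries that carry a location.
located = [item for item in data["results"] if item["geo"] is not None]

for item in located:
    print(item["user"], item["geo"]["lat"], item["geo"]["lon"])
```

In a web page the equivalent step is usually handled for you, since the response is already in JavaScript’s own notation.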

Where To Make the Call From

If you are working in a standard web page, you could call the API from document.ready (assuming you are using jQuery). This is the approach I take on hoolihan.net, my personal homepage. There is a Twitter feed on the right side.

If you have a bit more of an application, you may want to look at one of the many JavaScript frameworks that help you route events to actions. These are frameworks like Backbone, Knockout, Spine, etc. There are also larger toolkits, some of them commercial, like Kendo, Dojo, and Sencha.

jQueryMobile is commonly paired with PhoneGap, and in that scenario, using something like backbone is a bit tricky. You may want to bring in a template binding library, but avoid routing.

Binding

jQuery.templates was one of the first good JavaScript template binders that I’m aware of, but there are now many different options. In the jQuery world, most of the momentum seems aimed at JsRender. Recently I’ve considered bringing in Knockout and only using the binding part, but I’m not far enough in to evaluate that direction.

API Keys

Unless you’re using the most basic parts of the API, you’ll probably need to register your app and get an API key. It’s a token that identifies your application. In the event of API abuse (too many calls, etc), they have information to contact you and analytics around the issue.

Open Authentication

This is a big topic, but if your application wants to use a social network to identify your users, this is possible via open authentication. If you are interested in this, get started here.

What Do You Think?

Are there any particular areas of the APIs that you’d like to see more detail about? Any conceptual parts that would warrant their own post? Let me know what you think below.