The Math of Machine Learning

Matrix multiplication diagram
(hover for CC attribution)

One of the challenges of data science in general is that it is a multi-disciplinary field. For any given problem, you may need skills in data extraction, data transformation, data cleaning, math, statistics, software engineering, data visualization, and the domain. And that list likely isn’t inclusive.

One of the first questions when it comes to machine learning in specific, is “how much math do I need to know?”

This is where I would recommend you start, to get the most value for your time:

  • Matrix Multiplication (Subject: Linear Algebra)
  • Probability (Subject: Statistics)
  • Normal Distributions (Subject: Statistics)
  • Bayes Theorem (Subject: Statistics)
  • Linear Regression (Subject: Statistics)

Of course you will run across other math needs, but I think the above list represents the foundation.

If you need places to get started with those topics, check out Kahn Academy, Coursera, or your location library.

For more on machine learning, check out other posts such as ML in R, Linear Algebra in R, and ML w/XGBoost.

Installing TensorFlow on CentOS

Google released TensorFlow as open source for community use and improvement. From the site: “TensorFlowâ„¢ is an open source software library for numerical computation using data flow graphs.”

The instructions on tensorflow.org are aimed at Ubuntu and OS X. I had a need to install it on CentOS so I documented the steps in a github gist. Feel free to comment if you find something I missed:

* Updated 8/18/2016 for TensorFlow 0.10

Presentation on Linear Algebra in R

At our January meeting, I presented on Linear Algebra basics in R. I have been taking the Andrew Ng’s Stanford Machine Learning course. That course primarily uses Matlab (or Octave, and open source equivalent), and machine learning involves manipulating and calculating with matrices. Naturally, being an R person, I have been working with some of the techniques in R.

In order to limit the scope of the talk, I focused on matrices, vectors and basic operations with them. There is a practical example that uses a machine learning algorithm, but it’s just to show how R handles a more involved equation with matrices. The talk is not an attempt to teach machine learning.

The slides are available here, and comments or suggestions are welcome.