NumPy’s clip function is a handy way to force every value in a series into a given range. For example, in machine learning, it is common to have activation functions that take a continuous range of values and squash them into a range like 0 to 1, or -1 to 1.
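As a quick illustration, here is a minimal sketch of clip on a small array. The values are made up for the example; the only library call used is `np.clip(array, lower, upper)`.

```python
import numpy as np

# Made-up values, some of which fall outside the 0-to-1 range.
values = np.array([-0.5, 0.2, 0.9, 1.7])

# Anything below 0 becomes 0, anything above 1 becomes 1,
# and values already inside the range are left alone.
clipped = np.clip(values, 0.0, 1.0)

print(clipped)
```

Note that clip is a hard cutoff at the two bounds, which is what you want for something like a control limit.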
In my case, I was working on some quality control charts that have control limits. A calculated control limit can come out below 0, even though the real value never should. I was working in R. R has a function max() that can take two values and return the larger one, so by applying that function to each value in a series, you can make sure every value is at least 0. But it felt a bit cumbersome compared to clip. min() and max() do not work element-wise in R: whatever you pass them, they reduce it to a single value. That makes sense, because you often want the max or min of an entire series, so you can pass a series or many values and get a single answer.
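I discovered the pmin and pmax functions, which compare their arguments element by element. Each one clips a single bound (pmax(x, 0), for example, raises every negative value to 0), and the two can be combined as pmin(pmax(x, lower), upper) to approximate the clip function. I put together a gist with plots showing how this works.
Note that in my use case, the second argument is a scalar. But it could also be a vector of the same length as the first.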
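For comparison on the NumPy side, the same one-bound-at-a-time idea can be sketched with `np.maximum` and `np.minimum`, which are element-wise like R's pmax and pmin. The limit values and variable names below are made up for illustration.

```python
import numpy as np

# Hypothetical calculated lower control limits; two fall below 0.
lower_limits = np.array([-1.2, 0.4, -0.1, 2.5])

# Clip one bound only: every value is raised to at least 0
# (the NumPy counterpart of pmax(x, 0) in R).
floored = np.maximum(lower_limits, 0)

# Combine both one-sided operations to get a two-sided clip.
both = np.minimum(np.maximum(lower_limits, 0), 1)
same_as_clip = np.array_equal(both, np.clip(lower_limits, 0, 1))

# The second argument can also be an array of the same length,
# giving each position its own floor.
per_point_floor = np.array([0.0, 0.5, 0.0, 3.0])
floored_per_point = np.maximum(lower_limits, per_point_floor)

print(floored, both, floored_per_point, same_as_clip)
```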
The mean of means (for state e) is close to .36. If you take .3 * .36 + .4 * (1 - .36), you get .364, so this seems to make sense. Note that I’m weighting each probability of ending up in e by the share of time spent in the state it starts from: .3 applies to the .36 of the time already in e, and .4 applies to the remaining .64.
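The arithmetic can be checked in a few lines. This sketch assumes the reading used above: .3 is the chance of being in e on the next step when already in e, and .4 is the chance of moving to e from anywhere else. The variable names are mine, not from the original simulation.

```python
# Assumed meaning of the numbers in the text (names are hypothetical).
p_e_given_e = 0.3      # already in e, and in e again on the next step
p_e_given_other = 0.4  # not in e, and moving to e on the next step
share_in_e = 0.36      # observed share of time spent in e

# One-step check: weight each probability by how often its starting state occurs.
next_share = p_e_given_e * share_in_e + p_e_given_other * (1 - share_in_e)

# The share that reproduces itself exactly solves
#   pi = 0.3 * pi + 0.4 * (1 - pi)
# which rearranges to pi = 0.4 / (1 - 0.3 + 0.4).
fixed_point = p_e_given_other / (1 - p_e_given_e + p_e_given_other)

print(round(next_share, 3), round(fixed_point, 4))
```

Under these assumptions the self-consistent share works out to 0.4 / 1.1, which is consistent with the observed value being close to .36.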
One of the challenges of data science in general is that it is a multi-disciplinary field. For any given problem, you may need skills in data extraction, data transformation, data cleaning, math, statistics, software engineering, data visualization, and the problem domain. And that list likely isn’t exhaustive.
One of the first questions when it comes to machine learning specifically is “how much math do I need to know?”
This is where I would recommend you start, to get the most value for your time:
Matrix Multiplication (Subject: Linear Algebra)
Probability (Subject: Statistics)
Normal Distributions (Subject: Statistics)
Bayes’ Theorem (Subject: Statistics)
Linear Regression (Subject: Statistics)
Of course you will run across other math needs, but I think the above list represents the foundation.
If you need places to get started with those topics, check out Khan Academy, Coursera, or your local library.
In order to limit the scope of the talk, I focused on matrices, vectors and basic operations with them. There is a practical example that uses a machine learning algorithm, but it’s just to show how R handles a more involved equation with matrices. The talk is not an attempt to teach machine learning.