Category Archives: R

Equivalent of NumPy’s Clip function in R

NumPy’s clip function is a handy function that limits every value in a series to a given range. For example, in machine learning, it is common to have activation functions that take a continuous range of values and bring them to a range like 0 to 1, or -1 to 1.

In my case, I was working on some quality control charts that have control limits. The calculated control limits can come out below 0, but they should never be shown below 0. I was working in R. R has a function max() that can take two values and return the larger one, so by applying that function over a series, you can make sure each value is at least 0. But it felt a bit cumbersome compared to clip. min() and max() do not work element-wise in R: they reduce everything you pass them to a single value. That makes sense, because you often want the max or min of an entire series, so you can pass a series or many values and get a single answer.

Note that in my use case, the second value is a scalar. But it could be a vector of the same size.

I discovered the pmin() and pmax() functions, which each clip at a single bound and can be combined to reproduce the clip function. Here’s a gist showing plots of how this works:

To see the plotted output, click here.
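As a minimal sketch of the pmin/pmax approach described above (not the original gist), the snippet below wraps the two calls in a helper. The name clip and the sample values are my own assumptions for illustration; base R does not ship a function called clip with this behavior.

```r
# Hypothetical helper: clip() is not a base R function, the name is an
# assumption borrowed from NumPy for illustration.
clip <- function(x, lower = -Inf, upper = Inf) {
  # pmax() raises each element to at least `lower`,
  # pmin() then lowers each element to at most `upper`.
  pmin(pmax(x, lower), upper)
}

# Made-up control limits, some of which fall below 0.
limits <- c(-1.5, 0.2, 3.7, -0.1, 2.0)

lower_only <- pmax(limits, 0)       # element-wise: clip the lower bound only
both_bounds <- clip(limits, 0, 2.5) # element-wise: clip both bounds
single_value <- max(limits, 0)      # not element-wise: one value for everything

print(lower_only)
print(both_bounds)
print(single_value)
```

The bounds can be scalars, as here, or vectors of the same length as the data, in which case each element is compared against its own bound.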

StirTrek 2018 Talk: Machine Learning in R

I had the chance to speak at StirTrek 2018 about Machine Learning in R. I have been to StirTrek before, but it’s been a few years. The conference has really grown, as there are over 2000 attendees now.

I was in the 3:30 timeslot. I talked in a full theater and they broadcast the talk to two other theaters. I don’t know what attendance was like in the overflow rooms. Most of the follow-up questions were from developers looking for resources to get started, such as tutorials. It seemed like a sign that attendees were interested in going further, which was the point of the talk.

[Image: Start of the talk agenda]

The organizers did a great job. I had a helpful proctor who kept me notified about time and made sure I was set up and informed.

[Image: Regression as an intro to modeling]

The talk will go up later this month on YouTube, and I’ll add it to the blog. Thanks to all who attended, and a big thanks to all who helped organize, sponsor, and volunteer for the conference.

Files and Pipes in R Video Demo

I’ve worked with various alternate file handlers in Python before and wanted to explore the options in R. I was pleasantly surprised to find handlers prebuilt for tasks like compressing data. In addition, a pipe function is available that lets you run less common commands on your file, like gpg for encryption.
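Here is a minimal sketch of the kind of handlers described above, using base R's gzfile() and pipe() connections. The sample data and temp file names are made up, and the pipe example assumes a Unix-like system with a sort command on the PATH; it stands in for less common commands like gpg, which I have left out so the example runs without any keys set up.

```r
# Made-up sample data and temp file paths, for illustration only.
fruits <- c("banana", "apple", "cherry")
gz_path <- tempfile(fileext = ".gz")
txt_path <- tempfile(fileext = ".txt")

# gzfile(): text written through this connection is gzip-compressed on disk.
con <- gzfile(gz_path, open = "wt")
writeLines(fruits, con)
close(con)

# Reading through gzfile() decompresses transparently.
con <- gzfile(gz_path, open = "rt")
round_trip <- readLines(con)
close(con)

# pipe(): read the output of a shell command as if it were a file.
# Assumption: a Unix-like shell where `sort` is available.
writeLines(fruits, txt_path)
con <- pipe(paste("sort", shQuote(txt_path)), open = "r")
sorted <- readLines(con)
close(con)

print(round_trip)
print(sorted)
```

The same pattern should apply to other commands: build the command string, open it with pipe(), and read or write through the connection like any other file.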

I put together a quick video demo of how to use these functions, and it’s available on YouTube:

If you are having a hard time reading the text, click here to view the video directly on YouTube.

Comment here or on the video with any feedback or questions.

Markov Chain Simulation

I’ve been reading up on Markov chains and related concepts. On the Wikipedia page there is an example of a two-state Markov process. I decided to simulate it in R and plot the mean of the means.

Quick Code example here:

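The mean of means (of state e) is close to .36. If you take .3 * .36 + .4 * (1-.36), you get .364, so this seems to make sense. Here .3 is the chance of staying in e and .4 is the chance of switching into e from the other state, and I’m weighting each by how often the chain is in that state in the first place. Put another way, the long-run share p of time spent in e has to satisfy p = .3 * p + .4 * (1 - p), which gives p = 4/11, or about .364.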
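The simulation can be sketched roughly as follows. This is a reconstruction, not the original code: the state label "A", the seed, the run counts, and the helper name simulate_chain are my own assumptions. The transition probabilities into e (.4 from the other state, .3 from e itself) come from the arithmetic above, and the remaining two follow from each row summing to 1.

```r
set.seed(42)  # arbitrary seed so the run is reproducible

# Transition matrix: rows are the current state, columns the next state.
P <- matrix(c(0.6, 0.4,
              0.7, 0.3),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "E"), c("A", "E")))

# Hypothetical helper: walk the chain for n_steps and return the visited states.
simulate_chain <- function(n_steps, P, start = "A") {
  states <- character(n_steps)
  current <- start
  for (i in seq_len(n_steps)) {
    # Draw the next state using the row of P for the current state.
    current <- sample(colnames(P), size = 1, prob = P[current, ])
    states[i] <- current
  }
  states
}

# Share of time spent in E for each run, then the mean of those means.
run_means <- replicate(100, mean(simulate_chain(1000, P) == "E"))
mean_of_means <- mean(run_means)

# Analytic long-run share of E for a two-state chain, for comparison.
stationary_e <- P["A", "E"] / (P["A", "E"] + P["E", "A"])

print(mean_of_means)
print(stationary_e)
```

With more runs or longer chains, the simulated mean of means should settle closer to the analytic value of 4/11.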

Text Processing in R Talk With the TM Package

I gave a talk at my local Cleveland R User Group about text processing and document vectorization. You can view the talk here:

Note that I’m using the tm package, which is the traditional way to work with a document collection in R. There are newer approaches like tidytext that are gaining popularity. I may do a follow-up talk on that.

Feedback, and More Videos

Enjoy, and feedback is welcome! And if you are interested in more video content on machine learning in R, check out this post.