At our January meeting, I presented on Linear Algebra basics in R. I have been taking Andrew Ng’s Stanford Machine Learning course. That course primarily uses Matlab (or Octave, an open source equivalent), and machine learning involves a lot of manipulating and calculating with matrices. Naturally, being an R person, I have been working through some of the techniques in R.
In order to limit the scope of the talk, I focused on matrices, vectors and basic operations with them. There is a practical example that uses a machine learning algorithm, but it’s just to show how R handles a more involved equation with matrices. The talk is not an attempt to teach machine learning.
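To give a flavor of the material, here is a sketch of the kind of basic matrix and vector operations the talk covered (the particular matrices here are made up for illustration, not taken from the slides):

```r
# A small matrix and vector for illustration
A <- matrix(c(1, 2, 3, 4), nrow = 2)  # 2x2, filled column-wise
b <- c(5, 6)                          # a plain numeric vector

A + A      # element-wise addition
A * A      # element-wise multiplication
A %*% A    # matrix multiplication
A %*% b    # matrix-vector product (b treated as a column vector)
t(A)       # transpose
solve(A)   # matrix inverse
```

The distinction between `*` (element-wise) and `%*%` (true matrix multiplication) is one of the first things a Matlab user has to get used to in R.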
The slides are available here, and comments or suggestions are welcome.
I have been brushing up on some areas of math that I felt rusty in. One of the tools I was using was my old TI-82 calculator from high school.
I even got the TI Connect software working, so I could download screenshots and the like.
You can put in data sets (see screen cap below) and do some graphing and statistical analysis.
However, mapping functions is a more straightforward use of the calculator. Which got me thinking… how does one do that in R?
I did a little digging. With the traditional R graphics and plotting functions, you would use curve() to draw a function. It works fine, but I like to use ggplot2 when I can.
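For reference, a base-graphics version with curve() might look something like this (range and styling chosen to match the calculator example, not any particular published code):

```r
# Base R: draw sin over 0 to 2*pi, then overlay cos and tan
curve(sin, from = 0, to = 2 * pi, ylim = c(-2, 2), ylab = "y")
curve(cos, add = TRUE, lty = 2)
curve(tan, add = TRUE, lty = 3)
```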
Turns out ggplot2 supports this well. The following code sample maps the same functions I was mapping on the calculator (sin, cos, and tan from 0 to 2pi radians).
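The exact code from the original post isn’t reproduced here, but a sketch along these lines, using ggplot2’s stat_function(), produces that kind of plot:

```r
library(ggplot2)

# Plot sin, cos, and tan from 0 to 2*pi
p <- ggplot(data.frame(x = c(0, 2 * pi)), aes(x = x)) +
  stat_function(fun = sin, aes(colour = "sin")) +
  stat_function(fun = cos, aes(colour = "cos")) +
  stat_function(fun = tan, aes(colour = "tan")) +
  coord_cartesian(ylim = c(-2, 2)) +  # tan blows up near pi/2; clip the y range
  labs(x = "x (radians)", y = "y", colour = "function")
print(p)
```

Clipping the y range with coord_cartesian() matters here; otherwise tan’s asymptotes dominate the scale and flatten the other two curves.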
And the resulting graph is…
Related to my last post, I had a bug to fix. The images looked rotated 90 degrees locally, so I used ImageMagick’s convert command to rotate them 90 degrees. After that, they looked right locally, and on the site when viewed in Chrome.
However, iOS was showing the images over-rotated by 90 degrees. It turns out, after some digging, that the pictures were taken on an iOS device, which had saved orientation data in the image’s EXIF metadata. The iOS browser was still honoring that metadata.
Because EXIF metadata can also include date, time, and location info, a lot of people prefer not to publish it anyway. Photoshop offers options to remove this data when exporting. I believe GIMP may as well.
But I just wanted to make it part of my process, so I updated the Rakefile from the last post with the following option to ImageMagick’s convert:
convert input.jpg -strip output.jpg
All is well now. Use your favorite EXIF viewer to confirm success.
I help maintain a site for a craft business. One of the challenges is multiple sizes of images for new stock each year. I used to go through and resize each image manually with a program like GIMP.
I finally got smart and installed ImageMagick.
With a Rakefile, I can pass each image into the convert command with the proper sizing, and get the output into a new folder, ready for upload. The code is shown here:
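The original Rakefile isn’t reproduced here, but a minimal sketch of the idea might look like this (the folder names and target widths below are hypothetical, not the actual values for the site):

```ruby
require "rake"
include Rake::DSL  # make the task DSL available when loaded outside `rake`

# Hypothetical target widths for the site
SIZES = { "large" => 800, "thumb" => 200 }

task :resize do
  Dir.glob("originals/*.jpg").each do |src|
    SIZES.each do |name, width|
      dest = File.join("resized", name, File.basename(src))
      mkdir_p(File.dirname(dest))
      # Resize to the target width, preserving aspect ratio
      sh "convert #{src} -resize #{width}x #{dest}"
    end
  end
end
```

Running `rake resize` then walks the originals folder and writes each size into its own subfolder, ready for upload.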
One catch to keep in mind if you work on a Mac: some images carry orientation metadata that OS X will honor but a browser will not. If your images show up on the web in the wrong orientation, look into the -rotate option of ImageMagick’s convert command.
It is common for technical product companies to call themselves “data-driven” these days. The idea is that metrics are used to drive decisions. Sounds easy enough, and compatible with a technology landscape that is enamored with data science, etc.
But something didn’t always feel right to me. Strange, right? If you follow this blog or know me, you probably know that I have been steering my career in a data-centric direction. I coordinate the Cleveland R User Group, and have spent most of my personal technical time with a variety of tools to do analysis and modeling.
Maybe it’s a deeper understanding of statistics and related skills that lies at the center of my problem. Many people view these fields as black and white. “Show me the numbers,” people say, as if they were stone tablets chiseled with the truth. But creating summaries, graphs, and models requires understanding the domain and its subtle interactions. The tools are getting better, but we still need people to drive the tools, frame the questions right, and correct mistaken claims of causality.
In explaining this, the example that hits home for me is a dashboard for a product. Have you ever tried building a B2B software product without one? Good luck sitting in front of an executive board when you can’t show them a dashboard they can monitor. Never mind that, for all of your existing customers, that dashboard is the least used page in your analytics. It’s key to the sale. But if you just look at usage data to drive all of your decisions, you’ll miss that.
So maybe there’s nothing wrong with being data-driven, it’s just that you have to be willing to mix in some decisions based on strategy and experience. And you have to ask your customers the right questions in the first place.
To be a great firm, a company should find a sophisticated middle ground. You can’t rely on a visionary employee to drive all decisions. Many founders think they are Steve Jobs and can divine all customer needs; Steve Jobs is an outlier among outliers. The answer, however, is not to turn off your brain because you started gathering data. The metrics are a tool, and you can still choose how to use your tools. A feature (or page) may still be legally required. Or it may be used rarely, but be of tremendous value when it is. Data provides clarity for the many mundane decisions. It should still be up to a person to set the strategy. Otherwise, you’ll be selling a product without a dashboard. Heaven forbid…