I tend to use Keras when doing deep learning, with TensorFlow as the back-end. This lets me use TensorBoard, a web interface that charts loss and other metrics by training iteration and can also visualize the computation graph. I noticed TensorBoard has an area of the interface for showing different runs, but I wasn't able to see my runs listed there. It turns out I was using it incorrectly: I wrote every run to the same directory, when each run should get its own subdirectory. Here's how I set it up to work for me:
First, I needed a unique name for each run. I already had a function for naming logs that captures the run's start time when it's initialized. Here's that code:
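A minimal sketch of a helper along those lines follows. It assumes the name is built from a prefix plus the start timestamp. The class name `RunName`, the `prefix` argument, and the format string are hypothetical.

```python
from datetime import datetime


class RunName:
    """Hypothetical run-naming helper.

    The start time is captured once, when the object is created, so every
    log file and directory written during the run shares the same name.
    """

    def __init__(self, prefix="run", start_time=None):
        # Accept an explicit start time (handy for tests); otherwise use now.
        self.start_time = start_time if start_time is not None else datetime.now()
        self.prefix = prefix

    def __str__(self):
        # Zero-padded, sortable timestamp with no colons, so it is safe to
        # use as a directory name on the major operating systems.
        return f"{self.prefix}_{self.start_time:%Y%m%d_%H%M%S}"


run_name = RunName(prefix="mnist")
print(run_name)  # e.g. mnist_YYYYMMDD_HHMMSS
```

The timestamp is fixed at construction, so calling `str(run_name)` later in the same run returns the same value. That keeps the log file name and the TensorBoard directory in sync.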
Then I used that name to create a constant for the TensorBoard log directory:
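Here is a minimal sketch of such a constant. It assumes the run name is a start-time timestamp and that logs live under a hypothetical `./logs` root. `LOG_ROOT`, `RUN_NAME`, and `TB_LOG_DIR` are illustrative names. The key point is that each run writes to its own subdirectory beneath a shared parent.

```python
import os
from datetime import datetime

# Hypothetical names; substitute your own log root and run-naming helper.
LOG_ROOT = "./logs"
RUN_NAME = datetime.now().strftime("run_%Y%m%d_%H%M%S")

# One subdirectory per run under a shared parent. TensorBoard treats each
# subdirectory of --logdir that contains event files as a separate run.
TB_LOG_DIR = os.path.join(LOG_ROOT, "tensorboard", RUN_NAME)

# With Keras on the TensorFlow back-end, the constant goes to the callback:
#   from tensorflow.keras.callbacks import TensorBoard
#   model.fit(x, y, callbacks=[TensorBoard(log_dir=TB_LOG_DIR)])
print(TB_LOG_DIR)
```

The `TensorBoard` callback receives the directory through its `log_dir` argument, so training for this run logs into its own folder.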
Finally, I run TensorBoard on the parent directory, without the unique run name:
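Below is a minimal sketch of that invocation, built in Python so each piece is explicit. It assumes the parent directory is a hypothetical `./logs/tensorboard` and uses TensorBoard's `--logdir`, `--host`, and `--port` flags. Port 6006 is TensorBoard's default.

```python
import shlex


def tensorboard_command(log_parent, host="0.0.0.0", port=6006):
    """Build a TensorBoard command line for the *parent* log directory.

    Pointing --logdir at the parent, rather than at one run's subdirectory,
    lets TensorBoard discover every run underneath it. host="0.0.0.0"
    listens on all network interfaces.
    """
    return ["tensorboard", f"--logdir={log_parent}", f"--host={host}", f"--port={port}"]


# Hypothetical parent directory holding one subdirectory per run.
cmd = tensorboard_command("./logs/tensorboard")
print(shlex.join(cmd))  # tensorboard --logdir=./logs/tensorboard --host=0.0.0.0 --port=6006
# To launch it from Python instead of a shell: subprocess.run(cmd, check=True)
```

Recent TensorBoard releases also provide a `--bind_all` flag as a shortcut for listening on all interfaces.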
If you're wondering why I explicitly set the host parameter to all hosts (0.0.0.0), it's so that TensorBoard is reachable when it runs on a cloud GPU server.
You’ll now see each subdirectory as a unique run in the interface:
Multiple runs in tensorboard
That should do it. Comment if you have questions or feedback.
I had the chance to speak at StirTrek 2018 about Machine Learning in R. I've attended StirTrek before, but it had been a few years. The conference has really grown; there are now over 2,000 attendees.
I was in the 3:30 timeslot. I spoke in a full theater, and the talk was broadcast to two other theaters. I don't know what attendance was like in the overflow rooms. Most of the follow-up questions came from developers looking for tutorials and other resources to get started. That seemed like a sign that attendees were interested in going further, which was the point of the talk.
Start of the Talk Agenda
The organizers did a great job. I had a helpful proctor who kept me posted on time and made sure I was set up and informed.
Regression as an intro to modeling
The talk will go up on YouTube later this month, and I'll add it to the blog. Thanks to everyone who attended, and a big thanks to everyone who organized, sponsored, and volunteered for the conference.
I’ve been working through the following book on Bayesian methods with an emphasis on the pymc library:
However, installing pymc on OS X can be a bit of a pain. The issue comes down to Fortran… I know. The version of gfortran in newer GCC releases doesn't work well with the pymc build; you need gfortran 4.2, as originally provided by Apple. Homebrew has a package for this.
I dealt with this before, but I had problems again after upgrading to Sierra. So this time, I thought I'd document the steps so I don't run into this again. Let me know if you feel any steps need to be added as you try this.
One of the challenges of data science in general is that it is a multidisciplinary field. For any given problem, you may need skills in data extraction, data transformation, data cleaning, math, statistics, software engineering, data visualization, and the domain itself. And that list likely isn't exhaustive.
One of the first questions people ask about machine learning in particular is, "How much math do I need to know?"
This is where I would recommend you start, to get the most value for your time:
Matrix Multiplication (Subject: Linear Algebra)
Probability (Subject: Statistics)
Normal Distributions (Subject: Statistics)
Bayes' Theorem (Subject: Statistics)
Linear Regression (Subject: Statistics)
Of course you will run across other math needs, but I think the above list represents the foundation.
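To make the list concrete, here is a minimal pure-Python sketch that uses only the standard library. It works one small example from several of the topics: a matrix product, the normal density, Bayes' theorem, and a least-squares line fit. The function names are my own, for illustration. In real work you'd reach for NumPy or scikit-learn.

```python
import math


def matmul(A, B):
    """Matrix product: entry (i, j) is the dot product of row i of A and column j of B."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


def fit_line(xs, ys):
    """Simple linear regression by ordinary least squares; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print(round(normal_pdf(0.0), 4))                   # 0.3989

# Made-up screening-test numbers: 1% prevalence, 99% sensitivity, 5% false positives.
p_disease = 0.01
p_pos = 0.99 * p_disease + 0.05 * (1 - p_disease)  # law of total probability
print(round(bayes(0.99, p_disease, p_pos), 4))     # 0.1667

print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))        # (1.0, 2.0)
```

Once the formulas feel familiar, NumPy's `@` operator and `np.linalg.lstsq` perform the same matrix product and least-squares fit far faster.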
If you need places to get started with those topics, check out Khan Academy, Coursera, or your local library.