Here is a recent interview I did for CLK Tech. CLK Tech is a newsletter based out of Northeast Ohio, run by a couple of tech recruiters in the area. Topics span general career questions and data science in particular.
In addition, I’m busy with a project that I look forward to announcing soon. It’s shaping up to be a busy year…
I’ve been working through the following book on Bayesian methods with an emphasis on the pymc library:
However, pymc installation on OS X can be a bit of a pain. The issue comes down to Fortran… I know. The version of gfortran in newer gcc implementations doesn’t work well with the pymc build; you need gfortran 4.2, as originally provided by Apple. Homebrew has a package for this.
I dealt with this before, but had problems again after upgrading to Sierra. So this time, I thought I’d document the steps so I don’t have this problem again. Let me know if there are any steps that you feel need to be added as you try this.
One of the challenges of data science in general is that it is a multi-disciplinary field. For any given problem, you may need skills in data extraction, data transformation, data cleaning, math, statistics, software engineering, data visualization, and the domain. And that list likely isn’t exhaustive.
One of the first questions when it comes to machine learning specifically is “how much math do I need to know?”
This is where I would recommend you start, to get the most value for your time:
- Matrix Multiplication (Subject: Linear Algebra)
- Probability (Subject: Statistics)
- Normal Distributions (Subject: Statistics)
- Bayes’ Theorem (Subject: Statistics)
- Linear Regression (Subject: Statistics)
Of course you will run across other math needs, but I think the above list represents the foundation.
If you need places to get started with those topics, check out Khan Academy, Coursera, or your local library.
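To give a feel for how small the core ideas are, here is a minimal sketch of three of the topics above (matrix multiplication, Bayes’ theorem, and linear regression) in plain Python. The function names and the sample numbers are hypothetical, made up for illustration; in practice you would likely reach for a numerical library instead.

```python
# Illustrative sketch only: all names and numbers here are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # Each output cell is the dot product of a row of a and a column of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) = P(E | H) * P(H) / P(E) for a yes/no hypothesis H."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # A test that is 95% sensitive with a 5% false positive rate,
    # applied to a condition with a 1% base rate (made-up numbers).
    print(bayes_posterior(0.01, 0.95, 0.05))
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

The Bayes example is worth running: even with a fairly accurate test, a low base rate keeps the posterior probability well below what most people guess.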
For more on machine learning, check out other posts such as ML in R, Linear Algebra in R, and ML w/XGBoost.
I presented at the Cleveland R User Group on using xgboost in R.
Slides are available here.
Code (Jupyter notebooks) is here.
Feedback welcome. Enjoy!
Google released TensorFlow as open source for community use and improvement. From the site: “TensorFlow™ is an open source software library for numerical computation using data flow graphs.”
The instructions on tensorflow.org are aimed at Ubuntu and OS X. I needed to install it on CentOS, so I documented the steps in a GitHub gist. Feel free to comment if you find something I missed:
* Updated 8/18/2016 for TensorFlow 0.10
* Updated gist 10/18/2016 to correct typo in epel-release