Red Wine
I will give a small tutorial on machine learning. This is not a new concept - it stems from statistics (and math). We have been doing statistical analysis for a long time. What has changed is that we now have lots and lots of data, we need results faster, applications are more complex (e.g., face recognition) and machines are cheaper. Together, these created the new field of Machine Learning.

Most problems fall into 3 kinds:
Classification, Regression or Clustering. As you can see in the figure on the left, classification is making a binary decision (Yes/No or Good/Bad) - or a multi-class one (like eye color = {black, brown, ...}). FYI - your spouse's cooking is not a classification problem. It is never "Bad" - don't use ML there or else you will be in trouble.
The next one is regression, which predicts a number, like house prices or interest rates. The last one is clustering, which is used to sort data into groups - this is used in a lot of clinical research and other fields. Classification and Regression are called Supervised Learning, which means we give hints (known answers to learn from), like giving supervision to your kids. Clustering is called Unsupervised Learning, as we let the system make the decisions - this is like when your kid goes to college - they are on their own - they form their own clusters :-)
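To make the three kinds a bit more concrete, here is a minimal sketch in plain Python. Everything in it is my own illustration: the function names (fit_line, nearest_centroid, classify, two_means) and all the numbers are made up, and real tools use far more capable algorithms than these toy versions.

```python
# Toy illustration of the three problem kinds. All numbers are made up.

def fit_line(xs, ys):
    """Regression: least-squares line y = a*x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def nearest_centroid(xs, labels):
    """Classification (training): remember the mean of each labeled class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def classify(centroids, x):
    """Classification (prediction): pick the class whose mean is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def two_means(xs, iterations=10):
    """Clustering: split unlabeled points into two groups (1-D k-means, k=2).

    Assumes xs contains at least two distinct values.
    """
    lo, hi = min(xs), max(xs)
    for _ in range(iterations):
        left = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        right = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return lo, hi

# Supervised: we hand over the answers (prices, labels) along with the inputs.
sizes = [50, 80, 100, 120, 150]
prices = [100, 160, 200, 240, 300]
slope, intercept = fit_line(sizes, prices)

labels = ["small", "small", "large", "large", "large"]
centroids = nearest_centroid(sizes, labels)
guess = classify(centroids, 70)

# Unsupervised: no answers given, the code finds the two groups on its own.
centers = two_means([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])
```

Notice that the first two functions need the answers (prices or labels) to learn from, while two_means only gets the raw points. That is the supervised versus unsupervised difference in a nutshell.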
Caution:
Don't use machine learning to answer your spouse. It was not a classification (blonde hair) .. it was a regression problem (0.0064). You will always be wrong! (Any flames? :-))
Don't use machine learning to predict if your kid will get into Stanford. Rather, spend time with the kid; then it does not matter.
I think you are getting some idea now. But jokes aside, Machine Learning (ML) can be used to solve a wide range of problems, from early detection of disease to self-driving cars, and I expect many, many applications, such as your refrigerator or your energy meter, to use it. It will not make your food though! It could predict what kind of food you are likely to get for dinner today! But it comes with a probability, and you know probability is not something which works in this case.
Now let's get back to the problem I started with - "Understanding Red Wine". To solve a problem, we need good data (that is true in all cases). To understand what makes a good wine, I had to use publicly available data (and since, in most cases, no one wants to give away critical data, I am starting with very little good data). For red wine, I had the following information: fixed_acidity, volatile_acidity, citric_acid, residual_sugar, chlorides, free_sulfur_dioxide, total_sulfur_dioxide, density, pH, sulphates, alcohol and finally quality. Quality is a score in the range 0-10, with 10 being the best. I am skipping details here (e.g., a description of each field). Two key concepts here: quality is called the response, which is what we are trying to predict, and fixed_acidity, volatile_acidity and the rest are called features.
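If it helps to see features and response side by side, here is a small sketch in plain Python. The column names are the ones listed above; the helper split_features_response and the values in example_row are made up by me for illustration and are not taken from the real data set.

```python
# The 11 feature columns and the response column named in the text.
FEATURE_NAMES = [
    "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar",
    "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density",
    "pH", "sulphates", "alcohol",
]
RESPONSE_NAME = "quality"

def split_features_response(row):
    """Separate one wine record into (features, response)."""
    features = [float(row[name]) for name in FEATURE_NAMES]
    response = float(row[RESPONSE_NAME])
    return features, response

# A made-up record, NOT a row from the real data set.
example_row = {
    "fixed_acidity": 7.0, "volatile_acidity": 0.5, "citric_acid": 0.1,
    "residual_sugar": 2.0, "chlorides": 0.08, "free_sulfur_dioxide": 15.0,
    "total_sulfur_dioxide": 40.0, "density": 0.997, "pH": 3.4,
    "sulphates": 0.6, "alcohol": 10.0, "quality": 5,
}

features, response = split_features_response(example_row)
```

The model only ever sees features as its input; response is the answer it is asked to predict.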
One more concept (last one). When you build a machine-learned model, you train the model (like training a dog) with 75-80% of the data and validate it with the remaining 20-25% to see how good the model is (like making sure your dog understood).
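Here is a rough sketch of such a split in plain Python. The function name train_validation_split, the 80% default and the fixed seed are my own choices for illustration; I am not showing how any particular platform does it internally.

```python
import random

def train_validation_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows, then cut them into a training and a validation part."""
    shuffled = list(rows)                      # copy, so the input is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps it repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for a data set: 1600 row ids instead of real wine records.
rows = list(range(1600))
train, validation = train_validation_split(rows)
```

Shuffling before cutting matters: if the file happens to be sorted (say, by quality), cutting without a shuffle would train on one kind of wine and validate on another.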
Now, I have to eat my own "dog food". What does that mean? No, I did not eat "dog food". I used our platform, which we have been building for the last 6 months, to test. I built three models with 3 different algorithms: Gradient Boosting Machine (GBM), Random Forest and Neural Networks (deep learning).
Now, let's discuss what we found. The following picture shows the relative importance of each feature. It looks like alcohol content, acidity and sulphates play an important role. This data gives hints on which features are important for better prediction. This information can be used by the winemaker to do his "art" and predict if this will be a good vintage or not.
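How can a feature's importance be measured at all? I am not describing how our platform computes it here. One common, model-agnostic idea is permutation importance: scramble one feature column and see how much worse the predictions get. Below is a minimal sketch in plain Python; toy_model is a hypothetical stand-in for a trained model, and its weights and the random data are made up.

```python
import random

def mean_squared_error(model, rows, targets):
    """Average squared gap between predictions and the true values."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, seed=0):
    """For each column: how much does the error grow when it is shuffled?"""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        column = [r[col] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the target
        shuffled_rows = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, column)]
        importances.append(mean_squared_error(model, shuffled_rows, targets) - baseline)
    return importances

def toy_model(row):
    """Hypothetical 'trained model': made-up weights, ignores the third input."""
    alcohol, acidity, unused = row
    return 3.0 * alcohol - 1.0 * acidity

# Made-up data: three random features per row, target produced by toy_model.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
targets = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, targets)
```

In this toy setup the first column should come out as the most important, the second as less important, and the third as exactly zero, because the model never looks at it.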
Finally, let's test it with an example. I entered some values (only a part of the input is shown on the left). The predicted quality was 4.776. That will not be a good wine to drink (maybe good for cooking). I tried a few more examples and slowly .. I was able to get a wine with a quality score of 8.7 (that's a wine to drink!).
If we had better data and more data, our understanding and predictions would get better. I only had 1600 sample data points from the public domain.
I hope this gives you some idea of machine learning. I have connected what I enjoy (red wine) to my current passion and tried to explain the concepts.
These days high school kids are using machine learning to do interesting projects .. amazing! The son of one of my friends, a high school student, will be interning with me this summer to do some fun projects in ML .. looking forward to it - these days kids have brilliant ideas. If you want to try something, let me know .. happy to make you a beta tester on our platform!
Cheers!
To the best of health and life!