If you know me well, you know that I appreciate a good glass of wine on occasion. Yes, really, only on occasion; I don’t have a drinking problem! Unfortunately, having a glass of wine now and then does not make you a connoisseur, so choosing a good wine can be difficult. But I am a software developer, and as you know, software developers have a solution for everything… At least, that is what customers seem to expect. So I asked myself: as a software developer, how would you pick a good wine?
machine learning
New wines appear on the market every day, and the number of wines already on sale is (almost) countless. This makes it impossible to keep an up-to-date list of the quality of all wines by hand. To estimate the quality of any wine, we need a self-learning algorithm. That is called machine learning!
ml.net
There are many different ways to implement machine learning. An interesting project from Microsoft is ML.NET. This project was officially released for the first time in May 2019. Because the software is still quite young, I recommend showing a “beta” label if you want to use this for important features. More information on the status and capabilities of ML.NET can be found on the official website.
algorithm
Before we get into wine tasting programming, you should know that there are many algorithms for machine learning. These algorithms can be used in different scenarios and for different purposes. ML.NET supports several algorithms, so it is important to choose the right one for your project. Since I do not know all the algorithms and Microsoft will likely continue to add new ones, I describe the most commonly used ones in this article:
Classification: define a category for items based on one or more input variables. In the case of binary classification, there are only two categories: true (1) or false (0). This is used for decision models, such as: is this a good or bad wine. In addition, there is multi-class classification, supporting more than two groups, such as: which region does this wine come from?
Clustering: divide items into groups based on their properties. The best-known clustering method is the K-Means algorithm. Example: what is the price range of this wine?
Recommendation: to make recommendations based on a user’s previous choices. This could be interesting if you are going to sell wine. If you know what wine a customer has bought before, you can recommend other wines based on buying behavior of other customers.
Transfer learning: reuse someone else’s model. Recognizing objects in images with machine learning requires a huge amount of training data and hours of GPU time to process it. It would be wasted effort to train such a model yourself when others have already done so. Microsoft recommends using TensorFlow in such cases.
Regression: predict a numeric value based on one or more properties. To predict these values, a model is trained on historical data. A typical scenario for regression is determining prices, but it is also a good candidate for… predicting the quality of wine!
sample code
To demonstrate how the code in this article works, I have published a GitHub repository: https://github.com/vincentbitter/wine-ml
getting started with ml.net
It may surprise you, but setting up ML.NET only takes a few minutes! There is no need to install all kinds of services or SDKs. You just need your normal .NET development environment. I use Visual Studio 2017 Enterprise (15.9.12 to be exact) with .NET Core 2.2 for this purpose.
Since I don’t yet have an existing project, I first create a new .NET project. In this case, I choose the .NET Core 2.2 Console Application template, but basically any .NET Standard 2.0-based project can be used.
The second thing we need to do is add the ML.NET NuGet package. This one is called Microsoft.ML. Make sure you install version 1.3.1. This can be done via Manage NuGet Packages… in Visual Studio, or from the command line, for example with the .NET Core CLI: dotnet add package Microsoft.ML --version 1.3.1
collect data
The application should predict the quality of wine. To do that, it needs a large set of data on which it can be trained in wine tasting. A great start is Paulo Cortez’s dataset, which contains nearly 5,000 white variants of the Portuguese “Vinho Verde.” We ignore red wines for the moment, because I find it unlikely that a single formula can predict the quality of both white and red wines.
In the example project, I added the .csv file with the Copy to Output Directory option enabled. This allows us to read it with LoadFromTextFile from ML.NET. Data from other sources, such as a SQL database or a URL, can also be used: fetch the rows yourself and pass them to LoadFromEnumerable.
To map the data, a simple class is needed with the appropriate properties. Floats are used as the type. It is also possible to map to other types, but that makes our life a lot harder, whereas in this article I want to show the simple basics of ML.NET.
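A minimal sketch of such a class and of the loading step could look like this. The class name WineData, the property names, and the WineLoader helper are assumptions made for illustration; the column indices follow the column order of the public winequality-white.csv file, and the semicolon separator and header row are assumed to be preserved in the project's copies.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input class: one float property per CSV column.
// LoadColumn(n) binds a property to the n-th column (zero-based) of the file.
public class WineData
{
    [LoadColumn(0)] public float FixedAcidity { get; set; }
    [LoadColumn(1)] public float VolatileAcidity { get; set; }
    [LoadColumn(2)] public float CitricAcid { get; set; }
    [LoadColumn(3)] public float ResidualSugar { get; set; }
    [LoadColumn(4)] public float Chlorides { get; set; }
    [LoadColumn(5)] public float FreeSulfurDioxide { get; set; }
    [LoadColumn(6)] public float TotalSulfurDioxide { get; set; }
    [LoadColumn(7)] public float Density { get; set; }
    [LoadColumn(8)] public float Ph { get; set; }
    [LoadColumn(9)] public float Sulphates { get; set; }
    [LoadColumn(10)] public float Alcohol { get; set; }
    [LoadColumn(11)] public float Quality { get; set; }
}

public static class WineLoader
{
    // Reads a semicolon-separated file with a header row into an IDataView.
    // The separator and header settings are assumptions about the file layout.
    public static IDataView Load(MLContext mlContext, string path) =>
        mlContext.Data.LoadFromTextFile<WineData>(
            path, separatorChar: ';', hasHeader: true);
}
```

With this in place, loading a file would be a single call such as WineLoader.Load(new MLContext(), "winequality-white-train.csv").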
In order to prove that our prediction is good enough, we will soon need to validate our trained model as well. It is common to use 70% of the available data to train the model and 30% to validate. The dataset was therefore split into two parts: winequality-white-train.csv and winequality-white-validate.csv.
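The article splits the data into two files up front. As an alternative, ML.NET can also split an already loaded IDataView in memory with TrainTestSplit. The sketch below is not what the example project does; the WineSplitter helper is a hypothetical name, and the split is random, so the validation part is only approximately 30%.

```csharp
using Microsoft.ML;

public static class WineSplitter
{
    // Splits one data set into a training part (about 70%)
    // and a validation part (about 30%).
    public static (IDataView Train, IDataView Validate) Split(
        MLContext mlContext, IDataView allData)
    {
        var split = mlContext.Data.TrainTestSplit(allData, testFraction: 0.3);
        return (split.TrainSet, split.TestSet);
    }
}
```

Keeping two separate files, as the article does, has the advantage that the split is fixed and the results are easy to reproduce.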
train the model
We have the data available in our project, but we have not yet loaded it into ML.NET to train the model. The first step is to map the rows to a model. We have already read the CSV to an IDataView, but will need to tell our machine learning model how to interpret it. Initially, we will use all available fields except the Quality column, because that is the field we want to predict. Finally, we use the FastTree regression trainer to train the model.
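The training step described above can be sketched as follows. The column names are assumptions and must match the property names of your data class; WineTrainer is a hypothetical helper. Note that in ML.NET 1.x the FastTree trainers live in the separate Microsoft.ML.FastTree NuGet package, so this sketch assumes that package is referenced as well.

```csharp
using Microsoft.ML;

public static class WineTrainer
{
    // Every input column except Quality (names assumed to match the data class).
    public static readonly string[] FeatureColumns =
    {
        "FixedAcidity", "VolatileAcidity", "CitricAcid", "ResidualSugar",
        "Chlorides", "FreeSulfurDioxide", "TotalSulfurDioxide", "Density",
        "Ph", "Sulphates", "Alcohol"
    };

    public static ITransformer Train(MLContext mlContext, IDataView trainData)
    {
        // Step 1: combine the separate float columns into one "Features" vector.
        // Step 2: train a FastTree regression model that predicts "Quality".
        var pipeline = mlContext.Transforms
            .Concatenate("Features", FeatureColumns)
            .Append(mlContext.Regression.Trainers.FastTree(
                labelColumnName: "Quality",
                featureColumnName: "Features"));

        // Fit runs the whole pipeline on the training data and returns the model.
        return pipeline.Fit(trainData);
    }
}
```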
validate the model
Cool, we have a model! But is it usable? We don’t know! Therefore, we need to validate the model. Without validation, it is impossible to indicate how reliable the model is. So this is really an important step. Microsoft helps us do this with the Evaluate method, which calculates various statistics about our model:
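A minimal sketch of that validation step, assuming the label column is named Quality and the model was trained as a regression model; WineValidator is a hypothetical helper name.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

public static class WineValidator
{
    public static RegressionMetrics Validate(
        MLContext mlContext, ITransformer model, IDataView validationData)
    {
        // Let the trained model add a "Score" column to the validation rows.
        IDataView predictions = model.Transform(validationData);

        // Compare the predicted "Score" with the known "Quality" of each row.
        return mlContext.Regression.Evaluate(
            predictions, labelColumnName: "Quality", scoreColumnName: "Score");
    }
}
```

The returned object exposes, among others, the RSquared and RootMeanSquaredError properties, which can be printed or logged after each training run.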
Be sure to use different data for validation than for training. This is the reason we previously split the dataset into two .csv files. The statistics include the R Squared and the Root Mean Squared Error. The R Squared is at most 1, and the closer to 1 the better; a value near 0 (or even below it) means the model predicts no better than always guessing the average quality. The Root Mean Squared Error should be as close to 0 as possible.
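For reference, these two statistics are conventionally defined as follows. Here $y_i$ is the official quality of wine $i$ in the validation set, $\hat{y}_i$ the predicted quality, $\bar{y}$ the average official quality, and $n$ the number of wines. These are the textbook definitions, given here as background rather than as a description of ML.NET's internals.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
```

A perfect model has $\hat{y}_i = y_i$ for every wine, which gives $R^2 = 1$ and $\mathrm{RMSE} = 0$. A model that always predicts $\bar{y}$ gives $R^2 = 0$.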
predict the quality of wine
We are ready! Our model is trained and we can get to work building a predictor for wine quality. The results of the prediction will be stored in a WinePrediction object:
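A minimal sketch of such a class; the property name PredictedQuality is an assumption, while the column name "Score" is the one ML.NET regression models write their output to.

```csharp
using Microsoft.ML.Data;

public class WinePrediction
{
    // Maps the model's "Score" output column to this property.
    [ColumnName("Score")]
    public float PredictedQuality { get; set; }
}
```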
After creating a PredictionEngine using the CreatePredictionEngine method, we can finally make a prediction:
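A sketch of that step, written generically so it compiles without the article's own classes. WinePredictor is a hypothetical helper; TInput stands for the data class and TOutput for the prediction class.

```csharp
using Microsoft.ML;

public static class WinePredictor
{
    // Predicts a single row with a trained model.
    public static TOutput Predict<TInput, TOutput>(
        MLContext mlContext, ITransformer model, TInput sample)
        where TInput : class
        where TOutput : class, new()
    {
        // The engine wraps the model for one-at-a-time predictions.
        // It is disposable, so release it when done.
        using (var engine =
            mlContext.Model.CreatePredictionEngine<TInput, TOutput>(model))
        {
            return engine.Predict(sample);
        }
    }
}
```

Assuming an input class named WineData, a call would look like WinePredictor.Predict<WineData, WinePrediction>(mlContext, model, wine). When predicting many rows, it is cheaper to create the engine once and reuse it; keep in mind that a PredictionEngine is not thread-safe.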
As you can see from the sample code, the official quality of the wine is a 6. The result of our PredictionEngine… is… 5.69! That is not far off! This was the first random row from the validation dataset I tested, so I didn’t cheat!
Note the [ColumnName] attribute on the prediction class. It is set to “Score,” and this should not be changed! Score is the fixed column name ML.NET uses to store the result of the prediction. If you change it, ML.NET no longer knows where to write the outcome.
simplify the model
It is very impressive how well our model can predict wine quality. But back to our mission for a moment: to choose a good wine when we are in the store. How realistic is it that we are going to collect all the information requested in this model? Not exactly… Therefore, we should try to minimize the number of parameters. The easiest way to do that is to calculate the R Squared of each column separately and then use only the best field for the model. Of course, this is not the greatest solution, because it may well be that the combination of two columns yields much higher reliability than a single one, but we want to make our lives as simple as possible.
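One way to run this comparison is to train a separate model on each column and evaluate it on the validation data. The sketch below is my reading of that idea, not necessarily how the example project implements it; FeatureRanker is a hypothetical helper, the label column is assumed to be named Quality, and the Microsoft.ML.FastTree package is assumed to be referenced.

```csharp
using System.Collections.Generic;
using Microsoft.ML;

public static class FeatureRanker
{
    // Returns the validation R Squared of a model trained on each column alone.
    public static Dictionary<string, double> RSquaredPerColumn(
        MLContext mlContext,
        IDataView trainData,
        IDataView validationData,
        IEnumerable<string> candidateColumns)
    {
        var scores = new Dictionary<string, double>();
        foreach (var column in candidateColumns)
        {
            // Use just this one column as the feature vector.
            var pipeline = mlContext.Transforms
                .Concatenate("Features", column)
                .Append(mlContext.Regression.Trainers.FastTree(
                    labelColumnName: "Quality"));

            var model = pipeline.Fit(trainData);
            var metrics = mlContext.Regression.Evaluate(
                model.Transform(validationData), labelColumnName: "Quality");

            scores[column] = metrics.RSquared;
        }
        return scores;
    }
}
```

Sorting the resulting dictionary by value, highest first, shows which single column predicts quality best.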
The conclusion of our research is: the amount of alcohol is the most important factor in the quality of wine! Easy to remember and easy to check because it’s on every label! Just grab the bottle with the most alcohol and you’re guaranteed a fun night!
Cheers!
This article first appeared on August 6, 2019 at blog.vincentbitter.co.uk after which it was translated by Vincent Bitter for Team Rockstars IT.